An array - the simplest collection. It's just a memory block with elements of one type. Arrays are often used when we need to iterate a collection, to search elements or transform them. C# and .net evolved, many new collections were introduced, interfaces allowed to create generalized algorithms, LINQ simplified the work of programmers, but we continue to use arrays in inappropriate places and perform excess and sometimes harmful transformations. This post begins a series about collections, their features, performance and design. All posts will contain benchmarks, code examples, my opinion and recommendations.
I want to start with a string
, some specific collection of .net.
- Part 1: string to array of chars
- Part 2: automatic diagnostics
- Part 3: collections
String
Any large enough project operates with strings. Usually strings store a user input and are used as data for other types. For example, we can read text input and convert it to a number type:
static void Main()
{
var input = Console.ReadLine();
var value = int.Parse(input);
}
All we know, user input must be validated. Here we need to check that all chars are digits. It's a very common check. And often the following code is used:
private bool IsDigitString(string str)
{
return str.ToCharArray().All(char.IsDigit);
}
or
string bik = ...;
if (bik.ToArray().Any(x => !char.IsDigit(x)))
{
message = "BIK must contain only digits";
return false;
}
Looks simple: get a collection of chars and call LINQ-method to check every digit. The only unnecessary and harmful thing is transformation of string to array of chars with methods ToCharArray()
or ToArray()
. We used to use LINQ-methods with collections, but forget that the string is an immutable collection that implements IEnumerable<char>
interface:
[Serializable]
public sealed partial class String : IComparable, IEnumerable, IConvertible, IEnumerable<char>, IComparable<string>, IEquatable<string>, ICloneable
{
...
}
So, we can modify our examples and remove excessive calls:
private bool IsDigitString(string str)
{
return str.All(char.IsDigit);
}
and
string bik = ...;
if (bik.Any(x => !char.IsDigit(x)))
{
message = "BIK must contain only digits";
return false;
}
Ok, our code became simpler and more readable. But what do we get besides code quality benefits? Benchmarks can help us. Let's compare three methods: with ToArray()
, with ToCharArray()
and without additional calls for different string lengths and frameworks:
Benchmark code. Click to expand.
namespace ToCharArrayBenchmark
{
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
[SimpleJob(RuntimeMoniker.Net48, baseline: true)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[SimpleJob(RuntimeMoniker.Net50)]
[DisassemblyDiagnoser]
[MemoryDiagnoser]
public class IsDigitBenchmark
{
[Params(10, 100, 1000)]
public int N { get; set; }
private string value;
[GlobalSetup]
public void SetUp()
{
var chars = new char[N];
for (int i = 0; i < N; i++)
{
chars[i] = (char)(i % 10 + '0');
}
value = new string(chars);
}
[Benchmark]
public bool IsDigitString_ToCharArray()
{
return value.ToCharArray().All(char.IsDigit);
}
[Benchmark]
public bool IsDigitString_ToArray()
{
return value.ToArray().All(char.IsDigit);
}
[Benchmark(Baseline = true)]
public bool IsDigitString()
{
return value.All(char.IsDigit);
}
}
}
Full results in table format. Click to expand.
BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1237 (21H1/May2021Update)
AMD Ryzen 7 4700U with Radeon Graphics, 1 CPU, 8 logical and 8 physical cores
[Host] : .NET Framework 4.8 (4.8.4400.0), X64 RyuJIT
.NET 5.0 : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
.NET Core 3.1 : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
.NET Framework 4.8 : .NET Framework 4.8 (4.8.4400.0), X64 RyuJIT
Method | Runtime | N | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|---|---|
IsDigitString | .NET 5.0 | 10 | 94.54 ns | 1.924 ns | 3.267 ns | 1.01 | 0.04 | 0.0459 | 96 B |
IsDigitString_ToCharArray | .NET 5.0 | 10 | 97.01 ns | 0.602 ns | 0.503 ns | 1.02 | 0.02 | 0.0688 | 144 B |
IsDigitString_ToArray | .NET 5.0 | 10 | 250.71 ns | 3.346 ns | 2.794 ns | 2.64 | 0.04 | 0.1376 | 288 B |
IsDigitString | .NET Core 3.1 | 10 | 98.74 ns | 1.595 ns | 1.492 ns | 1.04 | 0.02 | 0.0459 | 96 B |
IsDigitString_ToCharArray | .NET Core 3.1 | 10 | 105.02 ns | 1.217 ns | 1.079 ns | 1.10 | 0.02 | 0.0688 | 144 B |
IsDigitString_ToArray | .NET Core 3.1 | 10 | 294.50 ns | 0.838 ns | 0.700 ns | 3.11 | 0.05 | 0.1373 | 288 B |
IsDigitString | .NET Framework 4.8 | 10 | 94.96 ns | 1.863 ns | 1.913 ns | 1.00 | 0.00 | 0.0459 | 96 B |
IsDigitString_ToCharArray | .NET Framework 4.8 | 10 | 90.48 ns | 1.345 ns | 1.258 ns | 0.95 | 0.02 | 0.0688 | 144 B |
IsDigitString_ToArray | .NET Framework 4.8 | 10 | 249.17 ns | 0.617 ns | 0.482 ns | 2.63 | 0.04 | 0.1452 | 305 B |
IsDigitString | .NET 5.0 | 100 | 617.01 ns | 2.901 ns | 2.714 ns | 0.91 | 0.02 | 0.0458 | 96 B |
IsDigitString_ToCharArray | .NET 5.0 | 100 | 730.71 ns | 6.392 ns | 5.666 ns | 1.08 | 0.02 | 0.1526 | 320 B |
IsDigitString_ToArray | .NET 5.0 | 100 | 1,471.00 ns | 25.149 ns | 62.630 ns | 2.21 | 0.16 | 0.3891 | 816 B |
IsDigitString | .NET Core 3.1 | 100 | 695.67 ns | 13.740 ns | 22.576 ns | 1.03 | 0.03 | 0.0458 | 96 B |
IsDigitString_ToCharArray | .NET Core 3.1 | 100 | 707.81 ns | 14.127 ns | 40.760 ns | 1.12 | 0.05 | 0.1526 | 320 B |
IsDigitString_ToArray | .NET Core 3.1 | 100 | 1,638.57 ns | 18.990 ns | 16.834 ns | 2.42 | 0.05 | 0.3891 | 816 B |
IsDigitString | .NET Framework 4.8 | 100 | 677.60 ns | 13.072 ns | 12.839 ns | 1.00 | 0.00 | 0.0458 | 96 B |
IsDigitString_ToCharArray | .NET Framework 4.8 | 100 | 692.45 ns | 10.962 ns | 9.717 ns | 1.02 | 0.02 | 0.1526 | 321 B |
IsDigitString_ToArray | .NET Framework 4.8 | 100 | 1,696.70 ns | 26.955 ns | 22.509 ns | 2.51 | 0.05 | 0.4768 | 1,003 B |
IsDigitString | .NET 5.0 | 1000 | 8,340.66 ns | 159.513 ns | 141.404 ns | 1.26 | 0.02 | 0.0458 | 96 B |
IsDigitString_ToCharArray | .NET 5.0 | 1000 | 7,902.88 ns | 153.280 ns | 209.812 ns | 1.20 | 0.04 | 1.0071 | 2,120 B |
IsDigitString_ToArray | .NET 5.0 | 1000 | 12,526.66 ns | 215.331 ns | 201.421 ns | 1.90 | 0.03 | 2.1820 | 4,568 B |
IsDigitString | .NET Core 3.1 | 1000 | 6,771.54 ns | 42.026 ns | 35.093 ns | 1.03 | 0.01 | 0.0458 | 96 B |
IsDigitString_ToCharArray | .NET Core 3.1 | 1000 | 6,820.17 ns | 77.269 ns | 60.326 ns | 1.03 | 0.01 | 1.0071 | 2,120 B |
IsDigitString_ToArray | .NET Core 3.1 | 1000 | 14,993.39 ns | 249.113 ns | 255.821 ns | 2.27 | 0.04 | 2.1820 | 4,568 B |
IsDigitString | .NET Framework 4.8 | 1000 | 6,609.99 ns | 51.380 ns | 48.061 ns | 1.00 | 0.00 | 0.0458 | 96 B |
IsDigitString_ToCharArray | .NET Framework 4.8 | 1000 | 6,922.85 ns | 102.382 ns | 90.759 ns | 1.05 | 0.01 | 1.0071 | 2,127 B |
IsDigitString_ToArray | .NET Framework 4.8 | 1000 | 15,401.52 ns | 117.030 ns | 103.744 ns | 2.33 | 0.02 | 3.0823 | 6,475 B |
// * Legends *
N : Value of the 'N' parameter
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
RatioSD : Standard deviation of the ratio distribution ([Current]/[Baseline])
Gen 0 : GC Generation 0 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 us : 1 Microsecond (0.000001 sec)
Let's examine a diagram with execution times (in microseconds) for strings with 100 chars:
As we see, method IsDigitString
is about 5% faster than calling ToCharArray()
. ToArray()
is up to 3 times slower.
Both ToCharArray()
and ToArray()
do the same, but ToCharArray()
is a string
class method and very optimized with unsafe code. ToArray()
is an extension method for IEnumerable<T>
and is not optimized so much.
So, we know, there is no need to transform string to array to use LINQ-methods, this transformation slightly hurts performance and there is one more thing - GC. Every time when we call ToArray()
or ToCharArray()
we create at least one new object - a new array. We no longer need the source string and GC will collect at some time.
The next diagram shows us memory allocations (in bytes) for strings with 100 chars:
Cons
Should we always always remove ToArray()
and ToCharArray()
calls? It depends on context. ToArray()
generates a lot of overhead, so I can't find a situation where to use it. But as we saw before, ToCharArray()
is very fast, and almost does not affect performance. Can you imagine that calling ToCharArray()
will save time? But it does! Look at this example:
var convertParameterSymbols = attributes["ConvertParameterSymbols"];
if (convertParameterSymbols != null)
ConvertParameterSymbols = new List<char>(convertParameterSymbols.ToCharArray());
It looks like we need to remove ToCharrArray()
. But before doing this, we need to go back to the string
class definition. As we remember, it implements IEnumerable<char>
, but not ICollection<char>
.
ICollection<T>
interface describes the contract for mutable collections (Add
, Clear
, Remove
) that is not suitable for strings.
And look at List<T>
constructor:
public List(IEnumerable<T> collection)
{
if (collection == null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
if (collection is ICollection<T> objs)
{
int count = objs.Count;
if (count == 0)
{
this._items = List<T>._emptyArray;
}
else
{
this._items = new T[count];
objs.CopyTo(this._items, 0);
this._size = count;
}
}
else
{
this._size = 0;
this._items = List<T>._emptyArray;
foreach (T obj in collection)
this.Add(obj);
}
}
It's optimized for ICollection<T>
objects. Elements are just copied as a block with CopyTo
instead of adding one by one that causes resizes of list. What types do implement ICollection<T>
interface? You will be surprised, but any array (T[]
) can be converted to ICollection<T>
. And using new List<char>(convertParameterSymbols.ToCharArray())
is faster than new List<char>(convertParameterSymbols)
. Benchmarks can prove this.
Benchmark code. Click to expand.
namespace ToCharArrayBenchmark
{
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
[SimpleJob(RuntimeMoniker.Net48, baseline: true)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[SimpleJob(RuntimeMoniker.Net50)]
[DisassemblyDiagnoser]
[MemoryDiagnoser]
public class ConstructorBenchmark
{
[Params(10, 100, 1000)]
public int N { get; set; }
private string value;
[GlobalSetup]
public void SetUp()
{
var chars = new char[N];
for (int i = 0; i < N; i++)
{
chars[i] = (char)(i % 10 + '0');
}
value = new string(chars);
}
[Benchmark(Baseline = true)]
public List<char> Constructor()
{
return new List<char>(value);
}
[Benchmark]
public List<char> Constructor_ToCharArray()
{
return new List<char>(value.ToCharArray());
}
[Benchmark]
public List<char> Constructor_ToArray()
{
return new List<char>(value.ToArray());
}
}
}
Full results in table format. Click to expand.
BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
AMD Ryzen 7 4700U with Radeon Graphics, 1 CPU, 8 logical and 8 physical cores
[Host] : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT
.NET 5.0 : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
.NET Core 3.1 : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
.NET Framework 4.8 : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT
Method | Runtime | N | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
---|---|---|---|---|---|---|---|---|---|
Constructor | .NET 5.0 | 10 | 115.76 ns | 1.517 ns | 1.345 ns | 0.74 | 0.01 | 0.0917 | 192 B |
Constructor_ToCharArray | .NET 5.0 | 10 | 35.95 ns | 0.698 ns | 0.653 ns | 0.23 | 0.00 | 0.0612 | 128 B |
Constructor_ToArray | .NET 5.0 | 10 | 194.70 ns | 2.219 ns | 1.967 ns | 1.25 | 0.01 | 0.1299 | 272 B |
Constructor | .NET Core 3.1 | 10 | 138.96 ns | 2.804 ns | 2.880 ns | 0.89 | 0.02 | 0.0918 | 192 B |
Constructor_ToCharArray | .NET Core 3.1 | 10 | 52.31 ns | 0.897 ns | 0.839 ns | 0.34 | 0.01 | 0.0612 | 128 B |
Constructor_ToArray | .NET Core 3.1 | 10 | 228.65 ns | 0.703 ns | 0.657 ns | 1.47 | 0.00 | 0.1299 | 272 B |
Constructor | .NET Framework 4.8 | 10 | 155.57 ns | 0.166 ns | 0.147 ns | 1.00 | 0.00 | 0.0956 | 201 B |
Constructor_ToCharArray | .NET Framework 4.8 | 10 | 71.37 ns | 1.098 ns | 1.027 ns | 0.46 | 0.01 | 0.0650 | 136 B |
Constructor_ToArray | .NET Framework 4.8 | 10 | 232.85 ns | 1.850 ns | 1.730 ns | 1.50 | 0.01 | 0.1414 | 297 B |
Constructor | .NET 5.0 | 100 | 648.65 ns | 7.217 ns | 6.751 ns | 0.80 | 0.02 | 0.3405 | 712 B |
Constructor_ToCharArray | .NET 5.0 | 100 | 51.29 ns | 0.921 ns | 0.817 ns | 0.06 | 0.00 | 0.2295 | 480 B |
Constructor_ToArray | .NET 5.0 | 100 | 791.75 ns | 15.823 ns | 20.012 ns | 0.99 | 0.03 | 0.4663 | 976 B |
Constructor | .NET Core 3.1 | 100 | 684.45 ns | 4.868 ns | 4.065 ns | 0.85 | 0.02 | 0.3405 | 712 B |
Constructor_ToCharArray | .NET Core 3.1 | 100 | 70.10 ns | 0.527 ns | 0.493 ns | 0.09 | 0.00 | 0.2294 | 480 B |
Constructor_ToArray | .NET Core 3.1 | 100 | 962.36 ns | 4.630 ns | 3.867 ns | 1.19 | 0.03 | 0.4654 | 976 B |
Constructor | .NET Framework 4.8 | 100 | 800.13 ns | 14.134 ns | 17.875 ns | 1.00 | 0.00 | 0.3443 | 722 B |
Constructor_ToCharArray | .NET Framework 4.8 | 100 | 95.32 ns | 1.880 ns | 1.931 ns | 0.12 | 0.00 | 0.2333 | 489 B |
Constructor_ToArray | .NET Framework 4.8 | 100 | 960.90 ns | 6.956 ns | 5.431 ns | 1.19 | 0.03 | 0.5569 | 1,171 B |
Constructor | .NET 5.0 | 1000 | 5,289.46 ns | 104.687 ns | 153.449 ns | 0.83 | 0.03 | 2.0828 | 4,368 B |
Constructor_ToCharArray | .NET 5.0 | 1000 | 235.86 ns | 4.589 ns | 4.507 ns | 0.04 | 0.00 | 1.9505 | 4,080 B |
Constructor_ToArray | .NET 5.0 | 1000 | 5,943.57 ns | 111.670 ns | 204.195 ns | 0.95 | 0.04 | 3.1128 | 6,528 B |
Constructor | .NET Core 3.1 | 1000 | 5,196.81 ns | 70.120 ns | 65.590 ns | 0.82 | 0.02 | 2.0828 | 4,368 B |
Constructor_ToCharArray | .NET Core 3.1 | 1000 | 247.25 ns | 1.114 ns | 0.987 ns | 0.04 | 0.00 | 1.9503 | 4,080 B |
Constructor_ToArray | .NET Core 3.1 | 1000 | 7,100.14 ns | 85.990 ns | 80.435 ns | 1.13 | 0.01 | 3.1128 | 6,528 B |
Constructor | .NET Framework 4.8 | 1000 | 6,306.12 ns | 69.763 ns | 65.257 ns | 1.00 | 0.00 | 2.0905 | 4,389 B |
Constructor_ToCharArray | .NET Framework 4.8 | 1000 | 275.66 ns | 2.318 ns | 2.055 ns | 0.04 | 0.00 | 1.9541 | 4,101 B |
Constructor_ToArray | .NET Framework 4.8 | 1000 | 7,507.12 ns | 20.504 ns | 17.121 ns | 1.19 | 0.01 | 4.0131 | 8,453 B |
// * Legends *
N : Value of the 'N' parameter
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
RatioSD : Standard deviation of the ratio distribution ([Current]/[Baseline])
Gen 0 : GC Generation 0 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 us : 1 Microsecond (0.000001 sec)
Time diagram (in microseconds) for strings with 100 chars:
Incredible! A list created from an array is about 10 times faster than created from a source string. And memory usage is more optimal (in bytes):
But we should remember, it's a very specific case.
Conclusion
In common we should be be more attentive and avoid unnecessary transformations for reasons:
- Performance. Such small optimizations can positively influence the processing of big data.
- Additional memory traffic - GC has to collect more data, GC pauses can be longer.
- Code quality. We can reduce code but not lose functionality.
Some cases require special attention and a deeper understanding of the .net platform.
If you really need to convert a string to an array use ToCharArray()
instead of ToArray()
.
More examples
Here are more examples I found.
Lowercase string
There is a very popular way to lowercase the first letter of a string. We remember - strings are immutable and name[0] = char.ToLowerInvariant(name[0])
won't compile. So, we need to create a new string from a char array.
public override string Format(string name)
{
if (string.IsNullOrEmpty(name) || char.IsLower(name[0])) return name;
var chars = name.ToCharArray();
chars[0] = char.ToLowerInvariant(chars[0]);
return new string(chars);
}
Reverse string
There is no built-in method to reverse a string, so we have to create our own using Array.Reverse
method, so we transform a source string to an array.
public static string ReverseString(this string input)
{
var charArray = input.ToCharArray();
Array.Reverse(charArray);
return new string(charArray);
}
Iterate string
public static byte[] ToBytes(string value)
{
char[] chars = value.ToCharArray();
byte[] res = new byte[chars.Length / 2];
for (int i = 0, j = 0; i < chars.Length; i += 2, j++)
res[j] = (byte)(FromHex((byte)chars[i]) << 4 ^ FromHex((byte)chars[i + 1]));
return res;
}
We iterate through a string, and no need to create a new array. The code can be simplified:
public static byte[] ToBytes(string value)
{
byte[] res = new byte[value.Length / 2];
for (int i = 0, j = 0; i < value.Length; i += 2, j++)
res[j] = (byte)(FromHex((byte)value[i]) << 4 ^ FromHex((byte)value[i + 1]));
return res;
}