LINQ is powerful tool for collection (or enumeration) operating. You use functional-like syntax for filtering, grouping, transforming data. All LINQ methods work with common interface IEnumerable<T>
and usually methods do not check real type of collection, or data type. Sometimes, if you know the type of collection or generic type, you can optimize code for better performance (like in Any vs Count).
This article will show, how you can speed up grouping by specific key - byte
.
Let's look at LINQ GroupBy
method implementation. It creates a Lookup
- kind of hash-table with multiple values for one key. Like standard hash-table it operates with "buckets" (Lookup<TKey, TElement>.Grouping
type). For every element from source collection it finds group by computing hash and comparing with key.
public class Lookup<TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>, IEnumerable, ILookup<TKey, TElement>
{
private Lookup<TKey, TElement>.Grouping[] groupings;
//...
internal Lookup<TKey, TElement>.Grouping GetGrouping(TKey key, bool create)
{
int hashCode = this.InternalGetHashCode(key);
for (Lookup<TKey, TElement>.Grouping grouping = this.groupings[hashCode % this.groupings.Length]; grouping != null; grouping = grouping.hashNext)
{
if (grouping.hashCode == hashCode && this.comparer.Equals(grouping.key, key))
return grouping;
}
//...
}
//...
}
Here is one possibility for optimization with type byte
. byte
has only 256 different values, so grouping by byte
value gives us only 256 different keys. We can replace "buckets" with one array of 256 elements - one per possible value of byte
type.
internal sealed class ByteLookup<TElement> : ILookup<byte, TElement>
{
private readonly Grouping[] groupings = new Grouping[byte.MaxValue + 1];
//...
private Grouping GetGrouping(byte key)
{
return groupings[key] ?? (groupings[key] = new Grouping(key));
}
}
This gives us the following benefits:
- Hash is equal to value - no collision
- Simple key comparison
Benchmark
Full listing of benchmark tests:
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
public class GroupByByteBenchmark
{
[Params(10, 100, 1000, 10000)]
public int count;
private byte[] randomData;
private byte[] zeroData;
[GlobalSetup]
public void Setup()
{
randomData = new byte[count];
zeroData = new byte[count];
new Random(42).NextBytes(randomData);
}
[Benchmark(Baseline = true)]
[BenchmarkCategory("Random")]
public IEnumerator<IGrouping<byte, byte>> GroupBy()
{
return randomData.GroupBy(o => o).GetEnumerator();
}
[Benchmark]
[BenchmarkCategory("Random")]
public IEnumerator<IGrouping<byte, byte>> GroupByByte()
{
return randomData.GroupByByte(o => o).GetEnumerator();
}
[Benchmark(Baseline = true)]
[BenchmarkCategory("Zero")]
public IEnumerator<IGrouping<byte, byte>> ZeroGroupBy()
{
return zeroData.GroupBy(o => o).GetEnumerator();
}
[Benchmark]
[BenchmarkCategory("Zero")]
public IEnumerator<IGrouping<byte, byte>> ZeroGroupByByte()
{
return zeroData.GroupByByte(o => o).GetEnumerator();
}
}
And results:
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-2670QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.106
[Host] : .NET Core 2.2.4 (CoreCLR 4.6.27521.02, CoreFX 4.6.27521.01), 64bit RyuJIT
DefaultJob : .NET Core 2.2.4 (CoreCLR 4.6.27521.02, CoreFX 4.6.27521.01), 64bit RyuJIT
Method | count | Mean | Error | StdDev | Median | Ratio | RatioSD |
---|---|---|---|---|---|---|---|
GroupBy | 10 | 825.7 ns | 16.875 ns | 45.042 ns | 807.4 ns | 1.00 | 0.00 |
GroupByByte | 10 | 785.8 ns | 15.187 ns | 15.596 ns | 781.3 ns | 0.95 | 0.05 |
ZeroGroupBy | 10 | 597.7 ns | 11.645 ns | 14.301 ns | 592.0 ns | 1.00 | 0.00 |
ZeroGroupByByte | 10 | 546.8 ns | 13.316 ns | 14.801 ns | 542.5 ns | 0.92 | 0.04 |
GroupBy | 100 | 9,525.3 ns | 187.959 ns | 275.507 ns | 9,483.4 ns | 1.00 | 0.00 |
GroupByByte | 100 | 5,137.8 ns | 31.933 ns | 29.870 ns | 5,125.5 ns | 0.54 | 0.02 |
ZeroGroupBy | 100 | 3,454.4 ns | 22.817 ns | 21.343 ns | 3,444.2 ns | 1.00 | 0.00 |
ZeroGroupByByte | 100 | 2,306.3 ns | 6.434 ns | 5.704 ns | 2,307.1 ns | 0.67 | 0.00 |
GroupBy | 1000 | 75,075.7 ns | 1,653.028 ns | 2,370.724 ns | 74,447.1 ns | 1.00 | 0.00 |
GroupByByte | 1000 | 35,900.9 ns | 318.114 ns | 297.564 ns | 35,785.5 ns | 0.48 | 0.02 |
ZeroGroupBy | 1000 | 30,918.4 ns | 614.770 ns | 1,349.436 ns | 30,469.5 ns | 1.00 | 0.00 |
ZeroGroupByByte | 1000 | 18,865.2 ns | 34.908 ns | 29.150 ns | 18,870.7 ns | 0.61 | 0.03 |
GroupBy | 10000 | 411,941.9 ns | 2,670.768 ns | 2,367.567 ns | 410,684.0 ns | 1.00 | 0.00 |
GroupByByte | 10000 | 260,971.8 ns | 2,433.431 ns | 2,276.233 ns | 259,975.7 ns | 0.63 | 0.01 |
ZeroGroupBy | 10000 | 297,577.1 ns | 2,665.399 ns | 2,362.807 ns | 298,446.4 ns | 1.00 | 0.00 |
ZeroGroupByByte | 10000 | 182,999.1 ns | 741.884 ns | 619.507 ns | 182,945.6 ns | 0.62 | 0.01 |
For random data and array of zero values we get dome performance improvements up to 2 times. Also this optimization can be used for grouping by enum value with underlying byte
type.