Performance LINQ: Group by byte

LINQ is powerful tool for collection (or enumeration) operating. You use functional-like syntax for filtering, grouping, transforming data. All LINQ methods work with common interface IEnumerable<T> and usually methods do not check real type of collection, or data type. Sometimes, if you know the type of collection or generic type, you can optimize code for better performance (like in Any vs Count).

This article will show, how you can speed up grouping by specific key - byte.

Let's look at LINQ GroupBy method implementation. It creates a Lookup - kind of hash-table with multiple values for one key. Like standart hash-table it operates with "buckets" (Lookup<TKey, TElement>.Grouping type). For every element from source collection it finds group by computing hash and comparing with key.

public class Lookup<TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>, IEnumerable, ILookup<TKey, TElement>
{
    private Lookup<TKey, TElement>.Grouping[] groupings;

    //...
    internal Lookup<TKey, TElement>.Grouping GetGrouping(TKey key, bool create)
    {   
        int hashCode = this.InternalGetHashCode(key);
        for (Lookup<TKey, TElement>.Grouping grouping = this.groupings[hashCode % this.groupings.Length]; grouping != null; grouping = grouping.hashNext)
        {
            if (grouping.hashCode == hashCode && this.comparer.Equals(grouping.key, key))
                return grouping;
        }
	
        //...		
    }
    //...		
}

Here is possibility for optimization with type byte. byte has only 256 different values, so grouping by byte value gives us only 256 different keys. We can replace "buckets" with one array of 256 elements - one per possible value of byte type.

internal sealed class ByteLookup<TElement> : ILookup<byte, TElement>
{
    private readonly Grouping[] groupings = new Grouping[byte.MaxValue + 1];
    
    //...		
    
    private Grouping GetGrouping(byte key)
    {
        return groupings[key] ?? (groupings[key] = new Grouping(key));
    }
}

You can find source code here

This gives us the following benefits:

  • Hash is equal to value - no collision
  • Simple key comparison

Benchmark

Full listing of benchmark tests:

[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
public class GroupByByteBenchmark
{
    [Params(10, 100, 1000, 10000)]
    public int count;

    private byte[] randomData;
    private byte[] zeroData;

    [GlobalSetup]
    public void Setup()
    {
        randomData = new byte[count];
        zeroData = new byte[count];
        new Random(42).NextBytes(randomData);
    }

    [Benchmark(Baseline = true)]
    [BenchmarkCategory("Random")]
    public IEnumerator<IGrouping<byte, byte>> GroupBy()
    {
        return randomData.GroupBy(o => o).GetEnumerator();
    }


    [Benchmark]
    [BenchmarkCategory("Random")]
    public IEnumerator<IGrouping<byte, byte>> GroupByByte()
    {
        return randomData.GroupByByte(o => o).GetEnumerator();
    }

    [Benchmark(Baseline = true)]
    [BenchmarkCategory("Zero")]
    public IEnumerator<IGrouping<byte, byte>> ZeroGroupBy()
    {
        return zeroData.GroupBy(o => o).GetEnumerator();
    }

    [Benchmark]
    [BenchmarkCategory("Zero")]
    public IEnumerator<IGrouping<byte, byte>> ZeroGroupByByte()
    {
        return zeroData.GroupByByte(o => o).GetEnumerator();
    }
}

And results:


BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-2670QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.106
  [Host]     : .NET Core 2.2.4 (CoreCLR 4.6.27521.02, CoreFX 4.6.27521.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.2.4 (CoreCLR 4.6.27521.02, CoreFX 4.6.27521.01), 64bit RyuJIT


Method count Mean Error StdDev Median Ratio RatioSD
GroupBy 10 825.7 ns 16.875 ns 45.042 ns 807.4 ns 1.00 0.00
GroupByByte 10 785.8 ns 15.187 ns 15.596 ns 781.3 ns 0.95 0.05
ZeroGroupBy 10 597.7 ns 11.645 ns 14.301 ns 592.0 ns 1.00 0.00
ZeroGroupByByte 10 546.8 ns 13.316 ns 14.801 ns 542.5 ns 0.92 0.04
GroupBy 100 9,525.3 ns 187.959 ns 275.507 ns 9,483.4 ns 1.00 0.00
GroupByByte 100 5,137.8 ns 31.933 ns 29.870 ns 5,125.5 ns 0.54 0.02
ZeroGroupBy 100 3,454.4 ns 22.817 ns 21.343 ns 3,444.2 ns 1.00 0.00
ZeroGroupByByte 100 2,306.3 ns 6.434 ns 5.704 ns 2,307.1 ns 0.67 0.00
GroupBy 1000 75,075.7 ns 1,653.028 ns 2,370.724 ns 74,447.1 ns 1.00 0.00
GroupByByte 1000 35,900.9 ns 318.114 ns 297.564 ns 35,785.5 ns 0.48 0.02
ZeroGroupBy 1000 30,918.4 ns 614.770 ns 1,349.436 ns 30,469.5 ns 1.00 0.00
ZeroGroupByByte 1000 18,865.2 ns 34.908 ns 29.150 ns 18,870.7 ns 0.61 0.03
GroupBy 10000 411,941.9 ns 2,670.768 ns 2,367.567 ns 410,684.0 ns 1.00 0.00
GroupByByte 10000 260,971.8 ns 2,433.431 ns 2,276.233 ns 259,975.7 ns 0.63 0.01
ZeroGroupBy 10000 297,577.1 ns 2,665.399 ns 2,362.807 ns 298,446.4 ns 1.00 0.00
ZeroGroupByByte 10000 182,999.1 ns 741.884 ns 619.507 ns 182,945.6 ns 0.62 0.01

For random data and array of zero values we get dome performance improvements up to 2 times. Also this optimization can be used for grouping by enum value with underlying byte type.

Links

blog comments powered by Disqus