This post started from discussion on SO: Why LINQ method Any does not check Count? This article shows comparasion of performance methods Any and Count = 0.
First of all, we create new extension method:
public static class EnumerableExtensions
{
public static bool CustomAny<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw new ArgumentNullException(nameof(source));
}
if (source is ICollection<TSource> collection)
{
return collection.Count != 0;
}
using (var enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
return true;
}
return false;
}
}
This method checks Count
property for ICollection<TSource>
type instead of creating enumerator. Let's check if it will be faster than Any
.
For benchmarking we use BenchmarkDotNet.
Collections to test:
List
Array
HashSet
Dictionary
Stack
Queue
SortedList
SortedSet
For testing we create collections with 1000 elements and compare methods Any
and CustomAny
:
[CategoriesColumn]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
public class SimpleCollections
{
[Params(1000)]
public int _size;
private List<int> _list;
private int[] _array;
private HashSet<int> _hashSet;
private Dictionary<int, int> _dictionary;
private Stack<int> _stack;
private Queue<int> _queue;
private SortedList<int, int> _sortedList;
private SortedSet<int> _sortedSet;
[GlobalSetup]
public void SetUp()
{
_list = new List<int>();
_array = new int[_size];
_hashSet = new HashSet<int>();
_dictionary = new Dictionary<int, int>();
_stack = new Stack<int>();
_queue = new Queue<int>();
_sortedList = new SortedList<int, int>();
_sortedSet = new SortedSet<int>();
for (int i = 0; i < _size; i++)
{
_list.Add(i);
_array[i] = i;
_hashSet.Add(i);
_dictionary[i] = i;
_stack.Push(i);
_queue.Enqueue(i);
_sortedList.Add(i, i);
_sortedSet.Add(i);
}
}
[Benchmark(Baseline = true, Description = "Any"), BenchmarkCategory("List")]
public bool ListAny()
{
return _list.Any();
}
[Benchmark(Description = "Custom"), BenchmarkCategory("List")]
public bool ListCount()
{
return _list.CustomAny();
}
///other collections
}
Benchmarks are grouped by the category - type of collection. Method Any
is default Enumerable.Any
, Custom
- our custom implementation EnumerableExtensions.CustomAny
.
Any
is a baseline benchmark in the category.
The first run result
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-2670QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
Frequency=2143565 Hz, Resolution=466.5126 ns, Timer=TSC
[Host] : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3110.0
DefaultJob : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.3110.0
Method | Categories | _size | Mean | Error | StdDev | Median | Scaled | ScaledSD |
---|---|---|---|---|---|---|---|---|
Any | List | 1000 | 38.88 ns | 0.8736 ns | 2.1430 ns | 38.32 ns | 1.00 | 0.00 |
Custom | List | 1000 | 16.49 ns | 0.1061 ns | 0.0941 ns | 16.50 ns | 0.43 | 0.02 |
Any | Array | 1000 | 27.46 ns | 0.2219 ns | 0.1732 ns | 27.39 ns | 1.00 | 0.00 |
Custom | Array | 1000 | 51.21 ns | 1.0657 ns | 1.3858 ns | 50.60 ns | 1.87 | 0.05 |
Any | HashSet | 1000 | 39.87 ns | 0.4526 ns | 0.4234 ns | 39.89 ns | 1.00 | 0.00 |
Custom | HashSet | 1000 | 15.06 ns | 0.1276 ns | 0.1193 ns | 15.01 ns | 0.38 | 0.00 |
Any | Dictinary | 1000 | 51.14 ns | 1.0330 ns | 1.4139 ns | 50.83 ns | 1.00 | 0.00 |
Custom | Dictinary | 1000 | 17.41 ns | 0.2392 ns | 0.2238 ns | 17.36 ns | 0.34 | 0.01 |
Any | Stack | 1000 | 42.10 ns | 0.8651 ns | 1.3722 ns | 42.29 ns | 1.00 | 0.00 |
Custom | Stack | 1000 | 49.41 ns | 1.0303 ns | 1.9602 ns | 48.60 ns | 1.17 | 0.06 |
Any | Queue | 1000 | 44.68 ns | 0.9345 ns | 1.6367 ns | 44.41 ns | 1.00 | 0.00 |
Custom | Queue | 1000 | 50.68 ns | 0.1254 ns | 0.0979 ns | 50.66 ns | 1.14 | 0.04 |
Any | SortedList | 1000 | 42.89 ns | 0.9397 ns | 1.8982 ns | 42.55 ns | 1.00 | 0.00 |
Custom | SortedList | 1000 | 17.48 ns | 0.3887 ns | 0.6600 ns | 17.17 ns | 0.41 | 0.02 |
Any | SortedSet | 1000 | 220.19 ns | 4.3996 ns | 4.3210 ns | 219.66 ns | 1.00 | 0.00 |
Custom | SortedSet | 1000 | 18.46 ns | 0.3891 ns | 0.3250 ns | 18.34 ns | 0.08 | 0.00 |
Here and later this legend is used:
Categories : Collection type
_size : Number of elements in collection
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Median : Value separating the higher half of all measurements (50th percentile)
Scaled : Mean(CurrentBenchmark) / Mean(BaselineBenchmark)
ScaledSD : Standard deviation of ratio of distribution of [CurrentBenchmark] and [BaselineBenchmark]
1 ns : 1 Nanosecond (0.000000001 sec)
As we see, in common the new method with Count
is faster than the old one or the same, except some collections. Lets examine them in details.
Array
Our new method gives much worse results. Why? Answer is not obvious. Array can be converted to ICollection<T>
, but it takes much time due to internal implementation of array in .NET.
So, we need to add array check:
public static class EnumerableExtensions
{
public static bool CustomAny<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw new ArgumentNullException(nameof(source));
}
if (source is TSource[] array)
{
return array.Length != 0;
}
if (source is ICollection<TSource> collection)
{
return collection.Count != 0;
}
using (var enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
return true;
}
return false;
}
}
Stack and Queue
These collections do not implement ICollection<T>
interface:
public class Stack<T> : IEnumerable<T>, IEnumerable, ICollection, IReadOnlyCollection<T>
public class Queue<T> : IEnumerable<T>, IEnumerable, ICollection, IReadOnlyCollection<T>
But they have ICollection
, we can use it as the most common:
public static class EnumerableExtensions
{
public static bool CustomAny<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw new ArgumentNullException(nameof(source));
}
if (source is TSource[] array)
{
return array.Length != 0;
}
if (source is ICollection<TSource> collection)
{
return collection.Count != 0;
}
if (source is ICollection baseCollection)
{
return baseCollection.Count != 0;
}
using (var enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
return true;
}
return false;
}
}
The second run result
Method | Categories | _size | Mean | Error | StdDev | Scaled | ScaledSD | ScaledSD |
---|---|---|---|---|---|---|---|---|
Any | List | 1000 | 37.52 ns | 0.6950 ns | 0.6501 ns | 1.00 | 0.00 | 0.00 |
Custom | List | 1000 | 19.69 ns | 0.2409 ns | 0.2254 ns | 0.53 | 0.01 | 0.02 |
Any | Array | 1000 | 27.11 ns | 0.5799 ns | 1.0004 ns | 1.00 | 0.00 | 0.00 |
Custom | Array | 1000 | 18.87 ns | 0.0513 ns | 0.0455 ns | 0.70 | 0.02 | 0.05 |
Any | HashSet | 1000 | 39.42 ns | 0.3633 ns | 0.3034 ns | 1.00 | 0.00 | 0.00 |
Custom | HashSet | 1000 | 18.76 ns | 0.0440 ns | 0.0367 ns | 0.48 | 0.00 | 0.00 |
Any | Dictinary | 1000 | 50.13 ns | 0.8848 ns | 0.8276 ns | 1.00 | 0.00 | 0.00 |
Custom | Dictinary | 1000 | 19.95 ns | 0.1613 ns | 0.1509 ns | 0.40 | 0.01 | 0.01 |
Any | Stack | 1000 | 39.56 ns | 0.5305 ns | 0.4702 ns | 1.00 | 0.00 | 0.00 |
Custom | Stack | 1000 | 31.66 ns | 0.1149 ns | 0.1075 ns | 0.80 | 0.01 | 0.06 |
Any | Queue | 1000 | 42.90 ns | 0.2227 ns | 0.2083 ns | 1.00 | 0.00 | 0.00 |
Custom | Queue | 1000 | 31.69 ns | 0.1502 ns | 0.1331 ns | 0.74 | 0.00 | 0.04 |
Any | SortedList | 1000 | 40.53 ns | 0.1220 ns | 0.1019 ns | 1.00 | 0.00 | 0.00 |
Custom | SortedList | 1000 | 19.71 ns | 0.0362 ns | 0.0339 ns | 0.49 | 0.00 | 0.02 |
Any | SortedSet | 1000 | 219.88 ns | 0.5931 ns | 0.4289 ns | 1.00 | 0.00 | 0.00 |
Custom | SortedSet | 1000 | 20.68 ns | 0.0420 ns | 0.0328 ns | 0.09 | 0.00 | 0.00 |
Ok, in all tests our method is better than default one. But there is one collection, where result is ~10 times better - SortedSet
.
SortedSet
Why method Any
dramatically worse than Count
? As we know, Any
just creates enumerator and tries to take the first element. Let's look at source code of SortedSet
:
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
return (IEnumerator<T>) new SortedSet<T>.Enumerator(this);
}
GetEnumerator
creates new struct of type Enumerator
, constructor gets SortedSet<T>
as a parameter:
internal Enumerator(SortedSet<T> set)
{
this.tree = set;
this.tree.VersionCheck();
this.version = this.tree.version;
this.stack = new Stack<SortedSet<T>.Node>(2 * SortedSet<T>.log2(set.Count + 1));
this.current = (SortedSet<T>.Node) null;
this.reverse = false;
this.siInfo = (SerializationInfo) null;
this.Intialize();
}
Open method Intialize
(misprint in source code):
private void Intialize()
{
this.current = (SortedSet<T>.Node) null;
SortedSet<T>.Node node1 = this.tree.root;
while (node1 != null)
{
SortedSet<T>.Node node2 = this.reverse ? node1.Right : node1.Left;
SortedSet<T>.Node node3 = this.reverse ? node1.Left : node1.Right;
if (this.tree.IsWithinRange(node1.Item))
{
this.stack.Push(node1);
node1 = node2;
}
else
node1 = node2 == null || !this.tree.IsWithinRange(node2.Item) ? node3 : node2;
}
}
It is worth noting that SortedSet
is built on the binary tree, and method Intialize
bypasses all nodes in tree and puts elements into the stack. So, when we create enumerator of SortedSet
, internal implementation creates new collection with all elements. The more elements we have, the more time it takes to copy:
Method | _size | Mean | Error | StdDev | Scaled |
---|---|---|---|---|---|
Any | 1000 | 230.64 ns | 4.5659 ns | 6.9727 ns | 1.00 |
Custom | 1000 | 22.23 ns | 0.4906 ns | 0.7036 ns | 0.10 |
Any | 10000 | 288.27 ns | 1.3822 ns | 1.0791 ns | 1.00 |
Custom | 10000 | 21.72 ns | 0.1599 ns | 0.1417 ns | 0.08 |
Any | 100000 | 343.54 ns | 6.7375 ns | 9.8757 ns | 1.00 |
Custom | 100000 | 21.70 ns | 0.3254 ns | 0.3043 ns | 0.06 |
Ok, we checked the most popular collections, and we can use Count
property instead of Enumerable.Any
.
But is it safe to use Count
every time? Is there a collection, where Count
takes more time than creating enumerator? You may be wondering.
Concurrent collections
All collections we checked before are not thread-safe, and can not be used in multi-threading applications. There is a special class of collections - concurrent collections:
- ConcurrentBag
- ConcurrentDictionary
- ConcurrentQueue
- ConcurrentStack
Let's fill collections from different threads to simulate multi-threading environment:
[GlobalSetup]
public void SetUp()
{
_bag = new ConcurrentBag<int>();
_dictionary = new ConcurrentDictionary<int, int>();
_queue = new ConcurrentQueue<int>();
_stack = new ConcurrentStack<int>();
var tasksCount = 10;
var batch = _size / tasksCount;
var tasks = new Task[tasksCount];
for (int i = 0; i < tasksCount; i++)
{
var task = i;
tasks[task] = Task.Run(() =>
{
var from = task * batch;
var to = (task + 1) * batch;
for (int j = from; j < to; j++)
{
_bag.Add(j);
_dictionary[j] = j;
_queue.Enqueue(j);
_stack.Push(j);
}
});
}
Task.WaitAll(tasks);
}
Concurrent collections results
Method | Categories | _size | Mean | Error | StdDev | Scaled | ScaledSD |
---|---|---|---|---|---|---|---|
Any | ConcurrentBag | 1000 | 7,065.75 ns | 37.9584 ns | 33.6491 ns | 1.00 | 0.00 |
Custom | ConcurrentBag | 1000 | 267.89 ns | 5.3096 ns | 4.7068 ns | 0.04 | 0.00 |
Any | ConcurrentDictionary | 1000 | 39.29 ns | 0.7618 ns | 0.5948 ns | 1.00 | 0.00 |
Custom | ConcurrentDictionary | 1000 | 6,319.42 ns | 27.8553 ns | 26.0558 ns | 160.88 | 2.45 |
Any | ConcurrentStack | 1000 | 28.08 ns | 0.1495 ns | 0.1399 ns | 1.00 | 0.00 |
Custom | ConcurrentStack | 1000 | 3,179.18 ns | 23.4161 ns | 21.9035 ns | 113.23 | 0.93 |
Any | ConcurrentQueue | 1000 | 72.12 ns | 1.4450 ns | 1.9779 ns | 1.00 | 0.00 |
Custom | ConcurrentQueue | 1000 | 48.28 ns | 0.3934 ns | 0.3487 ns | 0.67 | 0.02 |
We have got unpredictable results. Somewhere Any
is very slow, and vice versa.
ConcurrentBag
Method GetEnumerator
of ConcurrentBag
works like in SortedSet
. It locks collection and copies all elements into new list:
private List<T> ToList()
{
List<T> objList = new List<T>();
for (ConcurrentBag<T>.ThreadLocalList threadLocalList = this.m_headList; threadLocalList != null; threadLocalList = threadLocalList.m_nextList)
{
for (ConcurrentBag<T>.Node node = threadLocalList.m_head; node != null; node = node.m_next)
objList.Add(node.m_value);
}
return objList;
}
Count
property locks collection and just sum up all internal lists counts:
private int GetCountInternal()
{
int num = 0;
for (ConcurrentBag<T>.ThreadLocalList threadLocalList = this.m_headList; threadLocalList != null; threadLocalList = threadLocalList.m_nextList)
checked { num += threadLocalList.Count; }
return num;
}
So, Count
is much faster than Any
.
ConcurrentDictionary
GetEnumerator
method does not create new collection and uses yield
to iterate through collection, so, Any
just do one iteration:
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
{
foreach (ConcurrentDictionary<TKey, TValue>.Node bucket in this.m_tables.m_buckets)
{
ConcurrentDictionary<TKey, TValue>.Node current;
for (current = Volatile.Read<ConcurrentDictionary<TKey, TValue>.Node>(ref bucket); current != null; current = current.m_next)
yield return new KeyValuePair<TKey, TValue>(current.m_key, current.m_value);
current = (ConcurrentDictionary<TKey, TValue>.Node) null;
}
}
Count
locks collection and sums up counts of internal tables:
public int Count
{
[__DynamicallyInvokable] get
{
int locksAcquired = 0;
try
{
this.AcquireAllLocks(ref locksAcquired);
return this.GetCountInternal();
}
finally
{
this.ReleaseLocks(0, locksAcquired);
}
}
}
private int GetCountInternal()
{
int num = 0;
for (int index = 0; index < this.m_tables.m_countPerLock.Length; ++index)
num += this.m_tables.m_countPerLock[index];
return num;
}
And the more elements we have the more time it takes:
Method | _size | Mean | Error | StdDev | Scaled | ScaledSD |
---|---|---|---|---|---|---|
Any | 10 | 37.79 ns | 0.7850 ns | 1.0207 ns | 1.00 | 0.00 |
Custom | 10 | 239.63 ns | 2.2313 ns | 2.0872 ns | 6.34 | 0.18 |
Any | 100 | 34.46 ns | 0.4621 ns | 0.4322 ns | 1.00 | 0.00 |
Custom | 100 | 823.62 ns | 16.0617 ns | 17.8525 ns | 23.90 | 0.58 |
Any | 1000 | 38.22 ns | 0.5311 ns | 0.4968 ns | 1.00 | 0.00 |
Custom | 1000 | 6,264.17 ns | 15.1371 ns | 12.6402 ns | 163.92 | 2.09 |
Any | 10000 | 35.57 ns | 0.5368 ns | 0.5021 ns | 1.00 | 0.00 |
Custom | 10000 | 24,819.31 ns | 283.1502 ns | 264.8588 ns | 697.82 | 11.97 |
Profiler shows that locking takes a lot of time:
ConcurrentStack
GetEnumerator
uses lazy enumeration and it's cheap to use:
private IEnumerator<T> GetEnumerator(ConcurrentStack<T>.Node head)
{
for (ConcurrentStack<T>.Node current = head; current != null; current = current.m_next)
yield return current.m_value;
}
But Count
just counts all elements even without locking:
public int Count
{
[__DynamicallyInvokable] get
{
int num = 0;
for (ConcurrentStack<T>.Node node = this.m_head; node != null; node = node.m_next)
++num;
return num;
}
}
ConcurrentQueue
GetEnumerator
implementation is a bit complicated, but it uses lazy yield internal. Count
uses simple calculations without enumerating all elements. We can use both methods.
As we see, concurrent collections are complicated and internal implementation differs very much.
Universal method for everything?
No, it's impossible. We should use Count
or Any
methods depending on collection.