A common task is to check whether a string contains only digits. For example, you may need to validate a phone number, a postal code, or an organization's tax ID entered by a user. There are several ways to solve this task, and they differ in efficiency. Let's take a look at the most popular ones.
Regex
Probably the most popular way to solve this task is to use a regular expression.
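The expression ^[0-9]*$ is simple and easy to use. The shorter ^\d*$ is often treated as a synonym, but in .NET \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is specified.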
Below is a naive implementation of a regular expression check:
Regex regex = new Regex("^[0-9]*$");
var value = "123456789000";
var isValid = regex.IsMatch(value);
You may have already spotted the problem: this implementation is only suitable for a one-off check.
In production code that validates thousands of strings, the interpreted regular expression will not be efficient.
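Before looking at performance, it helps to be clear about what this pattern actually accepts. The following is a minimal sketch, assuming a console project with top-level statements (C# 9 or later); the stricter pattern at the end is only one possible definition of "valid" and is not part of the benchmarks in this article.

```csharp
using System;
using System.Text.RegularExpressions;

var digits = new Regex("^[0-9]*$");

Console.WriteLine(digits.IsMatch("123456789000")); // True
Console.WriteLine(digits.IsMatch("12a45"));        // False
Console.WriteLine(digits.IsMatch(""));             // True: * allows zero digits
Console.WriteLine(digits.IsMatch("123\n"));        // True: $ also matches before a final newline

// \d is wider than [0-9] in .NET: it matches Unicode decimal digits too.
var unicodeDigits = "\u0668\u09EF"; // Arabic-Indic 8, Bengali 9
Console.WriteLine(new Regex(@"^\d*$").IsMatch(unicodeDigits));                           // True
Console.WriteLine(new Regex(@"^\d*$", RegexOptions.ECMAScript).IsMatch(unicodeDigits)); // False

// Assumed stricter variant: at least one ASCII digit, \z anchors at the very end.
var strict = new Regex(@"^[0-9]+\z");
Console.WriteLine(strict.IsMatch("123\n")); // False
Console.WriteLine(strict.IsMatch(""));      // False
```

Whether an empty string or a trailing newline should pass depends on your input rules, so treat the stricter pattern as an option rather than a replacement.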
.NET can compile a regular expression at runtime when you pass the RegexOptions.Compiled option:
Regex regex = new Regex("^[0-9]*$", RegexOptions.Compiled);
var value = "123456789000";
var isValid = regex.IsMatch(value);
Calling the constructor with this option generates IL code that Regex.IsMatch invokes through a DynamicMethod, which is faster than the usual interpreted processing.
The downside is that creating the Regex object takes longer because of the compilation at runtime, but this quickly pays off with repeated use.
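Because the compilation cost is paid when the object is constructed, the usual way to benefit from it is to build the Regex once and reuse it. A minimal sketch of that idea follows; the DigitValidator class and its IsDigitsOnly method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(DigitValidator.IsDigitsOnly("123456789000")); // True
Console.WriteLine(DigitValidator.IsDigitsOnly("12a45"));        // False

// Hypothetical helper, not part of the article's benchmark code.
static class DigitValidator
{
    // Constructed once per process, so the runtime compilation
    // happens a single time and every later call reuses the result.
    private static readonly Regex DigitsOnly =
        new Regex("^[0-9]*$", RegexOptions.Compiled);

    public static bool IsDigitsOnly(string value) => DigitsOnly.IsMatch(value);
}
```

Creating a compiled Regex inside a frequently called method would repeat the expensive step on every call, which is the opposite of what the option is meant for.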
Let's compare both solutions.
Benchmark code:
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net90)]
[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[SimpleJob(RuntimeMoniker.Net48)]
[HideColumns("Job", "Error", "StdDev", "Gen0")]
public partial class DigitsBenchmarks
{
    private static string value = "123456789000";
    private static readonly Regex regex = new Regex("^[0-9]*$");
    private static readonly Regex compiledRegex = new Regex("^[0-9]*$", RegexOptions.Compiled);

    [Benchmark]
    public bool Regex()
    {
        return regex.IsMatch(value);
    }

    [Benchmark]
    public bool CompiledRegex()
    {
        return compiledRegex.IsMatch(value);
    }
}
Results:
Method | Runtime | Mean | Median | Allocated |
---|---|---|---|---|
Regex | .NET Framework 4.8 | 165.4417 ns | 166.2537 ns | - |
CompiledRegex | .NET Framework 4.8 | 115.9377 ns | 115.9720 ns | - |
Regex | .NET Core 3.1 | 118.1540 ns | 118.1887 ns | - |
CompiledRegex | .NET Core 3.1 | 89.7392 ns | 89.6514 ns | - |
Regex | .NET 6.0 | 57.8247 ns | 57.8031 ns | - |
CompiledRegex | .NET 6.0 | 21.2952 ns | 21.2616 ns | - |
Regex | .NET 9.0 | 47.2579 ns | 47.3506 ns | - |
CompiledRegex | .NET 9.0 | 24.2419 ns | 24.2547 ns | - |
The advantage of compiled expressions is quite obvious. The runtime team's performance work is also visible from release to release, although in this run the compiled regex is slightly slower on .NET 9 than on .NET 6. This is another argument in favor of upgrading to modern versions of .NET.
Regex source generators
Again, compiling regular expressions has one drawback: creating a Regex object at runtime takes some time. Is it possible to get rid of this?
Starting with .NET 7, this capability is available through source generators. Strictly speaking, source generators appeared in .NET 5, but the regular expression solution was implemented only in the seventh version.
Source generators allow you to create C# code at compile time, which means you can view and debug it as if it were your own code.
And regular expressions can be converted to C# code at compile time! .NET has a special attribute for this, GeneratedRegex:
namespace DigitBenchmark
{
    public partial class DigitsBenchmarks
    {
        private static readonly Regex generatedRegex = GenerateRegex();

        [GeneratedRegex("^[0-9]*$")]
        private static partial Regex GenerateRegex();
    }
}
Let's figure out what's going on here.
First, we need to mark our DigitsBenchmarks class as partial, since some of the generated code for this class will be in another file.
Next, we need to create a partial method that returns an object of type Regex and mark it with the GeneratedRegex attribute, specifying the regular expression pattern.
You don't need to specify the RegexOptions.Compiled option; it will be ignored.
The implementation of the GenerateRegex method will be in another file. You can find it in the project and view the source code:
namespace DigitBenchmark
{
    partial class DigitsBenchmarks
    {
        /// <remarks>
        /// Pattern:<br/>
        /// <code>^[0-9]*$</code><br/>
        /// Explanation:<br/>
        /// <code>
        /// ○ Match if at the beginning of the string.<br/>
        /// ○ Match a character in the set [0-9] atomically any number of times.<br/>
        /// ○ Match if at the end of the string or if before an ending newline.<br/>
        /// </code>
        /// </remarks>
        [global::System.CodeDom.Compiler.GeneratedCodeAttribute("System.Text.RegularExpressions.Generator", "8.0.12.21506")]
        private static partial global::System.Text.RegularExpressions.Regex GenerateRegex() => global::System.Text.RegularExpressions.Generated.GenerateRegex_0.Instance;
    }
}
As you can see, a file with the same partial class was created automatically, and it contains the implementation of our method.
We can then use this Regex object as usual. Since this is real C# code, nothing has to be generated at runtime, so there is no point in specifying the RegexOptions.Compiled option.
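Outside of a benchmark class, the same technique can be packaged as a small standalone program. This is a minimal sketch, assuming a project that targets .NET 7 or later; the DigitPatterns class and DigitsOnly method are hypothetical names. Since the generated implementation shown above returns a shared Instance, the method can be called directly without storing the result in a field.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(DigitPatterns.DigitsOnly().IsMatch("123456789000")); // True
Console.WriteLine(DigitPatterns.DigitsOnly().IsMatch("12a45"));        // False

// Hypothetical container type; it must be partial so that the source
// generator can supply the other half of the class at build time.
static partial class DigitPatterns
{
    [GeneratedRegex("^[0-9]*$")]
    public static partial Regex DigitsOnly();
}
```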
What benefit do we get from this? My benchmarks do not include .NET 7 and 8, so let's compare performance on the latest version at the time of writing:
Method | Runtime | Mean | Median | Allocated |
---|---|---|---|---|
Regex | .NET 9.0 | 47.2579 ns | 47.3506 ns | - |
CompiledRegex | .NET 9.0 | 24.2419 ns | 24.2547 ns | - |
GeneratedRegex | .NET 9.0 | 17.2548 ns | 17.2603 ns | - |
We see that the time has been reduced by almost 30% compared to the compiled regular expression! At build time there is much more opportunity to optimize the code than at runtime.
char.IsDigit
Another popular way is to use the static char.IsDigit method in combination with the LINQ All method:
var value = "123456789000";
var isValid = value.All(char.IsDigit);
Let's check how good this method is in terms of performance.
Benchmark code:
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net90)]
[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[SimpleJob(RuntimeMoniker.Net48)]
[HideColumns("Job", "Error", "StdDev", "Gen0")]
public partial class DigitsBenchmarks
{
    [Benchmark]
    public bool LinqCharIsDigit()
    {
        return value.All(char.IsDigit);
    }
}
Results:
Method | Runtime | Mean | Median | Allocated |
---|---|---|---|---|
LinqCharIsDigit | .NET Framework 4.8 | 92.1679 ns | 92.2549 ns | 96 B |
LinqCharIsDigit | .NET Core 3.1 | 72.0987 ns | 72.6419 ns | 96 B |
LinqCharIsDigit | .NET 6.0 | 74.2609 ns | 74.4256 ns | 96 B |
LinqCharIsDigit | .NET 9.0 | 31.0294 ns | 31.0501 ns | 32 B |
Now let's compare this method with the previous solutions.
On the older runtimes (.NET Framework 4.8 and .NET Core 3.1), the LINQ check with IsDigit is faster than both regular expression variants. On .NET 6 it falls behind both of them, and on .NET 9 it beats only the non-compiled regular expression.
Also note that each call allocates additional memory. The allocation consists of two parts:
- Creating the Func<char, bool> delegate for the predicate passed to All(char.IsDigit).
- Creating an enumerator inside the All method.
It is noteworthy that .NET 9 allocates three times less memory.
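If you want a rough look at this allocation without setting up BenchmarkDotNet, one option is to read the per-thread allocation counter around a single call. This is only a sketch: the exact number depends on the runtime and on the C# compiler version, and a proper benchmark remains the reliable way to measure it.

```csharp
using System;
using System.Linq;

var value = "123456789000";

// Warm-up call, so that one-time costs are not attributed to the measured call.
value.All(char.IsDigit);

long before = GC.GetAllocatedBytesForCurrentThread();
bool ok = value.All(char.IsDigit);
long after = GC.GetAllocatedBytesForCurrentThread();

Console.WriteLine($"result: {ok}, allocated: {after - before} bytes");
```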
Before .NET 9, the All method consisted of a foreach loop with a condition:
public static bool All<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    //...
    foreach (TSource source1 in source)
    {
        if (!predicate(source1))
            return false;
    }
    return true;
}
But in .NET 9, an important optimization was added:
public static bool All<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    //...
    ReadOnlySpan<TSource> span;
    if (source.TryGetSpan<TSource>(out span))
    {
        ReadOnlySpan<TSource> readOnlySpan = span;
        for (int index = 0; index < readOnlySpan.Length; ++index)
        {
            TSource source1 = readOnlySpan[index];
            if (!predicate(source1))
                return false;
        }
    }
    else
    {
        foreach (TSource source2 in source)
        {
            if (!predicate(source2))
                return false;
        }
    }
    return true;
}
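Instead of an unconditional foreach loop, the All method first tries to get a ReadOnlySpan, a read-only view over a contiguous block of memory, from the source.
If that succeeds, a simple for loop is used, which does not create an enumerator and so reduces the amount of additional memory.
Note, however, that TryGetSpan appears to recognize only arrays and List<T>, so a string may still go through the foreach branch. In that case part of the saving may come from newer C# compilers caching the delegate created for a static method group.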
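The "span first, enumerator as fallback" idea can be reproduced with public APIs. The sketch below is not the library's implementation: the SpanFirst class is a hypothetical name, and the type checks for arrays and List<T> are an assumption about which sources can be viewed as a span. It assumes .NET 5 or later for CollectionsMarshal.AsSpan.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

Console.WriteLine(SpanFirst.All(new[] { '1', '2', '3' }, char.IsDigit));     // True (span path)
Console.WriteLine(SpanFirst.All(new List<char> { '1', 'x' }, char.IsDigit)); // False (span path)
Console.WriteLine(SpanFirst.All("123", char.IsDigit));                       // True (foreach path)

// Hypothetical helper that mimics the structure of the optimized All method.
static class SpanFirst
{
    public static bool All<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        ReadOnlySpan<T> span;
        if (source is T[] array)
        {
            span = array; // arrays convert to a span without copying
        }
        else if (source is List<T> list)
        {
            span = CollectionsMarshal.AsSpan(list); // view over the list's backing array
        }
        else
        {
            // Fallback: any other sequence is walked through its enumerator.
            foreach (T item in source)
            {
                if (!predicate(item))
                    return false;
            }
            return true;
        }

        // Span path: plain indexed loop, no enumerator object is created.
        for (int i = 0; i < span.Length; i++)
        {
            if (!predicate(span[i]))
                return false;
        }
        return true;
    }
}
```

In this sketch a string takes the fallback branch, because it is neither an array nor a List<char>.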
You can eliminate the allocations entirely by replacing the All method with a regular loop:
public bool ForIsDigit()
{
    for (var i = 0; i < value.Length; i++)
    {
        if (!char.IsDigit(value[i]))
            return false;
    }
    return true;
}
Besides avoiding the allocations, this solution is very fast: about 7 ns on .NET 9 versus about 31 ns for the LINQ version (see the full results at the end).
What is a digit?
It seems that we have found the optimal solution for checking a string for the presence of only digits. But try to guess what the following code will output:
Console.WriteLine(char.IsDigit('0'));
Console.WriteLine(char.IsDigit('a'));
Console.WriteLine(char.IsDigit('٨'));
Console.WriteLine(char.IsDigit('৯'));
Answer:
Console.WriteLine(char.IsDigit('0')); //True
Console.WriteLine(char.IsDigit('a')); //False
Console.WriteLine(char.IsDigit('٨')); //True
Console.WriteLine(char.IsDigit('৯')); //True
I think the result surprises you. But there is nothing unusual about it: the IsDigit method treats as digits not only the usual characters from the set 0-9 but also every other character that Unicode classifies as a decimal digit. And there are actually a lot of them.
This can be a problem if you rely on such a check in your business code.
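To make the risk concrete, here is a minimal sketch of how such input can pass the IsDigit check and then fail further down the pipeline. It uses the two non-ASCII characters from the quiz above, written as escape sequences so that the file encoding does not matter; the int.TryParse call stands in for whatever your business code does with the "validated" value.

```csharp
using System;
using System.Globalization;
using System.Linq;

var input = "\u0668\u09EF"; // Arabic-Indic digit eight, Bengali digit nine

Console.WriteLine(input.All(char.IsDigit)); // True

foreach (char c in input)
{
    // Both characters belong to the Unicode category DecimalDigitNumber.
    Console.WriteLine($"U+{(int)c:X4} {char.GetUnicodeCategory(c)} {char.GetNumericValue(c)}");
}

// Number parsing accepts only ASCII digits, so the "valid" input fails here.
bool parsed = int.TryParse(input, NumberStyles.None, CultureInfo.InvariantCulture, out int number);
Console.WriteLine(parsed); // False
```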
I think this is why the char.IsAsciiDigit method appeared in .NET 7. It really does check only for characters from the 0-9 set.
Its implementation is very similar to manually checking each character in a loop. Let's compare both solutions:
[Benchmark]
public bool ForCompare()
{
    for (var i = 0; i < value.Length; i++)
    {
        if (value[i] < '0' || value[i] > '9')
            return false;
    }
    return true;
}

[Benchmark]
public bool ForIsAsciiDigit()
{
    for (var i = 0; i < value.Length; i++)
    {
        if (!char.IsAsciiDigit(value[i]))
            return false;
    }
    return true;
}
Method | Runtime | Mean | Median | Allocated |
---|---|---|---|---|
ForCompare | .NET 9.0 | 4.8587 ns | 4.8656 ns | - |
ForIsAsciiDigit | .NET 9.0 | 4.7515 ns | 4.4976 ns | - |
Both methods show equivalent results.
Conclusion
We've looked at several ways to check whether a string consists only of digits. It's important to understand the specifics of how some of them work, because besides performance issues, you can get errors in your business logic if you don't check your input data carefully enough.
Recommendations:
- If you write your applications for .NET 7 or higher, use generated regular expressions. Otherwise, specify the RegexOptions.Compiled option.
- If you write your applications for .NET 7 or higher, use the char.IsAsciiDigit method to check characters. Otherwise, it is better to write the check yourself.
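For a library that targets several frameworks, the second recommendation can be folded into one helper. This is a minimal sketch, assuming the standard NET7_0_OR_GREATER preprocessor symbol that the .NET SDK defines for such targets; the AsciiDigits class and IsDigitsOnly method are hypothetical names. As with the ^[0-9]*$ pattern, an empty string is accepted here.

```csharp
using System;

Console.WriteLine(AsciiDigits.IsDigitsOnly("123456789000")); // True
Console.WriteLine(AsciiDigits.IsDigitsOnly("12\u0668"));     // False: Arabic-Indic digit

// Hypothetical helper, not part of the article's benchmark code.
static class AsciiDigits
{
    public static bool IsDigitsOnly(string value)
    {
        for (var i = 0; i < value.Length; i++)
        {
#if NET7_0_OR_GREATER
            // Built-in ASCII-only check, available starting with .NET 7.
            if (!char.IsAsciiDigit(value[i]))
#else
            // Manual range check for older target frameworks.
            if (value[i] < '0' || value[i] > '9')
#endif
                return false;
        }
        return true;
    }
}
```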
Links
Full benchmark code:
using System.Linq;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

namespace DigitBenchmark
{
    [MemoryDiagnoser]
    [SimpleJob(RuntimeMoniker.Net90)]
    [SimpleJob(RuntimeMoniker.Net60)]
    [SimpleJob(RuntimeMoniker.NetCoreApp31)]
    [SimpleJob(RuntimeMoniker.Net48)]
    [HideColumns("Job", "Error", "StdDev", "Gen0")]
    public partial class DigitsBenchmarks
    {
        private static string value = "123456789000";
        private static readonly Regex regex = new Regex("^[0-9]*$");
        private static readonly Regex compiledRegex = new Regex("^[0-9]*$", RegexOptions.Compiled);

        [Benchmark]
        public bool Regex()
        {
            return regex.IsMatch(value);
        }

        [Benchmark]
        public bool CompiledRegex()
        {
            return compiledRegex.IsMatch(value);
        }

        [Benchmark]
        public bool LinqCharIsDigit()
        {
            return value.All(char.IsDigit);
        }

        [Benchmark]
        public bool ForCompare()
        {
            for (var i = 0; i < value.Length; i++)
            {
                if (value[i] < '0' || value[i] > '9')
                    return false;
            }
            return true;
        }

        [Benchmark]
        public bool ForIsDigit()
        {
            for (var i = 0; i < value.Length; i++)
            {
                if (!char.IsDigit(value[i]))
                    return false;
            }
            return true;
        }

#if NET7_0_OR_GREATER
        // GeneratedRegex and char.IsAsciiDigit exist only in .NET 7 and later,
        // so these members are compiled only for those target frameworks.
        private static readonly Regex generatedRegex = GenerateRegex();

        [GeneratedRegex("^[0-9]*$")]
        private static partial Regex GenerateRegex();

        [Benchmark]
        public bool GeneratedRegex()
        {
            return generatedRegex.IsMatch(value);
        }

        [Benchmark]
        public bool ForIsAsciiDigit()
        {
            for (var i = 0; i < value.Length; i++)
            {
                if (!char.IsAsciiDigit(value[i]))
                    return false;
            }
            return true;
        }

        [Benchmark]
        public bool LinqCharIsAsciiDigit()
        {
            return value.All(char.IsAsciiDigit);
        }
#endif
    }
}
Full results:
BenchmarkDotNet v0.15.0, Windows 10 (10.0.19045.5917/22H2/2022Update)
AMD Ryzen 7 7840H with Radeon 780M Graphics 3.80GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.204
[Host] : .NET 9.0.5 (9.0.525.21509), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
.NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
.NET 9.0 : .NET 9.0.5 (9.0.525.21509), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
.NET Core 3.1 : .NET Core 3.1.32 (CoreCLR 4.700.22.55902, CoreFX 4.700.22.56512), X64 RyuJIT AVX2
.NET Framework 4.8 : .NET Framework 4.8.1 (4.8.9310.0), X64 RyuJIT VectorSize=256
Method | Runtime | Mean | Median | Allocated |
---|---|---|---|---|
Regex | .NET Framework 4.8 | 165.4417 ns | 166.2537 ns | - |
CompiledRegex | .NET Framework 4.8 | 115.9377 ns | 115.9720 ns | - |
GeneratedRegex | .NET Framework 4.8 | N/A | N/A | - |
LinqCharIsDigit | .NET Framework 4.8 | 92.1679 ns | 92.2549 ns | 96 B |
ForCompare | .NET Framework 4.8 | 4.8703 ns | 4.8602 ns | - |
ForIsDigit | .NET Framework 4.8 | 8.3920 ns | 8.3952 ns | - |
ForIsAsciiDigit | .NET Framework 4.8 | N/A | N/A | - |
LinqCharIsAsciiDigit | .NET Framework 4.8 | N/A | N/A | - |
Regex | .NET Core 3.1 | 118.1540 ns | 118.1887 ns | - |
CompiledRegex | .NET Core 3.1 | 89.7392 ns | 89.6514 ns | - |
GeneratedRegex | .NET Core 3.1 | N/A | N/A | - |
LinqCharIsDigit | .NET Core 3.1 | 72.0987 ns | 72.6419 ns | 96 B |
ForCompare | .NET Core 3.1 | 5.3070 ns | 5.3077 ns | - |
ForIsDigit | .NET Core 3.1 | 9.0998 ns | 9.1106 ns | - |
ForIsAsciiDigit | .NET Core 3.1 | N/A | N/A | - |
LinqCharIsAsciiDigit | .NET Core 3.1 | N/A | N/A | - |
Regex | .NET 6.0 | 57.8247 ns | 57.8031 ns | - |
CompiledRegex | .NET 6.0 | 21.2952 ns | 21.2616 ns | - |
GeneratedRegex | .NET 6.0 | N/A | N/A | - |
LinqCharIsDigit | .NET 6.0 | 74.2609 ns | 74.4256 ns | 96 B |
ForCompare | .NET 6.0 | 5.6198 ns | 5.5922 ns | - |
ForIsDigit | .NET 6.0 | 10.3160 ns | 10.1789 ns | - |
ForIsAsciiDigit | .NET 6.0 | N/A | N/A | - |
LinqCharIsAsciiDigit | .NET 6.0 | N/A | N/A | - |
Regex | .NET 9.0 | 47.2579 ns | 47.3506 ns | - |
CompiledRegex | .NET 9.0 | 24.2419 ns | 24.2547 ns | - |
GeneratedRegex | .NET 9.0 | 17.2548 ns | 17.2603 ns | - |
LinqCharIsDigit | .NET 9.0 | 31.0294 ns | 31.0501 ns | 32 B |
ForCompare | .NET 9.0 | 4.8587 ns | 4.8656 ns | - |
ForIsDigit | .NET 9.0 | 7.1207 ns | 7.7792 ns | - |
ForIsAsciiDigit | .NET 9.0 | 4.7515 ns | 4.4976 ns | - |
LinqCharIsAsciiDigit | .NET 9.0 | 32.6861 ns | 32.5483 ns | 32 B |
- The N/A mark is given to methods that do not exist in the corresponding version of the framework.