SimdUnicode
A blazing-fast C# library that validates UTF-8 strings up to 13× faster than the .NET standard library — using AVX-512, AVX2, SSE and ARM NEON.
Drop-in replacement
SimdUnicode provides SimdUnicode.UTF8.GetPointerToFirstInvalidByte, a faster drop-in replacement for the runtime's private Utf8Utility.GetPointerToFirstInvalidByte. It returns a pointer to the first invalid byte — or the end of the buffer when the input is well-formed.
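// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
using System;
using System.IO;
using SimdUnicode;

byte[] data = File.ReadAllBytes("twitter.json");
unsafe
{
    // Pin the array so the GC cannot move it while we hold a raw pointer.
    fixed (byte* p = data)
    {
        byte* invalid = UTF8.GetPointerToFirstInvalidByte(
            p, data.Length,
            out int utf16Adjustment,
            out int scalarAdjustment);
        // The input is valid when the returned pointer is one past the last byte.
        bool isValid = invalid == p + data.Length;
        Console.WriteLine(isValid ? "Valid UTF-8 ✅" : $"Invalid at offset {invalid - p}");
    }
}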
The right SIMD kernel is selected automatically at runtime: ARM64 NEON, AVX-512 (Zen 4 / Ice Lake), AVX2, SSE4.2, or a portable scalar fallback.
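To make the return value and the two out parameters concrete, here is a minimal scalar reference built on the standard library's Rune.DecodeFromUtf8. It is only a sketch for cross-checking results, not SimdUnicode's scalar fallback: the class name Utf8Reference and the method name IndexOfFirstInvalidByte are hypothetical, and the meaning of the adjustments (UTF-16 length = byte length + utf16Adjustment, scalar count = UTF-16 length + scalarAdjustment) is assumed to follow the convention of the runtime's Utf8Utility.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper for illustration; not part of SimdUnicode.
public static class Utf8Reference
{
    // Returns the offset of the first invalid byte,
    // or data.Length when the whole input is well-formed UTF-8.
    public static int IndexOfFirstInvalidByte(
        ReadOnlySpan<byte> data,
        out int utf16Adjustment,
        out int scalarAdjustment)
    {
        utf16Adjustment = 0;   // assumed: utf16Length = byteLength + utf16Adjustment
        scalarAdjustment = 0;  // assumed: scalarCount = utf16Length + scalarAdjustment
        int offset = 0;
        while (offset < data.Length)
        {
            // Decode one scalar value at a time; anything other than Done
            // (invalid or truncated sequence) marks the first bad offset.
            OperationStatus status =
                Rune.DecodeFromUtf8(data.Slice(offset), out Rune rune, out int consumed);
            if (status != OperationStatus.Done)
            {
                return offset;
            }
            utf16Adjustment += rune.Utf16SequenceLength - consumed;
            scalarAdjustment += 1 - rune.Utf16SequenceLength;
            offset += consumed;
        }
        return offset;
    }
}
```

Under these assumptions, the 6 UTF-8 bytes of "héllo" yield a return value of 6 with utf16Adjustment equal to -1 (five UTF-16 code units), while the bytes 0x61 0xC3 0x28 yield 1, the offset where the malformed two-byte sequence starts. Comparing this loop against the SIMD call on the same buffer is a cheap sanity check when integrating the library.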
Less than one instruction per byte
Implements the Keiser–Lemire algorithm used by Node.js, Bun, Oracle GraalVM and the PHP interpreter.
Runtime dispatch
One call, the best available kernel. AVX-512, AVX2, SSE4.2, ARM NEON or scalar — chosen for your CPU.
Extensively tested
A large suite of correctness tests across architectures, plus reproducible BenchmarkDotNet benchmarks.
x64 & ARM
First-class support for modern Intel/AMD and Apple Silicon / Graviton processors.
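The runtime dispatch described above can be pictured with the hardware intrinsics feature flags that .NET exposes. This is a rough sketch assuming .NET 8 or later (for the AVX-512 types). The class name KernelSelector, the order of the checks, and the specific flags tested are assumptions made for illustration and may differ from what the library actually checks.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration of runtime kernel selection; not SimdUnicode's code.
public static class KernelSelector
{
    // Reports which family of kernel a dispatcher of this shape would pick
    // on the current CPU. Each IsSupported property is resolved by the JIT.
    public static string Describe()
    {
        if (AdvSimd.Arm64.IsSupported) return "ARM NEON";
        if (Avx512BW.IsSupported) return "AVX-512"; // assumed feature check
        if (Avx2.IsSupported) return "AVX2";
        if (Sse42.IsSupported) return "SSE4.2";
        return "scalar";
    }
}
```

Calling KernelSelector.Describe() on a given machine shows which branch would be taken there. Because the IsSupported properties are treated as constants by the JIT, a dispatcher written this way typically costs nothing per call once the method is compiled.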
How fast is it?
Throughput on an Intel Ice Lake system (AVX-512), validating UTF-8. Longer bars are faster — SimdUnicode in purple, the .NET standard library in grey.
On an Apple M2 (NEON), SimdUnicode is 1.5×–4× faster than the standard library. See the full set of measurements across x64 and ARM in the benchmarks.
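For a quick feel of how such throughput figures are derived, the sketch below times the standard library baseline and converts the result to gigabytes per second. It assumes .NET 8 or later, where System.Text.Unicode.Utf8.IsValid is available. The class name ThroughputSketch and the method name are hypothetical, and a Stopwatch loop like this is far noisier than the project's BenchmarkDotNet runs, so treat its output as a rough indication only.

```csharp
using System;
using System.Diagnostics;
using System.Text.Unicode;

// Hypothetical helper for illustration; not part of SimdUnicode or its benchmarks.
public static class ThroughputSketch
{
    // Validates `data` `iterations` times with the standard library and
    // returns the observed throughput in gigabytes per second.
    public static double MeasureGigabytesPerSecond(byte[] data, int iterations)
    {
        bool allValid = Utf8.IsValid(data); // warm-up call so JIT time is excluded
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            allValid &= Utf8.IsValid(data);
        }
        sw.Stop();
        if (!allValid)
        {
            Console.WriteLine("Note: the input contained invalid UTF-8.");
        }
        double totalBytes = (double)data.Length * iterations;
        return totalBytes / sw.Elapsed.TotalSeconds / 1e9;
    }
}
```

Timing a wrapper around UTF8.GetPointerToFirstInvalidByte in the same loop, on the same buffer, gives a like-for-like ratio between the two validators on your own hardware.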
Build it
git clone https://github.com/simdutf/SimdUnicode.git
cd SimdUnicode/src
dotnet build -c Release
Then add a project reference to src/SimdUnicode.csproj. Head to the getting started guide or dive into the API reference.
Citing
The algorithm is described in:
John Keiser, Daniel Lemire, Validating UTF-8 In Less Than One Instruction Per Byte, Software: Practice and Experience 51 (5), 2021.