I’m writing code that subtracts corresponding bytes in two arrays and counts how many of the resulting bytes exceed a given threshold. AFAIU, it would really benefit from .NET SIMD, but System.Numerics.Vector.IsHardwareAccelerated returns false when I compile C# on a Raspberry Pi 4.
My dotnet version is 3.1.406, and I’ve added
<PropertyGroup>
<Optimize>true</Optimize>
</PropertyGroup>
to the csproj, and I’m running the Release configuration.
Is there any way I can leverage SIMD support in .NET on Raspberry Pi 4? Maybe with .NET 5?
Update
I installed .NET 5 and tried the .NET hardware intrinsics, but none of them is supported:
Console.WriteLine(System.Runtime.Intrinsics.Arm.AdvSimd.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Aes.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.ArmBase.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Crc32.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Dp.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Rdm.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Sha1.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Sha256.IsSupported); //false
I’m on 32-bit Raspbian (a Debian derivative). Do I need the 64-bit version for this to work?
P.S. To clarify, in plain C# the algorithm looks like this:
public static int ScalarTest(byte[] lhs, byte[] rhs)
{
    var result = 0;
    for (int index = 0; index < lhs.Length; index++)
    {
        var a = lhs[index];
        var b = rhs[index];
        if (b > a)
        {
            (b, a) = (a, b);
        }
        result += ((a - b) >= 16) ? 1 : 0;
    }
    return result;
}
2 Answers
Following @Soonts' answer, I switched to 64-bit Raspbian and .NET 5. Most of the instructions I'm looking for are now supported.
After implementing the algorithm, which compares two byte arrays for elements whose absolute difference exceeds a certain threshold, I got the following benchmark measurements on my Pi 4 (average of 3 runs after warm-up):
C# loop: 59ms
System.Numerics.Vector: 21ms
System.Runtime.Intrinsics.Arm.AdvSimd: 17ms
System.Runtime.Intrinsics.Arm.AdvSimd with optimized vector creation from https://gist.github.com/IKoshelev/325f0e10bee0806d7bb2c9d63d09ba9e: 2ms !!!
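For readers who want to see the shape of the System.Numerics.Vector variant, here is a minimal sketch. It is not the code that was benchmarked above or the code in the gist; the names ByteDiff, VectorTest and SumLanes are made up for illustration, and the threshold parameter is an assumption (the question hard-codes 16). The absolute difference is computed as max minus min so that unsigned bytes never wrap around.

```csharp
using System;
using System.Numerics;

public static class ByteDiff
{
    // Hypothetical helper: counts positions where |lhs[i] - rhs[i]| >= threshold.
    public static int VectorTest(byte[] lhs, byte[] rhs, byte threshold = 16)
    {
        if (lhs.Length != rhs.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<byte>.Count;          // 16 lanes for 128-bit vectors
        var limit = new Vector<byte>(threshold); // threshold broadcast to every lane
        var acc = Vector<byte>.Zero;             // per-lane match counters
        int pending = 0;                         // iterations accumulated in acc
        int result = 0;
        int index = 0;

        for (; index <= lhs.Length - width; index += width)
        {
            var a = new Vector<byte>(lhs, index);
            var b = new Vector<byte>(rhs, index);

            // max - min is the absolute difference without unsigned wrap-around.
            var diff = Vector.Max(a, b) - Vector.Min(a, b);

            // Lanes are 0xFF where diff >= threshold and 0x00 elsewhere.
            var mask = Vector.GreaterThanOrEqual(diff, limit);

            // Reduce each matching lane to 1 and add it to the lane counters.
            acc += Vector.BitwiseAnd(mask, Vector<byte>.One);

            // A byte counter can hold at most 255, so flush before it overflows.
            if (++pending == 255)
            {
                result += SumLanes(acc);
                acc = Vector<byte>.Zero;
                pending = 0;
            }
        }
        result += SumLanes(acc);

        // Scalar tail for the elements that do not fill a whole vector.
        for (; index < lhs.Length; index++)
        {
            if (Math.Abs(lhs[index] - rhs[index]) >= threshold)
                result++;
        }
        return result;
    }

    // Widens the byte lanes to ushort and adds them up.
    private static int SumLanes(Vector<byte> v)
    {
        Vector.Widen(v, out Vector<ushort> low, out Vector<ushort> high);
        var sums = low + high;                   // each lane is at most 510
        int total = 0;
        for (int lane = 0; lane < Vector<ushort>.Count; lane++)
            total += sums[lane];
        return total;
    }
}
```

Calling ByteDiff.VectorTest(lhs, rhs) should give the same count as ScalarTest from the question. An AdvSimd version would presumably follow the same structure, with AdvSimd.AbsoluteDifference and AdvSimd.CompareGreaterThanOrEqual taking the place of the max/min and comparison steps.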
Although the API is finished and even documented, the implementation is missing on 32-bit ARM. 8-byte SIMD vectors have been an essential part of the NEON ISA since it was introduced in 2005, yet the .NET runtime only implements them when compiling for ARM64 (released in 2013).
I don’t work for Microsoft and have no idea how exactly they compile their binaries, but the source code shows they have at least some support for NEON when building for the ARM64 target. If you want these intrinsics in .NET, you can try a 64-bit OS.
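To check what a given OS/runtime combination actually exposes, something like the sketch below can help. SimdProbe and Describe are made-up names; the properties it reads are standard .NET ones, and the System.Runtime.Intrinsics.Arm namespace needs .NET 5 or later. On a 32-bit OS I would expect the architecture to be reported as Arm rather than Arm64, and the intrinsics flags to come back false as in the question.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.Arm;

public static class SimdProbe
{
    // Hypothetical helper: summarizes what the current process can use.
    public static string Describe()
    {
        return string.Join(Environment.NewLine, new[]
        {
            // Arm means a 32-bit ARM process, Arm64 a 64-bit one.
            $"Process architecture:         {RuntimeInformation.ProcessArchitecture}",
            $"64-bit process:               {Environment.Is64BitProcess}",
            // Whether Vector<T> operations are JIT-compiled to SIMD instructions.
            $"Vector.IsHardwareAccelerated: {Vector.IsHardwareAccelerated}",
            $"Vector<byte>.Count:           {Vector<byte>.Count}",
            // NEON intrinsics, and the subset that only exists on ARM64.
            $"AdvSimd.IsSupported:          {AdvSimd.IsSupported}",
            $"AdvSimd.Arm64.IsSupported:    {AdvSimd.Arm64.IsSupported}",
        });
    }
}
```

Printing SimdProbe.Describe() once at startup, in a Release build, shows whether a SIMD code path is worth taking or whether the scalar loop is the only option.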
There’s a workaround: implement your performance-critical pieces in C++, compile a shared library for Linux, then use [DllImport] to consume these functions from .NET. I have built non-trivial Linux software that way, using the following gcc flags to build the shared libraries:
-march=native -mfpu=neon-fp16 -mfp16-format=ieee -ffast-math -O3 -fPIC
This way it will work on a 32-bit OS and won’t require anything special from the .NET runtime. I’ve tested it with .NET Core 2.1.
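The .NET side of that workaround could look roughly like the sketch below. Everything native here is hypothetical: it assumes a library built as libbytediff.so that exports a C function count_above_threshold(const uint8_t* lhs, const uint8_t* rhs, int length, uint8_t threshold). Neither name comes from my actual project. The scalar fallback is only there so the sketch still runs when the library is not present.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeByteDiff
{
    // Hypothetical export from a hypothetical libbytediff.so; on Linux the
    // runtime probes for "libbytediff.so" when given the name "bytediff".
    [DllImport("bytediff", CallingConvention = CallingConvention.Cdecl)]
    private static extern int count_above_threshold(
        byte[] lhs, byte[] rhs, int length, byte threshold);

    // Counts positions where |lhs[i] - rhs[i]| >= threshold.
    public static int Count(byte[] lhs, byte[] rhs, byte threshold = 16)
    {
        if (lhs.Length != rhs.Length)
            throw new ArgumentException("Arrays must have the same length.");

        try
        {
            // byte[] is blittable, so it is pinned and passed as a pointer.
            return count_above_threshold(lhs, rhs, lhs.Length, threshold);
        }
        catch (DllNotFoundException)
        {
            return ScalarCount(lhs, rhs, threshold);
        }
        catch (EntryPointNotFoundException)
        {
            return ScalarCount(lhs, rhs, threshold);
        }
    }

    // Managed fallback with the same semantics as the native function.
    private static int ScalarCount(byte[] lhs, byte[] rhs, byte threshold)
    {
        int result = 0;
        for (int i = 0; i < lhs.Length; i++)
        {
            if (Math.Abs(lhs[i] - rhs[i]) >= threshold)
                result++;
        }
        return result;
    }
}
```

Passing the arrays as byte[] avoids copying, which matters when the native call is made once per frame or per buffer.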