Can I use .NET SIMD on Raspberry Pi 4? - Debian

IvanKoshelev
February 24, 2021
2 Answers

I’m writing code that will subtract corresponding bytes in two arrays and count how many of the resulting differences reach a given threshold. AFAIU, it would really benefit from .NET SIMD, but System.Numerics.Vector.IsHardwareAccelerated returns false when I run my C# code on Raspberry Pi 4.

My dotnet version is 3.1.406, I’ve added

  <PropertyGroup>
    <Optimize>true</Optimize>
  </PropertyGroup>

to the csproj, and I’m running the Release configuration.

Is there any way I can leverage SIMD support in .NET on Raspberry Pi 4? Maybe with .NET 5?

Update
I installed .NET 5 and tried the .NET intrinsics, but none of them is reported as supported:

Console.WriteLine(System.Runtime.Intrinsics.Arm.AdvSimd.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Aes.IsSupported);  //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.ArmBase.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Crc32.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Dp.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Rdm.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Sha1.IsSupported); //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Sha256.IsSupported); //false

I’m on 32-bit Raspbian (a Debian derivative). Is there any chance I need the 64-bit version for this to work?

P.S. To clarify, in plain C# the algorithm looks like this:

        public static int ScalarTest(byte[] lhs, byte[] rhs)
        {
            var result = 0;

            for (int index = 0; index < lhs.Length; index++)
            {
                var a = lhs[index];
                var b = rhs[index];
                if (b > a)
                {
                    (b, a) = (a, b);
                }
                result += ((a - b) >= 16) ? 1 : 0;
            }

            return result;
        }

Tags: arm c# neon raspberry-pi simd

Answers

Chosen as BEST ANSWER

Following @Soonts' answer, I switched to 64-bit Raspbian; here is what I got in .NET 5. Most of the instructions I'm looking for are supported.

Console.WriteLine(System.Runtime.InteropServices.RuntimeInformation.OSDescription);
//Linux 5.4.51-v8+ #1333 SMP PREEMPT Mon Aug 10 16:58:35 BST 2020

Console.WriteLine(System.Runtime.InteropServices.RuntimeInformation.ProcessArchitecture);
//Arm64

Console.WriteLine(System.Environment.Is64BitOperatingSystem);           //true

Console.WriteLine(System.Numerics.Vector.IsHardwareAccelerated);        //true
Console.WriteLine(Vector<byte>.Count);                                  //16
Console.WriteLine(Vector<sbyte>.Count);                                 //16
Console.WriteLine(Vector<short>.Count);                                 //8
Console.WriteLine(Vector<ushort>.Count);                                //8
Console.WriteLine(Vector<int>.Count);                                   //4
Console.WriteLine(Vector<uint>.Count);                                  //4
Console.WriteLine(Vector<long>.Count);                                  //2
Console.WriteLine(Vector<ulong>.Count);                                 //2

Console.WriteLine(Vector<float>.Count);                                 //4
Console.WriteLine(Vector<double>.Count);                                //2

Console.WriteLine(System.Runtime.Intrinsics.Arm.AdvSimd.IsSupported);   //true
Console.WriteLine(System.Runtime.Intrinsics.Arm.Aes.IsSupported);       //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.ArmBase.IsSupported);   //true
Console.WriteLine(System.Runtime.Intrinsics.Arm.Crc32.IsSupported);     //true
Console.WriteLine(System.Runtime.Intrinsics.Arm.Dp.IsSupported);        //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Rdm.IsSupported);       //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Sha1.IsSupported);      //false
Console.WriteLine(System.Runtime.Intrinsics.Arm.Sha256.IsSupported);    //false

After implementing the algorithm, which compares two byte arrays and counts the elements whose absolute difference reaches a certain threshold, I got the following benchmark measurements on my Pi 4 (average of 3 runs after warmup):

C# loop: 59 ms

System.Numerics.Vector: 21 ms

System.Runtime.Intrinsics.Arm.AdvSimd: 17 ms

System.Runtime.Intrinsics.Arm.AdvSimd with optimized vector creation (from https://gist.github.com/IKoshelev/325f0e10bee0806d7bb2c9d63d09ba9e): 2 ms !!!
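
For readers who want a starting point, here is a minimal sketch of what a System.Numerics.Vector variant of the scalar loop could look like. It is not the benchmarked code (that is in the gist above); the names DiffCounter and VectorTest and the threshold parameter are assumptions made for illustration.

```csharp
using System;
using System.Numerics;

public static class DiffCounter
{
    // Counts positions where |lhs[i] - rhs[i]| >= threshold.
    // Hypothetical helper, mirroring the scalar loop from the question.
    public static int VectorTest(byte[] lhs, byte[] rhs, byte threshold = 16)
    {
        if (lhs.Length != rhs.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<byte>.Count;          // 16 on the Pi 4 output above
        var thresholdVec = new Vector<byte>(threshold);
        int lastBlock = lhs.Length - width;      // last index with a full vector
        int result = 0;
        int index = 0;

        while (index <= lastBlock)
        {
            // Per-lane byte counters; each lane grows by at most 1 per step,
            // so flush after at most 255 steps to avoid overflow.
            var counts = Vector<byte>.Zero;
            for (int step = 0; step < 255 && index <= lastBlock; step++, index += width)
            {
                var a = new Vector<byte>(lhs, index);
                var b = new Vector<byte>(rhs, index);
                // Unsigned absolute difference: max - min never underflows.
                var diff = Vector.Max(a, b) - Vector.Min(a, b);
                // Lanes that pass the test are all ones (0xFF); keep only bit 0.
                var mask = Vector.GreaterThanOrEqual(diff, thresholdVec);
                counts += Vector.BitwiseAnd(mask, Vector<byte>.One);
            }

            // Horizontal sum of the lane counters.
            for (int lane = 0; lane < width; lane++)
                result += counts[lane];
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; index < lhs.Length; index++)
        {
            if (Math.Abs(lhs[index] - rhs[index]) >= threshold)
                result++;
        }

        return result;
    }
}
```

Accumulating into byte lanes and flushing periodically keeps the inner loop free of widening operations; whether that matters on a given runtime is something to measure rather than assume.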


- Soonts
- February 25, 2021 at 2:26 am
Although the API is defined and even documented, the implementation is missing on 32-bit ARM. 8-byte SIMD vectors have been an essential part of the NEON ISA for well over a decade (introduced in 2005), yet the .NET runtime only implements them when compiling for ARM64 (released in 2013).

I don’t work for Microsoft and have no idea how exactly they compile their binaries, but the source code shows they have at least some support for NEON when building for the ARM64 target. If you want these intrinsics in .NET, you can try the 64-bit OS.
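
Because support depends on the OS and runtime, it can help to pick the code path at run time instead of assuming one. The following is only a sketch, assuming .NET 5 or later (where System.Runtime.Intrinsics.Arm.AdvSimd exists); SimdProbe and DescribeBestPath are hypothetical names.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.Arm;

public static class SimdProbe
{
    // Reports which implementation a program could dispatch to on this
    // machine, from most to least specialized. Hypothetical helper.
    public static string DescribeBestPath()
    {
        if (AdvSimd.IsSupported)
            return "AdvSimd intrinsics";          // NEON exposed by the runtime
        if (Vector.IsHardwareAccelerated)
            return "System.Numerics.Vector";      // portable SIMD
        return "scalar loop";                     // fallback
    }
}
```

Calling Console.WriteLine(SimdProbe.DescribeBestPath()) on each OS variant shows what the runtime actually exposes there.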

There’s a workaround: implement your performance-critical pieces in C++, compile a shared library for Linux, then use [DllImport] to consume these functions from .NET. I have built non-trivial Linux software that way, using the following gcc flags to build the shared libraries:

  -march=native -mfpu=neon-fp16 -mfp16-format=ieee -ffast-math -O3 -fPIC

This approach works on a 32-bit OS and doesn’t require anything special from the .NET runtime; I’ve tested it with .NET Core 2.1.
