I want to have X std::vectors of equal size that can be processed together in a single for loop running linearly from start to finish. For example:
for (int i = 0; i < vector_length; i++)
    vector1[i] = vector2[i] + vector3[i] * vector4[i];
I want all this to take full advantage of SIMD instructions. For that to happen, the compiler must be able to assume that each of the vectors is aligned optimally for __m256 use. If it can't assume this, it may generate all sorts of suboptimal loops.
How do I ensure this optimal alignment of std::vectors and optimal code generation for such aligned data?
It can be assumed that each vector has identical data structures inside, which can be added/multiplied together using standard SIMD instructions.
I’m using C++17.
MORE INFORMATION, AS REQUESTED IN THE COMMENTS:
32 bytes of alignment is good for my use.
I want this to run on Intel Macs and PCs (Xcode and Visual Studio), and later on ARM-based Macs once I get one (Xcode again).
Answers
As a couple of people pointed out, there's a related question that can be used to first make sure the memory owned by the std::vector is properly aligned: Modern approach to making std::vector allocate aligned memory

That, combined with __attribute__((aligned(ALIGNMENT_IN_BYTES))) added to the method parameters (pointers), seems to do the trick. Example:

That seems to compile nicely (checked in Godbolt), so the compiler evidently assumes it can simply use wide registers to process the data with SIMD instructions.
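A minimal sketch of the same idea, assuming GCC or Clang. Instead of the aligned attribute it uses a different mechanism, the __builtin_assume_aligned builtin, which states the alignment promise directly about the data each pointer points to. The function name madd_arrays is hypothetical. Neither __attribute__ nor __builtin_assume_aligned is available in MSVC; C++20 adds std::assume_aligned as the standard equivalent, but the question targets C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GCC/Clang only: __builtin_assume_aligned is a compiler builtin, not standard C++17.
// madd_arrays is a hypothetical name; it computes out[i] = a[i] + b[i] * c[i].
// Precondition: every pointer is 32-byte aligned (e.g. storage from an aligned
// allocator). Breaking that promise is undefined behaviour.
void madd_arrays(float* __restrict out, const float* __restrict a,
                 const float* __restrict b, const float* __restrict c,
                 std::size_t n) {
    // Debug-build check of the alignment promise (compiled out with NDEBUG).
    assert(reinterpret_cast<std::uintptr_t>(out) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(a) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 32 == 0);
    assert(reinterpret_cast<std::uintptr_t>(c) % 32 == 0);

    // Tell the optimizer it may use aligned 32-byte loads and stores.
    float* o = static_cast<float*>(__builtin_assume_aligned(out, 32));
    const float* pa = static_cast<const float*>(__builtin_assume_aligned(a, 32));
    const float* pb = static_cast<const float*>(__builtin_assume_aligned(b, 32));
    const float* pc = static_cast<const float*>(__builtin_assume_aligned(c, 32));

    // __restrict additionally promises the arrays do not overlap,
    // so the loop can be vectorized without runtime overlap checks.
    for (std::size_t i = 0; i < n; ++i)
        o[i] = pa[i] + pb[i] * pc[i];
}
```

If the alignment promise might not hold, drop the builtin: the loop stays correct, and the compiler typically falls back to unaligned loads or an alignment-peeling prologue.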
Thank you everyone!
The only way to control the allocation of std::vector is by replacing the allocator. Boost has an implementation that ensures alignment: https://www.boost.org/doc/libs/1_84_0/doc/html/align/reference.html#align.reference.classes
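As a dependency-free alternative to Boost, a small allocator can be built on C++17's alignment-aware operator new. This is a hand-written sketch, not Boost's implementation; AlignedAllocator and AlignedFloatVec are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal C++17 allocator that aligns every allocation to Alignment bytes.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    static_assert(Alignment >= alignof(T), "Alignment must not weaken T's own alignment");

    using value_type = T;

    // Required because of the non-type template parameter: std::allocator_traits
    // cannot synthesize rebind for such templates on its own.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T))
            throw std::bad_array_new_length();
        // C++17 aligned operator new: the returned block starts on an Alignment boundary.
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Alignment}));
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Must be paired with the aligned form of operator delete.
        ::operator delete(p, std::align_val_t{Alignment});
    }
};

// Stateless allocators: any two instances can free each other's memory.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Hypothetical alias: a std::vector<float> whose buffer starts on a 32-byte boundary.
using AlignedFloatVec = std::vector<float, AlignedAllocator<float, 32>>;
```

Only the start of the buffer is guaranteed to be aligned; data() + k is 32-byte aligned only when k is a multiple of 8 floats. With Boost, the equivalent would be std::vector<float, boost::alignment::aligned_allocator<float, 32>>.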
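Is the size of the data known beforehand, or are you using fixed-size buffers? If so, you could just use a plain array with alignas.

And for SIMD-style element-wise arithmetic, you could use valarray. As for alignment: since C++17, std::allocator (the default allocator of std::vector) uses the alignment-aware operator new for over-aligned types, so
std::vector<__m256i> mySIMDVector;
is 32-byte aligned. Plain malloc, by contrast, only guarantees alignment suitable for alignof(std::max_align_t), typically 8 or 16 bytes, which is not enough for __m256i.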
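A portable sketch of the three suggestions: an alignas array for fixed sizes, a std::vector of an over-aligned element type, and valarray arithmetic. To avoid AVX headers, Block8 is a hypothetical stand-in for __m256 (eight floats, 32-byte aligned); fixed_buffer, is_aligned and multiply_add are hypothetical names too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <valarray>
#include <vector>

// Hypothetical stand-in for __m256: eight floats with 32-byte alignment.
struct alignas(32) Block8 {
    float v[8];
};

bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Fixed-size case: a plain array with alignas needs no allocator at all.
alignas(32) float fixed_buffer[1024];

// Element-wise a + b * c, the same operation as the loop in the question.
// Whether this is vectorized is up to the library and the optimizer.
std::valarray<float> multiply_add(const std::valarray<float>& a,
                                  const std::valarray<float>& b,
                                  const std::valarray<float>& c) {
    return a + b * c;
}
```

A std::vector<Block8> needs nothing special in C++17: its default allocator must return storage aligned for Block8. On Apple platforms, note that Apple Clang only provides aligned operator new for deployment targets of macOS 10.13 or later.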