I am running a simple benchmarking exercise of reading and writing a 2D array. With

```cpp
const int XDim = 1000;
const int YDim = 1000;
```

I create a 1000 x 1000 matrix.
In C style, I initialize using:

```c
double* doubleArray[XDim];
for (int i = 0; i < XDim; i++)
    doubleArray[i] = (double*)malloc(YDim * sizeof(double));
for (int i = 0; i < XDim; i++)
    for (int j = 0; j < YDim; j++)
        doubleArray[i][j] = 0.;
```
With C++ `std::vector<std::vector<double>>`, I use:

```cpp
std::vector<std::vector<double>> doubleArray;
std::vector<double> vecdbl;
for (int i = 0; i < XDim; i++)
    doubleArray.push_back(vecdbl);
for (int i = 0; i < XDim; i++)
    for (int j = 0; j < YDim; j++)
        doubleArray[i].push_back(0.);
```
In Boost style, I use:

```cpp
typedef boost::multi_array<double, 2> OuterArray;
typedef boost::array<OuterArray::index, 2> OuterArrayIndices;
OuterArray::index rows = XDim;
OuterArray::index cols = YDim;
OuterArrayIndices shape = { rows, cols };
OuterArray doubleArray(shape);
for (int i = 0; i < XDim; i++)
    for (int j = 0; j < YDim; j++)
        doubleArray[i][j] = 0.;
```
Then, I randomly pick 999999 elements from `doubleArray[][]` and set each to a random integer value chosen from [-5, 5]. I then randomly pick another 999999 elements, add them up, and compute their average. The average and the time taken for all of this are displayed. This exercise is repeated many times.
The Godbolt link that does all this is available over here. On that link, Godbolt seems to only handle `const int total_iterations = 2;`; larger values give an error that the time limit has been exceeded. Running this on my computer with `const int total_iterations = 100;`, a consistent pattern I observe is that the Boost method is unable to match the C method (`malloc`) or the C++ method (`std::vector<std::vector<double>>`). Between the latter two, the C method mostly beats the C++ method, though there are a handful of instances of the reverse. All of this is compiled and run in Visual Studio in Release mode with default settings.
Given this, I am unable to see the value of `boost::multi_array` over a raw C pointer-of-pointers or `std::vector<std::vector<double>>`. Is there a way `boost::multi_array` can be made faster?
ETA: Based on comments, with sequential access to array elements and Release-mode settings on Godbolt, `boost::multi_array` wins the majority of the time. See here: 9 out of 10 times. With nonsequential, random access to elements and Release-mode settings on Godbolt, it is close, with C style slightly better than `boost::multi_array`. See here.

Thanks to tstanisl in the comments: for sequential access, C style seems to be faster 10 out of 10 times. His link is accessible here.

There is an overall caveat pointed out by user n-m-could-be-an-ai that this exercise could be profiling the random number generator. Controlling for that seems to be beyond the purview of this question. However, since the same number of random values is generated for each case, and since the number of iterations is larger than 1 or 2, that may be a wash. I could be wrong.
ETA(2): Based on the answer provided by chqrlie, I have modified the code, which is available here. I have hopefully commented it sufficiently to show what is going on. It is still not fully clear to me which way is faster, but perhaps someone else can use this code as a starting point to better benchmark these various data structures.
2 Answers
I’m afraid your benchmark has serious flaws that make the results almost meaningless: you should select `-O2` or `-O3` in the command option field of the compilation window after uncovering it.

Here is a modified version with an additional matrix implementation using C99 VLAs, which are supported by gcc: https://godbolt.org/z/v5o8P5M6d
Try it on a standalone idle computer to get meaningful results.
You wouldn’t expect to see much gain here, as the access patterns don’t leave much room for it. However, you can separate the scenario building from the perf measurements, and then do the measurements properly, so you get:
Live On Compiler Explorer
The output being something like