I study performance of C code. OS is Windows 10.
Suddenly I found that the same code runs 2x faster if I use local variables against the same code with global variables.
I can’t understand why. Tested both in Visual Studio and GCC. Same performance difference.
The test results:
fun_b
4254600908
Duration: 5.168000 sec
fun_a
4254600908
Duration: 18.455000 sec
The sample code:
#include <stdio.h>
#include <time.h>
unsigned int result;
unsigned int mask;
unsigned int i;
unsigned int fun_a(unsigned int value)
{
result = 0;
mask = 1;
for (i = 0; i < 32; i++)
{
if (value & mask) result++;
mask <<= 1;
}
return result;
}
unsigned int fun_b(unsigned int value)
{
unsigned int result = 0;
unsigned int mask = 1;
for (unsigned int i = 0; i < 32; i++)
{
if (value & mask) result++;
mask <<= 1;
}
return result;
}
int main(void)
{
unsigned int N = 0x12345678;
clock_t startClock, finishClock;
double f;
unsigned int A;
////////////////////////////////
printf("fun_bn");
startClock = clock();
A = 0;
for (unsigned int i = 0; i < N; i++)
{
A += fun_b(i);
}
finishClock = clock();
printf("%un", A);
f = (double)(finishClock - startClock) / (double)(CLOCKS_PER_SEC);
printf("Duration: %f secn", f);
////////////////////////////////
printf("fun_an");
startClock = clock();
A = 0;
for (unsigned int i = 0; i < N; i++)
{
A += fun_a(i);
}
finishClock = clock();
printf("%un", A);
f = (double)(finishClock - startClock) / (double)(CLOCKS_PER_SEC);
printf("Duration: %f secn", f);
}
I expected fun_a and fun_b should take the same time to execute and I can’t understand why fun_b runs twice as fast.
2
Answers
By feeding it to "Compiler Explorer" with gcc and full optimizations:
https://godbolt.org/z/vhcWT1zfv
As it turns out, gcc refuses to optimize the access to the global variables, whereas it does, of course, optimize the bejeezus out of the local variables. I am actually surprised it is only twice as fast. (I suppose multi-level CPU cache is to thank for that.)
What do we learn from this?
What we have known all along:
Do not. use. global. variables.
(That was just humor. The reasons to never use global variables have nothing to do with performance. You should not use them even if they were somehow faster.)
When you look at the disassembly (gcc, -O3):
fun_b
just does stuff with registers. This is reasonably fast.fun_a
has to also populate the global variables with the right values because they are part of the function’s observable side effects.which is that much slower for a small function like this. Note: it’s only doing the final values, you will not be able to observe
i
andmask
changing during the loop. And it’s still that much of a difference.Note: actually it is slightly more complicated as the compiler has inlined the functions and is considering the entire sum at once. Having to compute the values for the global variables has prevented it from optimizing it as well as it could.
If you want to benchmark the functions themselves, you would have to prevent inlining by e.g. calling them through function pointers like this: https://godbolt.org/z/v58bqveTc — then the above explanation applies and the difference becomes even greater somehow.