Visual Studio Code - Local variables performance vs global in example

Igor
September 18, 2024
193 views
1 vote
2 Answers

I study performance of C code. OS is Windows 10.
Suddenly I found that the same code runs 2x faster if I use local variables against the same code with global variables.
I can’t understand why. Tested both in Visual Studio and GCC. Same performance difference.
The test results:

fun_b
4254600908
Duration:  5.168000 sec
fun_a
4254600908
Duration:  18.455000 sec

The sample code:

#include <stdio.h>
#include <time.h>
unsigned int result;
unsigned int mask;
unsigned int i;
unsigned int fun_a(unsigned int value)
{
    result = 0;
    mask = 1;
    for (i = 0; i < 32; i++)
    {
        if (value & mask) result++;
        mask <<= 1;
    }
    return result;
}
unsigned int fun_b(unsigned int value)
{
    unsigned int result = 0;
    unsigned int mask = 1;
    for (unsigned int i = 0; i < 32; i++)
    {
        if (value & mask) result++;
        mask <<= 1;
    }
    return result;
}
int main(void)
{
    unsigned int N = 0x12345678;
    clock_t startClock, finishClock;
    double f;
    unsigned int A;
    
    ////////////////////////////////
    printf("fun_bn");
    startClock = clock();
    A = 0;
    for (unsigned int i = 0; i < N; i++)
    {
        A += fun_b(i);
    }
    finishClock = clock();
    printf("%un", A);
    f = (double)(finishClock - startClock) / (double)(CLOCKS_PER_SEC);
    printf("Duration:  %f secn", f);

    ////////////////////////////////
    printf("fun_an");
    startClock = clock();
    A = 0;
    for (unsigned int i = 0; i < N; i++)
    {
        A += fun_a(i);
    }
    finishClock = clock();
    printf("%un", A);
    f = (double)(finishClock - startClock) / (double)(CLOCKS_PER_SEC);
    printf("Duration:  %f secn", f);
}

I expected fun_a and fun_b should take the same time to execute and I can’t understand why fun_b runs twice as fast.

Tags: c#performance

Answers

- MikeNakis
- September 18, 2024 at 3:22 pm
- 0 votes
0
By feeding it to "Compiler Explorer" with gcc and full optimizations:

https://godbolt.org/z/vhcWT1zfv

As it turns out, gcc refuses to optimize the access to the global variables, whereas it does, of course, optimize the bejeezus out of the local variables. I am actually surprised it is only twice as fast. (I suppose multi-level CPU cache is to thank for that.)

What do we learn from this?

What we have known all along:

Do not. use. global. variables.

(That was just humor. The reasons to never use global variables have nothing to do with performance. You should not use them even if they were somehow faster.)

Login or Signup to reply.

- teapot418
- September 18, 2024 at 3:29 pm
- 0 votes
0
When you look at the disassembly (gcc, -O3):
```
fun_b(unsigned int):
        mov     edx, 32
        mov     eax, 1
        xor     ecx, ecx
.L15:
        mov     esi, edi
        and     esi, eax
        cmp     esi, 1
        sbb     ecx, -1
        add     eax, eax
        sub     edx, 1
        jne     .L15
        mov     eax, ecx
        ret
```
fun_b just does stuff with registers. This is reasonably fast.

fun_a has to also populate the global variables with the right values because they are part of the function’s observable side effects.
```
fun_a(unsigned int):
        mov     eax, 32
        xor     esi, esi
        xor     ecx, ecx
        mov     edx, 1
        mov     DWORD PTR result[rip], 0
.L3:
        test    edx, edi
        je      .L2
        add     ecx, 1
        mov     esi, 1
.L2:
        add     edx, edx
        sub     eax, 1
        jne     .L3
        mov     DWORD PTR mask[rip], edx
        mov     DWORD PTR i[rip], 32
        test    sil, sil
        je      .L1
        mov     DWORD PTR result[rip], ecx
        mov     eax, ecx
.L1:
        ret
```
which is that much slower for a small function like this. Note: it’s only doing the final values, you will not be able to observe i and mask changing during the loop. And it’s still that much of a difference.

Note: actually it is slightly more complicated as the compiler has inlined the functions and is considering the entire sum at once. Having to compute the values for the global variables has prevented it from optimizing it as well as it could.

If you want to benchmark the functions themselves, you would have to prevent inlining by e.g. calling them through function pointers like this: https://godbolt.org/z/v58bqveTc — then the above explanation applies and the difference becomes even greater somehow.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Visual Studio Code – Local variables performance vs global in example

Answers