Visual Studio Code - In C: Created a node, forgot return the pointer to the node, but code still runs as if I returned it

NoobProgrammer
July 2, 2024
137 views
1 vote
2 Answers

I was implementing some data structures for fun in C, and I was comparing the speed to other data structures I implemented. This one is a closed addressing hash table.

This is the code I used to create a node

hashNode *createNewNode(int data) {
    hashNode *node = calloc(1, sizeof(hashNode));
    node->data.key = data;
    node->isSet = true;
    node->next = NULL;
}

This is the function I wanted to time.

for (int i = 0; i < 5; i++) {
    hashNode *node = createNewNode(arr[i]);
    InsertNode(node, map);
}

(arr is just the first 5000 numbers shuffled)
As you may have noticed the function to create a node doesn’t have a return value, still despite this the node is initialized correctly and all the numbers that were supposed to be in the table were inserted, and only those. How could that happen?

This gimmik only works in VS Code, I tried running it in Visual Studio but it (correctly) doesn’t initialize the node. Does anyone know what is happening?

Edit: okay, I probably didn’t express myself correctly I’m sorry. My question is, how does it work? I know it’s undefined behaviour, but it doesn’t look like something that is supposed to work, yet it worked correctly 5000 times out of 5000 and even if I put some printf here and there it keeps working correctly

Tags: c#data-structures pointers

Answers

- dbush
- June 30, 2024 at 5:19 pm
- 0 votes
0
If you fail to return a value from a function that is defined to do so, and you then attempt to use the function’s return value, this triggers undefined behavior in your code.

With undefined behavior, there are no guarantees regarding what your code will do. It could crash, it could output strange results, or is could (as in your case) appear to work properly.

Additionally, making seemingly unrelated changes to your code (such as adding an unused local variable or a call to printf for debugging) can change how undefined behavior manifests. Compiling with different optimization settings or with a different compiler can also yield differences.

What might be happening in your case is that the value of node could be sitting in a register, and that register just happens to be the one where the function’s return value would be placed. But again, it’s essentially luck that it’s working this way.

Login or Signup to reply.

- Joel
- June 30, 2024 at 9:05 pm
- 0 votes
0
As @dbush‘s answer already explains, not returning a value in a non-void function but using the value of the function call is UB (undefined behavior).

From this C23 standard draft §6.9.2, point 13:

Unless otherwise specified, if the } that terminates the function body is reached, and the value of the
function call is used by the caller, the behavior is undefined.

That calling your createNewNode() function and using its return value (if you can call it that since it is not returning) worked out as if you had returned the allocated node is pure luck (or unluckiness, depending on how you see it).

When you encounter undefined behavior, you can’t really rely on it or reason about it. If you change the compiler or just its version or some compiler flags like optimization level or anything in your code, even if it doesn’t directly relate to the source of the UB, your code might behave differently each time.

Even though you probably shouldn’t, I still tried to reason about what was going on in your code and @dbush already somewhat explained that, too.

the value of node could be sitting in a register

Using Compiler Explorer I compiled some simplified but similar code using x86-64 gcc 14.1 (live example) without setting any compiler flags.
```
#include <stdio.h>
#include <stdlib.h>

int* gPtr = NULL;

int* allocateInt() {
    int* p = calloc(1, sizeof *p);
    gPtr = p;
}

int main() {
    int* ptr = allocateInt();

    printf("ptr:  %pn", ptr);
    printf("gPtr: %pn", gPtr);

    free(ptr);
    return 0;
}
```
This is its generated assembly:
```
gPtr:
        .zero   8
allocateInt:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     esi, 4
        mov     edi, 1
        call    calloc
        mov     QWORD PTR [rbp-8], rax
        mov     rax, QWORD PTR [rbp-8]
        mov     QWORD PTR gPtr[rip], rax
        nop
        leave
        ret
.LC0:
        .string "ptr:  %pn"
.LC1:
        .string "gPtr: %pn"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     eax, 0
        call    allocateInt
        mov     QWORD PTR [rbp-8], rax
        mov     rax, QWORD PTR [rbp-8]
        mov     rsi, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR gPtr[rip]
        mov     rsi, rax
        mov     edi, OFFSET FLAT:.LC1
        mov     eax, 0
        call    printf
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    free
        mov     eax, 0
        leave
        ret
```
Some of this assembly code might look scary but the for us important parts are not that hard to understand if you know what you have to look for.

Program execution in C starts from the main() function, in the assembly marked with the main: label. After setting up the stack frame we encounter call allocateInt and you can probably guess by yourself what this does. In allocateInt() we have a call calloc.

The return value of calloc() is stored in the 64-bit register rax. x86-64 calling conventions:

Integer return values up to 64 bits in size are stored in RAX

We now have the following three lines:
```
mov     QWORD PTR [rbp-8], rax
mov     rax, QWORD PTR [rbp-8]
mov     QWORD PTR gPtr[rip], rax
```
The first one stores the value in the rax register on the stack (in allocateInt()‘s local p). The next line stores the value in p back into rax. Then we store the value in rax into our global gPtr.

The next usage of rax is directly after the call to allocateInt().
```
mov     QWORD PTR [rbp-8], rax
```
You should already be able to tell what this line does. It stores the value in rax on the stack. Since we are in main() again and allocateInt()‘s stack has already been teared down this now stores it in main()‘s ptr.

Therefore the program had the following output for me, which shows that allocateInt() seemingly returned the correct value, even though we didn’t write a return statement ourselves:
```
ptr:  0x20d22a0
gPtr: 0x20d22a0
```
As I already said, modifying even a little bit can change the behavior of the program, since we have UB in our code. The following modifications I did show that you can’t rely on UB.

When I turned up the optimization level to -O2 or -O3, the output differed. However -O1 and -Os still managed to return the pointer.

Originally I had a call to printf() in the allocateInt() function, but this caused ptr not having the same value as p. Moving that printf() call into main() and having a global gPtr got me the result which I have shown here.

On some compilers my code caused a segfault.
Login or Signup to reply.