I was implementing some data structures for fun in C, and I was comparing the speed to other data structures I implemented. This one is a closed addressing hash table.
This is the code I used to create a node
hashNode *createNewNode(int data) {
hashNode *node = calloc(1, sizeof(hashNode));
node->data.key = data;
node->isSet = true;
node->next = NULL;
}
This is the function I wanted to time.
for (int i = 0; i < 5; i++) {
hashNode *node = createNewNode(arr[i]);
InsertNode(node, map);
}
(arr is just the first 5000 numbers shuffled)
As you may have noticed the function to create a node doesn’t have a return value, still despite this the node is initialized correctly and all the numbers that were supposed to be in the table were inserted, and only those. How could that happen?
This gimmik only works in VS Code, I tried running it in Visual Studio but it (correctly) doesn’t initialize the node. Does anyone know what is happening?
Edit: okay, I probably didn’t express myself correctly I’m sorry. My question is, how does it work? I know it’s undefined behaviour, but it doesn’t look like something that is supposed to work, yet it worked correctly 5000 times out of 5000 and even if I put some printf here and there it keeps working correctly
2
Answers
If you fail to return a value from a function that is defined to do so, and you then attempt to use the function’s return value, this triggers undefined behavior in your code.
With undefined behavior, there are no guarantees regarding what your code will do. It could crash, it could output strange results, or is could (as in your case) appear to work properly.
Additionally, making seemingly unrelated changes to your code (such as adding an unused local variable or a call to
printf
for debugging) can change how undefined behavior manifests. Compiling with different optimization settings or with a different compiler can also yield differences.What might be happening in your case is that the value of
node
could be sitting in a register, and that register just happens to be the one where the function’s return value would be placed. But again, it’s essentially luck that it’s working this way.As @dbush‘s answer already explains, not returning a value in a non-void function but using the value of the function call is UB (undefined behavior).
From this C23 standard draft §6.9.2, point 13:
That calling your
createNewNode()
function and using its return value (if you can call it that since it is not returning) worked out as if you had returned the allocated node is pure luck (or unluckiness, depending on how you see it).When you encounter undefined behavior, you can’t really rely on it or reason about it. If you change the compiler or just its version or some compiler flags like optimization level or anything in your code, even if it doesn’t directly relate to the source of the UB, your code might behave differently each time.
Even though you probably shouldn’t, I still tried to reason about what was going on in your code and @dbush already somewhat explained that, too.
Using Compiler Explorer I compiled some simplified but similar code using x86-64 gcc 14.1 (live example) without setting any compiler flags.
This is its generated assembly:
Some of this assembly code might look scary but the for us important parts are not that hard to understand if you know what you have to look for.
Program execution in C starts from the
main()
function, in the assembly marked with themain:
label. After setting up the stack frame we encountercall allocateInt
and you can probably guess by yourself what this does. InallocateInt()
we have acall calloc
.The return value of
calloc()
is stored in the 64-bit registerrax
. x86-64 calling conventions:We now have the following three lines:
The first one stores the value in the
rax
register on the stack (inallocateInt()
‘s localp
). The next line stores the value inp
back intorax
. Then we store the value inrax
into our globalgPtr
.The next usage of
rax
is directly after the call toallocateInt()
.You should already be able to tell what this line does. It stores the value in
rax
on the stack. Since we are inmain()
again andallocateInt()
‘s stack has already been teared down this now stores it inmain()
‘sptr
.Therefore the program had the following output for me, which shows that
allocateInt()
seemingly returned the correct value, even though we didn’t write areturn
statement ourselves:As I already said, modifying even a little bit can change the behavior of the program, since we have UB in our code. The following modifications I did show that you can’t rely on UB.
When I turned up the optimization level to
-O2
or-O3
, the output differed. However-O1
and-Os
still managed to return the pointer.Originally I had a call to
printf()
in theallocateInt()
function, but this causedptr
not having the same value asp
. Moving thatprintf()
call intomain()
and having a globalgPtr
got me the result which I have shown here.On some compilers my code caused a segfault.