Even though I get a warning that a function returns the address of a local variable, the code compiles. Isn't that undefined behavior on the compiler's part, then? Here is the generated assembly:
```asm
    .text
.LC0:
    .asciz "%i\n"
    .globl foo
    .type foo, @function
foo:
    pushq %rbp #
    movq %rsp, %rbp #,
    sub $16, %rsp #,
    mov %rdi, -8(%rbp) #,
    leaq -8(%rbp), %rax #,
# a.c:5: }
    leave
    ret
    .size foo, .-foo
    .globl main
    .type main, @function
main:
    pushq %rbp #
    movq %rsp, %rbp #,
# a.c:8: foo();
    movl $123, %edi #,
    call foo #
    movq (%rax), %rsi #,
    leaq .LC0(%rip), %rdi #,
    movl $0, %eax #,
    call printf #,
    movl $0, %eax
# a.c:9: }
    popq %rbp #
    ret
    .size main, .-main
    .ident "GCC: (Debian 8.3.0-6) 8.3.0"
    .section .note.GNU-stack,"",@progbits
```
Here the assembly returns the address of a local variable (`leaq -8(%rbp), %rax`), but then it executes the `leave` instruction, which should "invalidate" the address `-8(%rbp)`: the stack pointer is moved back up, so I should no longer be able to dereference that address, since the program has moved on. So why does it compile, and why does it happily dereference that address with `movq (%rax), %rsi` when the address returned in `%rax` is no longer valid? Shouldn't it segfault or terminate?
Answers
Of course it will compile, and some compilers will emit a diagnostic message informing you about the problem. Many compilers let you treat such messages (typically called warnings) as errors via command-line options.
UB means that the behaviour of your program, when you run it, is undefined.
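If you want the compiler to refuse such code rather than merely warn, GCC, for example, accepts `-Werror=return-local-addr` (or plain `-Werror` to promote every warning). The real fix is to give the object a lifetime that outlasts the call. A minimal sketch of three common alternatives follows; the function names are hypothetical and chosen only for illustration:

```c
#include <stdlib.h>

/* Option 1: static storage duration. The object outlives every call,
   but it is shared by all callers (not reentrant, not thread-safe). */
int *counter_static(void) {
    static int value = 0;
    ++value;
    return &value;
}

/* Option 2: the caller supplies the storage, so its lifetime is the
   caller's responsibility. */
int *fill_caller_buffer(int *out, int v) {
    *out = v;
    return out;
}

/* Option 3: allocated storage. The object lives until free(), which the
   caller must eventually call. Returns NULL if allocation fails. */
int *make_heap_int(int v) {
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = v;
    return p;
}
```

Which one fits depends on ownership: static storage is simplest but shared between calls, a caller-supplied buffer keeps allocation decisions with the caller, and `malloc` hands the `free` obligation to the caller.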
As you stated, if you return the address of a local variable from a function and attempt to dereference (or even read) that address, you invoke undefined behavior.
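The formal definition of undefined behavior is stated in section 3.4.3 of the C standard: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements".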
When undefined behavior occurs, the compiler makes no guarantees about what will happen. The program may crash, it may output strange results, or it may appear to work properly.
Generally speaking, compilers will assume code does not contain undefined behavior and work under that assumption. So when it does, all bets are off.
Just because the program could crash doesn’t mean it will.
No, but if it were, how could you tell? You seem to have a misunderstanding of undefined behavior. It does not mean "the compiler must reject it", "the compiler must warn about it", "the program must terminate", or any such thing. Those may indeed be manifestations of UB, but if the language specification required such behavior, then it wouldn't be undefined. Ensuring that a C program does not exercise undefined behavior is the responsibility of the programmer, not the C implementation. Where a programmer does not fulfill that responsibility, the C implementation explicitly has no reciprocal responsibility; it can do anything within its capabilities.
Moreover, there is no single "the" C compiler. Different compilers may do things differently and still conform to the C language specifications. This is where implementation-defined, unspecified, and undefined behaviors come in. Allowing such variance is intentional on the part of the C language designers. Among other things, it allows implementations to operate in ways that are natural for their particular target hardware and execution environments.
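To make the three categories concrete, here is a small sketch. The function names are hypothetical, and the undefined case (signed integer overflow) is deliberately guarded against rather than executed:

```c
#include <limits.h>

/* Implementation-defined: the size of int is chosen, and documented, by
   each implementation; the Standard only guarantees a range that needs
   at least 16 bits. */
int int_storage_bits(void) {
    return (int) (sizeof(int) * CHAR_BIT);
}

/* Unspecified: the Standard does not say which operand of + is evaluated
   first, so `order` may end up as 12 or 21, and the implementation need
   not document which. (Function bodies are indeterminately sequenced, so
   this is not undefined.) */
static int order = 0;
static int one(void) { order = order * 10 + 1; return 1; }
static int two(void) { order = order * 10 + 2; return 2; }

int sum_with_unspecified_order(int *observed_order) {
    order = 0;
    int sum = one() + two();
    *observed_order = order;
    return sum;
}

/* Undefined: INT_MAX + 1 carries no requirements at all, so this helper
   refuses to perform an addition that would overflow. Returns 1 and
   stores the sum on success, 0 if the addition would overflow. */
int add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```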
Now let’s go back to "no". Here is a prototypical example of a function returning the address of an automatic variable:
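A minimal sketch of such a function, assuming the names `foo` and `bar` used in the discussion below (the exact body is an assumption):

```c
/* Hypothetical sketch: foo() returns the address of its automatic
   variable bar. Compilers commonly warn here, but this is valid C and
   the translation unit compiles. */
int *foo(void) {
    int bar = 0;   /* automatic storage duration */
    return &bar;   /* computing and returning the address is well defined;
                      the value becomes indeterminate once foo() returns */
}
```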
What about that is supposed to have undefined behavior? It is well defined for the function to compute the address of `bar`, and the resulting pointer value has the correct type to be returned by the function. When the function returns, `bar`'s lifetime ends and the return value becomes indeterminate (paragraph 6.2.4/2 of the standard), but that does not in itself give rise to any undefined behavior.

Or consider a caller:
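For instance, a caller along these lines, assuming the name `test1` referred to below. It only initializes a pointer from `foo()` and never reads or dereferences it afterwards; the body, including the return value used to signal completion, is an assumption, and `foo()` is repeated so the sketch compiles on its own:

```c
/* Hypothetical foo(): returns the address of an automatic variable. */
int *foo(void) {
    int bar = 0;
    return &bar;   /* bar's lifetime ends when foo() returns */
}

/* Initializes bar_ptr from foo()'s result and never uses its value.
   sizeof does not evaluate its operand, so bar_ptr is not read here.
   Returns 1 once it has run to completion. */
int test1(void) {
    int *bar_ptr = foo();
    return sizeof bar_ptr == sizeof(int *);
}
```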
As already discussed, our particular `foo()`'s return value will always be indeterminate, so in particular, it might be a trap representation. But that's a runtime consideration, not a compile-time one. And even if the value were a trap representation, C does not require that the implementation refuse or fail to store it. In particular, footnote 50 to C11 is explicit on this point: "Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it."

Note also that `foo()` and `test1()` can be compiled by different runs of the compiler, such that when compiling `test1()`, the compiler knows nothing about the behavior of `foo()` beyond what is indicated by its prototype. C does not place translation-time requirements on implementations that depend on the runtime behavior of programs.

On the other hand, the requirements around trap representations would apply differently if the function were modified slightly:
If the return value of `foo()` turns out to be a trap representation, then storing it in `bar_ptr` (as opposed to initializing `bar_ptr` with it) produces undefined behavior at runtime. Again, however, "undefined" means just what it says on the tin. C does not define any particular behavior for implementations to exhibit under the circumstances, and in particular, it does not require that programs terminate or manifest any externally visible behavior at all. And again, that's a runtime consideration, not a compile-time one.

Furthermore, if `foo()`'s return value turns out not to be a trap representation (being instead a pointer value that is not the address of any live object), then there's nothing wrong with reading that value itself, either.

The biggest and most commonly exercised undefined behavior in this area would be that of trying to dereference the return value of `foo()`, which, trap representation or not, almost surely does not point to a live `int` object. But again, that's a runtime consideration, not a compile-time one. And again, undefined means undefined. The C implementation should be expected to translate that successfully as long as there are in-scope declarations for the functions involved, and although some compilers might warn, they have no obligation to do so. The runtime behavior of function `test4` is undefined, but that does not mean the program will necessarily segfault or terminate in some other manner. It might, but I expect that in practice, the undefined behavior manifested by a great many implementations would be to print "foo() returned a pointer to an int with value 0". Doing so is in no way inconsistent with C's requirements.

The difficulty is that the Standard strongly implies(*) that the presence of code which would invoke Undefined Behavior if executed should not interfere with the execution of the program in cases where that code would not be executed. When the compiler generates code for the function, it has no idea whether code that calls the function might attempt to treat the return value as an address in some fashion that would not be defined either by the Standard or by any extended semantics the implementation might offer. For example, many implementations guarantee that if converting a pointer to `uintptr_t` within the lifetime of its target yields a certain value, converting that pointer to `uintptr_t` will always yield that value, regardless of whether its target still exists. Commercial compilers often abide by the philosophy that if it's remotely conceivable that a programmer might want to do something (such as converting a pointer to `uintptr_t` and logging it, to allow comparison with other pointer values that were logged earlier in program execution), and there's nothing to be gained by not allowing it, the compiler may as well allow it.

(*) Under the One Program Rule, a compiler that can properly process at least one program that exercises the translation limits given in the Standard may do anything it likes when fed any other source text. Thus, if a compiler writer thought it more useful to reject all programs meeting some criteria, despite some such programs being Strictly Conforming, than to process such programs, such behavior would not make a compiler non-conforming. Nonetheless, the Standard elsewhere says that a program that would invoke UB when given some inputs could be a correct program with fully defined behavior when given other inputs.
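The address-logging idea mentioned above can be sketched portably only for the part that happens while the object is alive. The helper names are hypothetical, and the sketch assumes an implementation that provides the optional `uintptr_t` type (mainstream ones do):

```c
#include <stdint.h>

/* Hypothetical logging helper: records a pointer as an integer. Only
   call it while the pointed-to object is alive. */
uintptr_t log_address(void *p) {
    return (uintptr_t) p;
}

/* Converting the logged integer back to void * yields a pointer that
   compares equal to the original (C11 7.20.1.4); x is still alive at
   the point of comparison. Returns 1 if the round trip matches. */
int logged_round_trip_matches(void) {
    int x = 42;
    uintptr_t logged = log_address(&x);
    return (void *) logged == (void *) &x;
}

/* Two distinct live objects have distinct addresses, so their logged
   integers must differ (otherwise both would round-trip to the same
   pointer). Returns 1 if they differ. */
int distinct_objects_log_differently(void) {
    int a = 1, b = 2;
    return log_address(&a) != log_address(&b);
}
```

Whether converting the same pointer after its target's lifetime has ended still yields the logged integer is, as the answer says, something particular implementations guarantee rather than the Standard; the sketch does not depend on it.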