
Even though I get a warning that a function returns the address of a local variable, it compiles. Isn’t that then UB of the compiler? The generated assembly:

    .text
.LC0:
    .asciz "%i\n"
    .globl  foo
    .type   foo, @function
foo:
    pushq   %rbp    #
    movq    %rsp, %rbp  #,
    sub     $16, %rsp   #,
    mov     %rdi, -8(%rbp)   #,
    leaq    -8(%rbp), %rax   #,
# a.c:5: }
    leave 
    ret
    .size   foo, .-foo
    .globl  main
    .type   main, @function
main:
    pushq   %rbp    #
    movq    %rsp, %rbp  #,
# a.c:8:    foo();
    movl    $123, %edi  #,
    call    foo #
    movq    (%rax), %rsi   #,
    leaq    .LC0(%rip), %rdi   #,
    movl    $0, %eax   #,
    call    printf #,
    movl $0, %eax
# a.c:9: }
    popq    %rbp    #
    ret 
    .size   main, .-main
    .ident  "GCC: (Debian 8.3.0-6) 8.3.0"
    .section    .note.GNU-stack,"",@progbits

Here the assembly returns the address of a local variable (leaq -8(%rbp), %rax), but then it executes the leave instruction, which should "invalidate" the address -8(%rbp): the stack pointer is increased, so I should no longer be able to dereference that address, since the program has moved on. So why does it compile, and why does it happily dereference that address with movq (%rax), %rsi when the address returned in %rax is no longer valid? Shouldn’t it segfault or terminate?


Answers


  1. It will compile, of course, and some compilers will emit a diagnostic message informing you about the problem. Many compilers allow such messages (typically called warnings) to be treated as errors via command-line options (for example, GCC’s -Werror).

    UB means that the behaviour of your program when you run it is undefined.

  2. As you stated, if you return the address of a local variable from a function and attempt to dereference (or even read) that address, you invoke undefined behavior.

    The formal definition of undefined behavior is stated in section 3.4.3 of the C standard:

    behavior, upon use of a nonportable or erroneous program construct or
    of erroneous data, for which this International Standard imposes no
    requirements

    When undefined behavior occurs, the compiler makes no guarantees about what will happen. The program may crash, it may output strange results, or it may appear to work properly.

    Generally speaking, compilers will assume code does not contain undefined behavior and work under that assumption. So when it does, all bets are off.

    Just because the program could crash doesn’t mean it will.

  3. Even though I get a warning that a function returns the address of a local
    variable, it compiles. Isn’t that then UB of the compiler?

    No, but if it were, how could you tell? You seem to have a misunderstanding of undefined behavior. It does not mean "the compiler must reject it", "the compiler must warn about it", "the program must terminate", or any such thing. Those may indeed be manifestations of UB, but if the language specification required such behavior, then it wouldn’t be undefined. Ensuring that a C program does not exercise undefined behavior is the responsibility of the programmer, not the C implementation. Where a programmer does not fulfill that responsibility, the C implementation explicitly has no reciprocal responsibility: it can do anything within its capabilities.

    Moreover, there is no single "the" C compiler. Different compilers may do things differently and still conform to the C language specifications. This is where implementation-defined, unspecified, and undefined behaviors come in. Allowing such variance is intentional on the part of the C language designers. Among other things, it allows implementations to operate in ways that are natural for their particular target hardware and execution environments.

    Now let’s go back to "no". Here is a prototypical example of a function returning the address of an automatic variable:

    int *foo() {
        int bar = 0;
        return &bar;
    }
    

    What about that is supposed to have undefined behavior? It is well defined for the function to compute the address of bar, and the resulting pointer value has the correct type to be returned by the function. After bar’s lifetime ends when the function returns, the return value becomes indeterminate (paragraph 6.2.4/2 of the standard), but that does not in itself give rise to any undefined behavior.

    Or consider a caller:

    void test1() {
        int *bar_ptr = foo();  // OK under all circumstances
    }
    

    As already discussed, our particular foo()’s return value will always be indeterminate, so in particular, it might be a trap representation. But that’s a runtime consideration, not a compile-time one. And even if the value were a trap representation, C does not require that the implementation refuse or fail to store it. In particular, footnote 50 to C11 is explicit on this point:

    Thus, an automatic variable can be initialized to a trap
    representation without causing undefined behavior, but the value of
    the variable cannot be used until a proper value is stored in it.

    Note also that foo() and test1() can be compiled by different runs of the compiler, such that when compiling test1(), the compiler knows nothing about the behavior of foo() beyond what is indicated by its prototype. C does not place translation-time requirements on implementations that depend on the runtime behavior of programs.

    On the other hand, the requirements around trap representations would apply differently if the function were modified slightly:

    void test2() {
        int *bar_ptr = NULL;
        bar_ptr = foo();      // UB (only) if foo() returns a trap representation
    }
    

    If the return value of foo() turns out to be a trap representation, then storing it in bar_ptr (as opposed to initializing bar_ptr with it) produces undefined behavior at runtime. Again, however, "undefined" means just what it says on the tin. C does not define any particular behavior for implementations to exhibit under the circumstances, and in particular, it does not require that programs terminate or manifest any externally-visible behavior at all. And again, that’s a runtime consideration, not a compile-time one.

    Furthermore, if foo()’s return value turns out not to be a trap representation (being instead a pointer value that is not the address of any live object), then there’s nothing wrong with reading that value itself, either:

    void test3() {
        int *bar_ptr = foo();
        // UB (only) if foo() returned a trap representation:
        printf("foo() returned %p\n", (void *) bar_ptr);
    }
    

    The biggest and most commonly-exercised undefined behavior in this area would be that of trying to dereference the return value of foo(), which, trap representation or not, almost surely does not point to a live int object:

    void test4() {
        int *bar_ptr = foo();
        // UB under all circumstances for the given foo():
        printf("foo() returned a pointer to an int with value %d\n", *bar_ptr);
    }
    

    But again, that’s a runtime consideration, not a compile-time one. And again, undefined means undefined. The C implementation should be expected to translate that successfully as long as there are in-scope declarations for the functions involved, and although some compilers might warn, they have no obligation to do so. The runtime behavior of function test4 is undefined, but that does not mean the program necessarily will segfault or terminate in some other manner. It might, but I expect that in practice, the undefined behavior manifested by a great many implementations would be to print "foo() returned a pointer to an int with value 0". Doing so is in no way inconsistent with C’s requirements.
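For contrast with the foo() above, here is a minimal C sketch of a few well-defined ways a function can hand data back to its caller without leaving it holding a pointer to an object whose lifetime has ended. The function names are hypothetical, and which option fits depends on the real program.

```c
#include <stdlib.h>

/* Return the value itself: the caller receives a copy, so the end of
   bar's lifetime at the return does not matter. */
int foo_by_value(void) {
    int bar = 0;
    return bar;
}

/* Write into an object owned by the caller; out must point to a live int. */
void foo_into(int *out) {
    *out = 0;
}

/* Allocate an object that outlives the call. Returns NULL if allocation
   fails; otherwise the caller must eventually free() the result. */
int *foo_alloc(void) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = 0;
    }
    return p;
}

/* Static storage duration: bar lives for the whole program run and is
   initialized only once, but every call returns a pointer to the same
   object, so this variant is not reentrant or thread-safe. */
int *foo_static(void) {
    static int bar = 0;
    return &bar;
}
```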

  4. The difficulty is that the Standard strongly implies(*) that the presence of code which would invoke Undefined Behavior if executed should not interfere with the execution of the program in cases where that code would not be executed. When the compiler generates code for the function, it has no idea whether code that calls the function might attempt to treat the return value as an address in some fashion that would not be defined either by the Standard or by any extended semantics the implementation might offer. For example, many implementations guarantee that if converting a pointer to uintptr_t within the lifetime of its target yields a certain value, converting that pointer to uintptr_t will always yield that value, without regard for whether its target still exists. Commercial compilers often abide by the philosophy that if it’s remotely conceivable that a programmer might want to do something (such as converting a pointer to uintptr_t and logging it, to allow comparison with other pointer values that were logged earlier in program execution), and there’s nothing to be gained by not allowing it, the compiler may as well allow it.

    (*) Under the One Program Rule, a compiler that can properly process at least one program that exercises the translation limits given in the Standard may do anything it likes when fed any other source text. Thus, if a compiler writer thought it more useful to reject all programs meeting some criteria, despite some such programs being Strictly Conforming, than to process such programs, such behavior would not make a compiler non-conforming. Nonetheless, the Standard elsewhere says that a program which would invoke UB when given some inputs could be a correct program with fully defined behavior when given other inputs.
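The logging scenario described above can be written so that no dangling pointer is ever converted or dereferenced: convert each pointer to uintptr_t while its target is still alive and keep only the integer. This is a minimal sketch; the log and the function names are hypothetical, it assumes the implementation provides the optional uintptr_t type, and because the integer values are implementation-defined, matching a later address against the log is only meaningful on implementations that give the kind of stable-conversion guarantee mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDR_LOG_CAPACITY 16

/* Hypothetical log that stores addresses as integers, never as pointers. */
static uintptr_t addr_log[ADDR_LOG_CAPACITY];
static size_t addr_log_len = 0;

/* Record an address. Must be called while the object *p is still alive.
   Returns 0 on success, -1 if the log is full. */
int log_address(const void *p) {
    if (addr_log_len == ADDR_LOG_CAPACITY) {
        return -1;
    }
    addr_log[addr_log_len++] = (uintptr_t) p;
    return 0;
}

/* Return the index of the first logged entry equal to the address of the
   live object *p, or -1 if none matches. Only integers are compared;
   no logged value is converted back to a pointer or dereferenced. */
long find_logged(const void *p) {
    uintptr_t key = (uintptr_t) p;
    for (size_t i = 0; i < addr_log_len; i++) {
        if (addr_log[i] == key) {
            return (long) i;
        }
    }
    return -1;
}

/* Log the address of a local while it is alive. After this returns, bar
   is gone, but the logged integer remains an ordinary, readable value. */
void log_local(void) {
    int bar = 0;
    log_address(&bar);
}
```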
