I saw this meme on Instagram about some C++ code that should not output anything but does. The code is:

#include <iostream>

int main() {
    while (1)
        ;
}

void unreachable() {
    std::cout << "Hello World!" << std::endl;
}

I compiled it with clang as shown in the meme (Ubuntu clang version 14.0.0-1ubuntu1.1) and got the same result: the program prints "Hello World!". The same code compiled with gcc (g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0) does what you would expect: nothing.

I would like to know why clang behaves differently, and how the heck the unreachable function gets executed if I never call it from the main function.

2 Answers


  1. In

    #include <iostream>
    
    int main() {
        while (1)
            ;
    }
    
    void unreachable() {
        std::cout << "Hello World!" << std::endl;
    }
    

    the never-ending, do-nothing loop breaks the rules. As per [intro.progress]:

    The implementation may assume that any thread will eventually do one of the following:

    • terminate,
    • make a call to a library I/O function,
    • perform an access through a volatile glvalue, or
    • perform a synchronization operation or an atomic operation.

    [Note 1: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. — end note]

    Since the program has undefined behavior, the compiler is allowed to do anything it wants: it may generate a program that does what you expect, which is GCC's response, or literally anything else. So technically clang's result is valid. Be grateful. The compiler could have opted to produce Skynet.
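
    For contrast, here is a minimal sketch of a loop that looks like busy work but does one of the things on the list above. The names ticks and spin are hypothetical and not part of the original program; the only point is that each iteration performs a volatile access, so the assumption from [intro.progress] gives the compiler no licence to delete the loop.

    ```cpp
    // Hypothetical example, not part of the original program.
    // `ticks` is volatile, so every read and write of it is a volatile access.
    volatile unsigned ticks = 0;

    // Spins until `ticks` reaches `limit`. Each iteration reads and writes a
    // volatile object, which is observable behavior the compiler must keep.
    unsigned spin(unsigned limit) {
        while (ticks < limit) {
            ticks = ticks + 1;  // volatile read followed by a volatile write
        }
        return ticks;
    }
    ```

    Calling spin(1000) on a fresh program returns 1000. The same idea is why delay loops over a volatile counter survive optimization while a bare while (1); does not.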

  2. This is a well-known instance of (unusually?) aggressive optimization by Clang. You can find many lengthy discussions about it, e.g. at https://github.com/llvm/llvm-project/issues/60622.

    The standard has certain forward-progress requirements on programs that the compiler is allowed to assume hold true.

    In particular, the compiler is allowed to assume that any thread eventually either terminates, calls a standard library I/O function, performs a volatile access, or performs a synchronization operation or an atomic operation.

    Your loop while (1); will cause the main thread to never do any of these things. Therefore the program has undefined behavior if this loop is reached in execution.
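
    As a point of comparison, a minimal sketch of a loop with an equally empty body that does not run into this rule, because its condition performs an atomic operation on every iteration. The function wait_for_worker and its variables are hypothetical names chosen for illustration.

    ```cpp
    #include <atomic>
    #include <thread>

    // Hypothetical helper, for illustration only.
    // The waiting loop has an empty body, like while (1);, but its condition
    // performs an atomic load each time it is evaluated.
    int wait_for_worker() {
        std::atomic<bool> done{false};
        int result = 0;

        std::thread worker([&] {
            result = 42;                                  // plain write...
            done.store(true, std::memory_order_release);  // ...published here
        });

        while (!done.load(std::memory_order_acquire))
            ;  // empty body, but every check is an atomic operation

        worker.join();
        return result;
    }
    ```

    Here the compiler cannot assume the loop away: the atomic load is one of the operations the forward-progress rule lists, and the loop ends once the worker thread has stored true.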

    Clang replaces the loop with an unreachable marker, because a valid program could never reach the UB loop. As a result it does not emit any instructions for the body of main, not even a ret: it follows from the above that main can never be called in a valid program.

    So the call to main falls straight through into the machine instructions of unreachable, which in this build happen to come directly after main's empty body.

    This behavior is permitted by the standard, except that it results in main and unreachable having the same function address, which in general could affect observable behavior in ways that it shouldn't. If Clang emitted a single trapping instruction such as ud2 into the function body when doing such optimizations for functions that always have UB when called, it would be fully conforming. See e.g. https://github.com/llvm/llvm-project/issues/60596.
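
    To make the address point concrete, here is a small sketch of the kind of observable behavior involved. C++ does not allow a program to take the address of main, so the sketch uses two hypothetical stand-in functions, first and second; pointers to two different functions are required to compare unequal.

    ```cpp
    // Hypothetical functions, for illustration only.
    // Their bodies differ on purpose so they are clearly distinct functions.
    int first() { return 1; }
    int second() { return 2; }

    // Compares two function pointers of the same type.
    bool same_address(int (*a)(), int (*b)()) {
        return a == b;
    }
    ```

    A conforming implementation must report same_address(first, second) as false. If an empty function body ended up sharing its address with the function placed after it, a comparison like this could come out true, which is the non-conformance described above.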

    Also note that C has a similar rule, but with an exception when the controlling expression of the loop is a constant expression, which is the case here. So in C this program does not have undefined behavior.
