I saw this meme on Instagram about some C++ code that should not output anything but does. The code is:
    #include <iostream>

    int main() {
        while (1)
            ;
    }

    void unreachable() {
        std::cout << "Hello World!" << std::endl;
    }
I compiled it with clang as shown in the meme and got the same result (Ubuntu clang version 14.0.0-1ubuntu1.1), but the same code compiled with gcc does what you expect: nothing (g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0).
I would like to know why clang behaves differently, and how the heck the unreachable function gets executed if I never call it from the main function.
2 Answers
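The never-ending do-nothing loop breaks the rules. As per [intro.progress], the implementation may assume that every thread eventually makes forward progress, and this loop never does.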
Since the program is invalid, the compiler is allowed to do anything it wants, from generating a program that does what you expect (GCC's response) to literally anything else. So technically clang's result is valid. Be grateful. The compiler could have opted to produce Skynet.
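For contrast, here is a minimal sketch of a busy-wait loop that stays within the rules. It assumes you actually want the thread to spin until something else tells it to stop. The names `stop_requested` and `spin_until_stopped` are made up for illustration and do not come from the question.

```cpp
#include <atomic>

// Hypothetical stop flag (illustrative name, not from the question).
std::atomic<bool> stop_requested{false};

// Spins until some other thread stores true into stop_requested.
// Every iteration performs an atomic load, which is one of the
// operations [intro.progress] counts as forward progress, so the
// loop is well-defined even if the flag is never set.
void spin_until_stopped() {
    while (!stop_requested.load(std::memory_order_relaxed)) {
        // intentionally empty
    }
}
```

Because of the atomic load, the compiler has no license to treat this loop as unreachable. If no other thread ever sets the flag, the function really does spin forever.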
This is a well-known instance of (unusually?) aggressive optimization by Clang. You can find many lengthy discussions about it e.g. at https://github.com/llvm/llvm-project/issues/60622.
The standard has certain forward-progress requirements on programs that the compiler is allowed to assume hold true.
In particular, the compiler is allowed to assume that any thread eventually either terminates, calls a standard library IO function, performs a `volatile` access, a synchronization action or an atomic access.
Your loop `while (1);` will cause the main thread to never do any of these things. Therefore the program has undefined behavior if this loop is reached in execution.
Clang replaces the loop with an unreachable marker, as a valid program could never possibly reach the UB loop, and as a result it will not emit any instructions for the body of `main`, not even `ret`, since it follows from the above that it is impossible that `main` is ever called if the program is valid.
So the call to `main` will fall through to `unreachable` in the machine instructions.
This behavior is permitted by the standard, with the exception that it will result in `main` and `unreachable` having the same function address, which in general could affect observable behavior in ways that it shouldn't. If Clang added a single `ud2` to trap into the function body when doing such optimizations for functions that always have UB when called, it would be fully conforming. See e.g. https://github.com/llvm/llvm-project/issues/60596.
Also note that C has a similar rule with an exception if the controlling condition of the loop is a constant expression, which here is the case. So in C this program does not have undefined behavior.
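To make the "unreachable marker" idea more concrete, here is a rough sketch that uses the `__builtin_unreachable()` extension available in GCC and Clang. It is only an analogy for what the optimizer does to the loop, not a description of Clang's actual passes. The helper `sign_of_nonzero` is hypothetical.

```cpp
// Hypothetical helper (not from the question).
// Precondition: x != 0. Calling it with 0 is undefined behavior.
int sign_of_nonzero(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
    // GCC/Clang extension: promises the compiler that control never
    // gets here, so it may emit no code at all for this path.
    __builtin_unreachable();
}
```

As long as the precondition holds, this behaves like an ordinary function. If it is called with 0 in an optimized build, execution may simply run off the end of the function into whatever code happens to follow it in the binary. That is the same effect the question observes with `main` and `unreachable`.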