Debian - Does GCC optimize out empty method calls?

JayDee
October 13, 2024
162 views
0 votes
2 Answers

The following code works fine, but I am curious about its performance and how it gets optimized.

For example, I call Output(str) to send something to the terminal. I added a method that only outputs if debugging, as follows. Now I am left wondering if this will produce pointless empty method calls when NOT debugging, and whether I should have gone about it differently.

bool OutputNewline = true;
void Output(std::string str, bool newLine = true)
{
    if (!OutputNewline)
    {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length()-1] == '\n');
}
void OutputDebug(std::string str, bool newLine = true)
{
    #if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
    #endif
}

What would be the most appropriate way to view/check the compiled output file after compilation to see what methods/fields actually ended up in the compiled output?

g++ 12.2.0
Debian GNU/Linux 12 (bookworm)

Tags: c++, gcc, optimization

Answers

- Jérôme Richard
- October 12, 2024 at 6:42 pm
- 0 votes
When DEBUG is not defined, the call to Output in OutputDebug is removed at preprocessing time (before compile-time).
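To make the preprocessing step concrete, here is a minimal sketch that uses the same #if test as the question. It is not the asker's code: DebugEnabled, OutputDebugSketch and the counter g_debugBodiesRun are made-up names, and the counter merely stands in for the real call to Output so that the effect can be observed.

```cpp
#include <string>

// Hypothetical counter, present only so the effect is observable.
int g_debugBodiesRun = 0;

// Same guard as in the question: true only when built with e.g. -DDEBUG=1.
constexpr bool DebugEnabled()
{
#if defined(DEBUG) && DEBUG == true
    return true;
#else
    return false;
#endif
}

void OutputDebugSketch(const std::string& str)
{
    (void)str;               // silence "unused parameter" when DEBUG is off
#if defined(DEBUG) && DEBUG == true
    ++g_debugBodiesRun;      // stands in for Output(str, newLine)
#endif
    // Without DEBUG, the preprocessor leaves nothing else in this body.
}
```

Built without -DDEBUG, two calls to OutputDebugSketch leave the counter at 0; built with -DDEBUG=1 they leave it at 2. The function itself still exists in both builds, only its body changes.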

If optimizations are enabled (e.g. -O2 or -O3), GCC can inline calls to OutputDebug and Output. When DEBUG is not defined, an inlined call to OutputDebug then produces no generated code for the call (i.e. no function call). This is likely to happen, but it is only possible if the calling function is in the same translation unit. Otherwise, link-time optimization is required to inline across translation units.

Both the Output and OutputDebug functions will still be generated as stand-alone definitions. This can be seen on Godbolt even if DEBUG is not defined and the -O3 flag is used. The former can be removed automatically at compile time by declaring it with the static keyword (or even static inline, for more aggressive optimization). The same can be done for the latter, assuming the functions calling it are in the same translation unit.

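Please note that the linker can automatically remove functions that are never called from the resulting binary thanks to link-time garbage collection (this relies on GCC placing each function in its own section, e.g. with -ffunction-sections, and on the linker option --gc-sections).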

For more information about all of this, please see the CppCon 2018 talk "The Bits Between the Bits: How We Get to main()" by Matt Godbolt (yes, him again).


Now I am left wondering if this will produce pointless empty method calls when NOT debugging
…

What would be the most appropriate way to view/check the compiled output file after compilation to see what methods/fields actually ended up in the compiled output?

TL;DR

To check which symbols have definitions in the compiled output, your best friend on Linux is readelf¹.
Yes, at optimisation level -O1 or higher the GCC C++ compiler can eliminate calls to the empty function OutputDebug when DEBUG is undefined,
as long as you stop needlessly thwarting this optimisation as your present code does.

Suppose we add the necessary #includes to your code to make it compile:

$ cat file.cpp
#include <string>
#include <cstdio>

bool OutputNewline = true;
void Output(std::string str, bool newLine = true)
{
    if (!OutputNewline)
    {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length()-1] == '\n');
}
void OutputDebug(std::string str, bool newLine = true)
{
    #if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
    #endif
}

Compile it without defining DEBUG:

$ g++ -c file.cpp

and check the symbol table:

$ readelf --demangle --wide --syms file.o

Symbol table '.symtab' contains 16 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     1 OBJECT  LOCAL  DEFAULT    5 std::__detail::__integer_to_chars_is_unsigned<unsigned int>
     4: 0000000000000001     1 OBJECT  LOCAL  DEFAULT    5 std::__detail::__integer_to_chars_is_unsigned<unsigned long>
     5: 0000000000000002     1 OBJECT  LOCAL  DEFAULT    5 std::__detail::__integer_to_chars_is_unsigned<unsigned long long>
     6: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    3 OutputNewline
     7: 0000000000000000   174 FUNC    GLOBAL DEFAULT    1 Output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
     8: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND putchar
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::length() const
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND stdout
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str() const
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fputs
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fflush
    14: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator[](unsigned long)
    15: 00000000000000ae    20 FUNC    GLOBAL DEFAULT    1 OutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)

You see that OutputDebug is present, despite having an empty body. FUNC means it is a function and GLOBAL means it is available to the linker. This is with default optimisation (i.e. -O0, = none). Maximum optimisation:

$ g++ -c -O3  file.cpp

does not change that:

$ readelf --demangle --wide --syms file.o

Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000   106 FUNC    GLOBAL DEFAULT    1 Output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
     4: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    3 OutputNewline
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND stdout
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fputs
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fflush
     8: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND putc
     9: 0000000000000070     5 FUNC    GLOBAL DEFAULT    1 OutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)

It cannot change that, because OutputDebug has external linkage (it is implicitly extern) by default, which means the compiler is compelled to make the definition visible to
the linker, no matter what the definition is.

Presumably you’d need both of the functions Output and OutputDebug to be available to the linker or else neither of them. If neither, then you might
explicitly declare them static:

$ cat file1.cpp
#include <string>
#include <cstdio>

bool OutputNewline = true;
static void Output(std::string str, bool newLine = true)
{
    if (!OutputNewline)
    {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length()-1] == '\n');
}
static void OutputDebug(std::string str, bool newLine = true)
{
    #if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
    #endif
}

After that is compiled:

$ g++ -c file1.cpp

you will see that:

$ readelf --demangle --wide --syms file1.o | egrep '(Symbol|Ndx|Output)'
Symbol table '.symtab' contains 16 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     3: 0000000000000000   174 FUNC    LOCAL  DEFAULT    1 Output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
     4: 00000000000000ae    20 FUNC    LOCAL  DEFAULT    1 OutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
     8: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    3 OutputNewline

definitions of both Output and OutputDebug are still in the object file, but now they are LOCAL, not GLOBAL, functions. They are not available to the linker.

That means the compiler is at liberty to optimise them away, if possible. With minimal optimisation:

$ g++ -c -O1 file1.cpp

they both disappear:

$ readelf --demangle --wide --syms file1.o

Symbol table '.symtab' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file1.cpp
     2: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    2 OutputNewline

Since the linker can’t see them, they can’t be called externally. And since they are not called within file1.o either, they are dead code.

This remains true even if we define DEBUG:

$ g++ -c -O1 -DDEBUG=1 file1.cpp 
$ readelf --demangle --wide --syms file1.o

Symbol table '.symtab' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file1.cpp
     2: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    2 OutputNewline

Because even though OutputDebug now calls Output, the first is dead code, and nothing else calls the second, so it is dead code as well.

OutputDebug cannot be optimised away by the compiler if it is extern, whether or not DEBUG is defined. If it is static then
it will be optimised away, with -O1 or higher, whether or not DEBUG is defined, unless it is ultimately called by some function that is extern.
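As an aside that goes beyond the files compiled here: an unnamed namespace gives functions internal linkage in the same way that static does. The following is only a sketch with made-up names (OutputDebugLocal, CallTwice, g_calls); it has not been run through readelf for this answer, but the local function should show up as LOCAL at -O0 and is a candidate for removal at -O1 and above, just like the static versions.

```cpp
#include <string>

namespace {   // everything declared in here has internal linkage, like `static`

int g_calls = 0;   // hypothetical counter, only for illustration

void OutputDebugLocal(const std::string& str)
{
    (void)str;     // the string is not needed for the illustration
    ++g_calls;
}

} // namespace

// External linkage: the only function in this sketch that the linker can see.
int CallTwice(const std::string& str)
{
    OutputDebugLocal(str);
    OutputDebugLocal(str);
    return g_calls;
}
```

Which spelling to use is largely a matter of taste. The unnamed namespace also works for types, which static cannot be applied to.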

The one interesting question then is:

If DEBUG is undefined can the compiler optimise away an empty call to OutputDebug
by an extern function? As in:

$ cat file2.cpp
#include <string>
#include <cstdio>

bool OutputNewline = true;
static void Output(std::string str, bool newLine = true)
{
    if (!OutputNewline)
    {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length()-1] == '\n');
}
static void OutputDebug(std::string str, bool newLine = true)
{
    #if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
    #endif
}

void CallOutputDebug(std::string const & str, bool newLine = true)
{
    OutputDebug(str,newLine);
}

Compiled with no optimisation:

$ g++ -c file2.cpp

$ readelf --demangle --wide --syms file2.o | egrep '(Symbol|Ndx|Output)'
Symbol table '.symtab' contains 20 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     3: 0000000000000000   174 FUNC    LOCAL  DEFAULT    1 Output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
     4: 00000000000000ae    20 FUNC    LOCAL  DEFAULT    1 OutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
     8: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    3 OutputNewline
    16: 00000000000000c2   113 FUNC    GLOBAL DEFAULT    1 CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)

Output and OutputDebug reappear as LOCAL symbols and CallOutputDebug appears as GLOBAL. Compiled with optimisation:

$ g++ -c -O1 file2.cpp
$ readelf --demangle --wide --syms file2.o | egrep '(Symbol|Ndx|Output)'
Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     3: 0000000000000000   211 FUNC    GLOBAL DEFAULT    1 CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
     8: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    3 OutputNewline

Output and OutputDebug disappear altogether. Therefore we know that neither of these symbols is called in the object file, because they don’t exist. If you don’t yet
trust readelf on this score, we can repeat the compilation and save the assembly code:

$ g++ -c -O1 file2.cpp --save-temps

The assembly will be saved in file2.s. Then grep for the 3 function symbols in the assembly:

$ cat file2.s | c++filt | egrep '(CallOutputDebug|OutputDebug|Output)'
    .globl  CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
    .type   CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool), @function
CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool):
    .size   CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool), .-CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
    .globl  OutputNewline
    .type   OutputNewline, @object
    .size   OutputNewline, 1
OutputNewline:

The only one present is CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (c++filt is the C++ name demangler).

You can conclude then that the compiler has either eliminated the OutputDebug call or, at least, inlined whatever definition remains of OutputDebug into the definition of
CallOutputDebug. As before, Output is dead code so it is just gone.

But that is not exactly what you would like to know.

You would like to know that the compiler can eliminate, not merely inline, the call to OutputDebug
in CallOutputDebug, when DEBUG is undefined. I.e. you would like to see that optimised code generated from file2.cpp, with DEBUG undefined, can be the same as that generated from:

$ cat file3.cpp 
#include <string>

bool OutputNewline = true;

void CallOutputDebug(std::string const & str, bool newLine = true)
{
}

If we compile that file:

$ g++ -c -O1 file3.cpp

then readelf can answer the question. Compare this:

$ readelf --demangle --syms --wide file2.o

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file2.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000   211 FUNC    GLOBAL DEFAULT    1 CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND operator delete(void*, unsigned long)
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND memcpy
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __stack_chk_fail
     8: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    3 OutputNewline

with this:

$ readelf --demangle --syms --wide file3.o

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file3.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     5 FUNC    GLOBAL DEFAULT    1 CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
     4: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    2 OutputNewline

And observe:

In file2.o the size of the function CallOutputDebug is 211 bytes (readelf prints the Size column in decimal), whereas in file3.o it is merely 5 bytes.
In file2.o there are references to 4 external symbols that are not referenced in file3.o:
- std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
- operator delete(void*, unsigned long)
- memcpy
- __stack_chk_fail

So the two object files are not equivalent, and the definition of CallOutputDebug in file2.o contains about 40 times as much
code as its definition in file3.o. That is a resounding No for the elimination of the OutputDebug call in file2.o.

But Whoa!…

Notice the incongruence between the signatures of:

void CallOutputDebug(std::string const & str, bool newLine = true)

as in file2.cpp, file3.cpp, and:

void OutputDebug(std::string str, bool newLine = true)

as in file2.cpp. My CallOutputDebug accepts a std::string const & argument
whereas your OutputDebug accepts a std::string.

That means that every call to OutputDebug must copy-construct its str parameter from the caller's argument, while CallOutputDebug merely receives
a const reference to its str, no copy-construction required. In both cases no copy-construction is required to pass newLine,
which is of primitive type.

And that means that OutputDebug, as you have defined it, is not an empty function, even when DEBUG is
undefined. Calling it copy-constructs (and afterwards destroys) a std::string, and that copy-construction is inlined into the compiled body of
CallOutputDebug as defined in file2.cpp:

void CallOutputDebug(std::string const & str, bool newLine = true)
{
    OutputDebug(str,newLine);
}

We’ve already got the assembly generated from file2.cpp. It’s in file2.s. Here’s one relevant snippet
from all that unaccounted-for extra code that we detected there:

...
.L9:
    .cfi_restore_state
    leaq    8(%rsp), %rsi
    leaq    16(%rsp), %rdi
    movl    $0, %edx
    call    _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm@PLT
    movq    %rax, %rdi
    movq    %rax, 16(%rsp)
    movq    8(%rsp), %rax
    movq    %rax, 32(%rsp)
.L3:
    movq    %rbx, %rdx
    movq    %rbp, %rsi
    call    memcpy@PLT
    jmp .L5
...

Note the demangled name:

$ echo _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm | c++filt
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)

You don’t need to understand all of the assembly to recognise the snippet is the execution of an std::string copy-constructor.
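If you would rather see the copy in C++ than in assembly, the following sketch makes it countable. CountedString, ByValue, ByConstRef and g_copies are hypothetical names introduced only for this illustration; CountedString stands in for std::string so that its copy constructor can increment a counter.

```cpp
#include <string>
#include <utility>

int g_copies = 0;   // hypothetical global: number of copy-constructions so far

// Stand-in for std::string whose copy constructor is observable.
struct CountedString
{
    std::string value;
    explicit CountedString(std::string v) : value(std::move(v)) {}
    CountedString(const CountedString& other) : value(other.value) { ++g_copies; }
};

// Same shape as OutputDebug(std::string str): empty body, parameter by value.
void ByValue(CountedString s) { (void)s; }

// Same shape as OutputDebug(std::string const & str): empty body, by reference.
void ByConstRef(const CountedString& s) { (void)s; }

// Returns how many copies one by-value call costs.
int CopiesForByValue(const CountedString& s)
{
    const int before = g_copies;
    ByValue(s);                 // the parameter is copy-constructed from s
    return g_copies - before;
}

// Returns how many copies one by-reference call costs.
int CopiesForByConstRef(const CountedString& s)
{
    const int before = g_copies;
    ByConstRef(s);              // only a reference is passed
    return g_copies - before;
}
```

Both callees have empty bodies, yet the by-value call costs one copy and the by-reference call costs none. Because the copy constructor here has a visible side effect, the compiler is not allowed to drop it when an lvalue is passed by value.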

You didn’t need to have this stuff done. It was just careless to declare:

void Output(std::string str, bool newLine = true)
void OutputDebug(std::string str, bool newLine = true)

rather than:

void Output(std::string const & str, bool newLine = true)
void OutputDebug(std::string const & str, bool newLine = true)

But the compiler doesn’t know you were careless and can’t optimise on that basis. After preprocessing
of file2.cpp it finds calls to the external functions we noted earlier, including the:

std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)

memcpy

in the snippet. It doesn’t know the definitions of those external functions and for all it can possibly tell, at any optimisation level, they are all stuff that you want done, so it assembles calls to them all, and that accounts for all the surplus code in file2.o v. file3.o.

Once again, lesson learned.

Let’s fix file2.cpp to avoid needless copy-construction of std::strings, as:

$ cat file4.cpp
#include <string>
#include <cstdio>

bool OutputNewline = true;
static void Output(std::string const & str, bool newLine = true)
{
    if (!OutputNewline)
    {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length()-1] == '\n');
}
static void OutputDebug(std::string const & str, bool newLine = true)
{
    #if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
    #endif
}

void CallOutputDebug(std::string const & str, bool newLine = true)
{
    OutputDebug(str,newLine);
}

Compile that:

$ g++ -c -O1 file4.cpp

Then repeat the readelf comparison with file4.o and file3.o

$ readelf --demangle --syms --wide file4.o

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file4.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     5 FUNC    GLOBAL DEFAULT    1 CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
     4: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    2 OutputNewline

$ readelf --demangle --syms --wide file3.o

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file3.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     5 FUNC    GLOBAL DEFAULT    1 CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
     4: 0000000000000000     1 OBJECT  GLOBAL DEFAULT    2 OutputNewline

Now the symbol-tables are identical, except of course that the filename values of the FILE entry are different, and the definition of:

CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)

is 5 bytes in size in both object files. That strongly suggests that the OutputDebug call has been erased from file4.o, this time, in
the way you hoped it would. And if we compare the respective object files we have proof of that:

$ cmp -l file3.o file4.o
 318  63  64

There is only one difference at byte 318 where file3.o has character (octal) 63 = ‘3’ and file4.o has character
(octal) 64 = ‘4’. That is the difference between the values of the FILE entry: file3.cpp and file4.cpp. The
object code is the same. You can generate the assembly listing file4.s if you are interested and compare it with
file3.s.
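One caveat that the experiments above do not cover: with a std::string const & parameter, a call site that passes a string literal, such as OutputDebug("text"), still has to construct a temporary std::string before the call, and the compiler may or may not be able to remove that. A possible way around it, assuming C++17 is available, is to take std::string_view instead. This is only a sketch under that assumption; OutputView, OutputDebugView and g_atLineStart are made-up names, and the return values exist only so the behaviour can be checked.

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>

bool g_atLineStart = true;   // plays the role of OutputNewline in the question

// Writes str to stdout and returns the number of characters written.
inline std::size_t OutputView(std::string_view str)
{
    if (str.empty()) { return 0; }
    std::fwrite(str.data(), 1, str.size(), stdout);
    std::fflush(stdout);
    g_atLineStart = (str.back() == '\n');
    return str.size();
}

// Forwards to OutputView only when built with e.g. -DDEBUG=1.
inline std::size_t OutputDebugView(std::string_view str)
{
    (void)str;               // silence "unused parameter" when DEBUG is off
#if defined(DEBUG) && DEBUG == true
    return OutputView(str);
#else
    return 0;
#endif
}
```

A std::string_view is just a pointer and a length, so building one from a literal or from an existing std::string allocates nothing. Whether this matters in practice depends on how the functions are called, so it is worth checking with readelf or the assembly as shown above.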

¹ GNU binutils objdump is a fairly serviceable second-best to readelf and will be available for non-ELF object formats
to which binutils has been ported.
