The following method works fine, but I am curious about its performance and optimization.
For example, I call ‘Output(str)’ to send something to the terminal. I added a method that only outputs when debugging, as follows. Now I am left wondering whether this will produce pointless empty method calls when NOT debugging, and whether I should have gone about it differently.
bool OutputNewline = true;

void Output(std::string str, bool newLine = true)
{
    if (!OutputNewline)
    {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length()-1] == '\n');
}

void OutputDebug(std::string str, bool newLine = true)
{
#if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
#endif
}
What would be the most appropriate way to inspect the compiled output file after compilation, to see which methods/fields actually ended up in it?
g++ 12.2.0
Debian GNU/Linux 12 (bookworm)
2 Answers
When DEBUG is not defined, the call to Output in OutputDebug is removed at preprocessing time (before compile-time).

If optimizations are enabled (e.g. -O2 or -O3), calls to OutputDebug and Output can be inlined by GCC. In this case there will be no generated code (i.e. no function call). This is likely to be the case, but it is only possible if the function calling them is in the same translation unit. Otherwise, link-time optimizations are required to do such an optimisation (i.e. inlining across translation units).

Both the Output and OutputDebug functions will be generated. This can be seen on Godbolt even if DEBUG is not defined and we use the -O3 flag. The former can be automatically removed at compile time with the static keyword (even static inline for more aggressive optimisations). The same thing can be done for the latter, assuming the functions calling it are in the same translation unit.

Please note the linker can automatically remove functions that are not called in the resulting binaries thanks to link-time garbage collection (with GCC placing each function in its own section, e.g. -ffunction-sections together with the linker's --gc-sections).
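A minimal sketch of the static inline suggestion, assuming the functions and all of their callers sit in one translation unit. Taking the string by const reference and dropping the unused newLine parameter are simplifications made here, not part of the question's code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Internal linkage: none of these symbols is exported, so GCC is free to
// drop any definition that ends up unused in this translation unit.
static bool OutputNewline = true;

static inline void Output(const std::string &str)
{
    if (!OutputNewline) {
        std::putchar('\n');
        OutputNewline = true;
    }
    if (str.empty()) { return; }
    std::fputs(str.c_str(), stdout);
    std::fflush(stdout);
    OutputNewline = (str.back() == '\n');
}

static inline void OutputDebug(const std::string &str)
{
#if defined(DEBUG) && DEBUG
    Output(str);
#else
    (void)str; // nothing to do in a non-debug build
#endif
}
```

Compiling this with g++ -O2 -c and listing the symbols with nm or readelf -s is one way to check which definitions survived on a given toolchain.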
For more information about all of this, please see the CppCon 2018 talk "The Bits Between the Bits: How We Get to main()" by Matt Godbolt (yes, him again).
TL;DR
Use readelf [1] to inspect the object file.
At -O1 or higher the GCC C++ compiler can eliminate calls to the empty function OutputDebug when DEBUG is undefined, as long as you stop needlessly thwarting this optimisation as your present code does.
Suppose we add the necessary #includes to your code to make it compile, save it as file.cpp, compile it to file.o without defining DEBUG, and check the symbol table with readelf.
You see that OutputDebug is present, despite having an empty body. In its symbol-table entry, FUNC means it is a function and GLOBAL means it is available to the linker. This is with default optimisation (i.e. -O0, = none). Maximum optimisation does not change that.
It cannot change that, because OutputDebug is declared extern by default, which means the compiler is compelled to make the definition visible to the linker, no matter what the definition is.
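The linkage distinction at work here can be sketched as follows. All function names in this block are hypothetical and chosen only for illustration; the expectation is that an externally visible function shows up as GLOBAL in the symbol table while the file-local ones are LOCAL (or vanish when optimised).

```cpp
#include <cassert>

// External linkage (the default for a namespace-scope function): the
// compiler must emit a definition that other translation units can link to.
int ExternallyVisible(int x) { return x + 1; }

// Internal linkage via `static`: visible only inside this file.
static int FileLocal(int x) { return x + 2; }

// Internal linkage via an unnamed namespace: same effect as `static`.
namespace {
int AlsoFileLocal(int x) { return x + 3; }
}

// An externally visible caller keeps alive whatever it actually uses,
// although the callees may simply be inlined into it.
int UseBoth(int x) { return FileLocal(x) + AlsoFileLocal(x); }
```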
Presumably you'd need both of the functions Output and OutputDebug to be available to the linker or else neither of them. If neither, then you might explicitly declare them static.

After that is compiled, you will see that definitions of both Output and OutputDebug are still in the object file, but now they are LOCAL, not GLOBAL, functions. They are not available to the linker. That means the compiler is at liberty to optimise them away, if possible. With minimal optimisation (-O1), they both disappear.
Since the linker can't see them, they can't be called externally. And since they are not called within file.o either, they are dead code.

This remains true even if we define DEBUG, because even though OutputDebug now calls Output, the first is dead code, and nothing else calls the second, so it is dead code as well.

OutputDebug cannot be optimised away by the compiler if it is extern, whether or not DEBUG is defined. If it is static then it will be optimised away, with -O1 or higher, whether or not DEBUG is defined, unless it is ultimately called by some function that is extern.

The one interesting question then is:
If DEBUG is undefined, can the compiler optimise away an empty call to OutputDebug by an extern function, as the extern CallOutputDebug makes in file2.cpp? That file is compiled first with no optimisation and then with optimisation.
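The answer's file2.cpp is not reproduced here. A plausible reconstruction, assuming it is the question's code with OutputNewline, Output and OutputDebug made static, plus an extern CallOutputDebug wrapper with the const-reference signature the answer reports:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

static bool OutputNewline = true; // assumed static, like the functions

static void Output(std::string str, bool newLine = true)
{
    if (!OutputNewline) {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length() - 1] == '\n');
}

// Still takes the string BY VALUE, exactly as in the question.
static void OutputDebug(std::string str, bool newLine = true)
{
#if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
#endif
}

// External linkage: the compiler must keep this one for the linker.
void CallOutputDebug(std::string const &str, bool newLine = true)
{
    OutputDebug(str, newLine);
}
```

Something like g++ -O1 -c file2.cpp followed by readelf -sW file2.o | c++filt would then list the symbols; the exact commands the answer used are not shown in the text, so these are an assumption.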
With no optimisation, Output and OutputDebug reappear as LOCAL symbols and CallOutputDebug appears as GLOBAL. With optimisation, Output and OutputDebug disappear altogether. Therefore we know that neither of these symbols is called in the object file, because they don't exist.

If you don't yet trust readelf on this score, we can repeat the compilation and save the assembly code in file2.s, then grep for the 3 function symbols in the assembly. The only one present is
CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
(c++filt is the C++ name demangler).

You can conclude then that the compiler has either eliminated the OutputDebug call or, at least, inlined whatever definition remains of OutputDebug into the definition of CallOutputDebug. As before, Output is dead code so it is just gone.

But that is not exactly what you would like to know.
You would like to know that the compiler can eliminate, not merely inline, the call to OutputDebug in CallOutputDebug when DEBUG is undefined. I.e. you would like to see that optimised code generated from file2.cpp, with DEBUG undefined, can be the same as that generated from file3.cpp, which has no such call. If we compile that file too, then readelf can answer the question. Compare the symbol table of file2.o with that of file3.o, and observe:
In file2.o the size of the function CallOutputDebug is 0x211 bytes, whereas in file3.o it is merely 0x5 bytes.
In file2.o there are references to 5 external symbols that are not referenced in file3.o:
:CallOutputDebug(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
operator delete(void*, unsigned long)
memcpy
__stack_chk_fail
So the two object files are not equivalent, and the definition of CallOutputDebug in file2.o contains more than 100 times as much code as its definition in file3.o. That is a resounding No for the elimination of the OutputDebug call in file2.o.
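For reference, the baseline side of that comparison might look like the sketch below. It assumes file3.cpp contains nothing but a CallOutputDebug with the same signature and an empty body; the actual file is not shown in the text.

```cpp
#include <cassert>
#include <string>

// Assumed baseline: the call to OutputDebug is simply absent, so this is
// what "fully eliminated" would have to compile down to.
void CallOutputDebug(std::string const &str, bool newLine = true)
{
    (void)str;     // deliberately unused
    (void)newLine; // deliberately unused
}
```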
But Whoa!…
Notice the incongruence between the signature of CallOutputDebug, as in file2.cpp and file3.cpp, and the signature of OutputDebug, as in file2.cpp. My CallOutputDebug accepts a std::string const & argument whereas your OutputDebug accepts a std::string.

That means that a call to OutputDebug must copy-construct its str argument, while a call to CallOutputDebug merely passes a const reference to its str, no copy-construction required. In both cases no copy-construction is required to pass newLine, which is of primitive type.
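The cost difference can be made visible with a small stand-in type. CountingString and the four functions below are hypothetical names introduced only for this illustration; the type counts its own copy-constructions so that the two parameter-passing styles can be compared directly.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for std::string that counts copy-constructions.
struct CountingString {
    static int copies;
    std::string value;
    explicit CountingString(std::string v) : value(std::move(v)) {}
    CountingString(const CountingString &other) : value(other.value) { ++copies; }
};
int CountingString::copies = 0;

// By value: a call with an lvalue copy-constructs the parameter,
// even though the body is empty.
void TakeByValue(CountingString str) { (void)str; }

// By const reference: nothing is copied.
void TakeByConstRef(const CountingString &str) { (void)str; }

// Number of copies one by-value call costs.
int CopiesForByValueCall()
{
    CountingString s("hello");
    const int before = CountingString::copies;
    TakeByValue(s);
    return CountingString::copies - before;
}

// Number of copies one by-const-reference call costs.
int CopiesForByConstRefCall()
{
    CountingString s("hello");
    const int before = CountingString::copies;
    TakeByConstRef(s);
    return CountingString::copies - before;
}
```

The by-value call performs one copy and the by-reference call none, which is the same asymmetry the answer describes between OutputDebug and CallOutputDebug.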
And that means that OutputDebug, as you have defined it, is not an empty function, even when DEBUG is undefined. It copy-constructs a std::string, and that copy-construction is inlined into the compiled body of CallOutputDebug as defined in file2.cpp.

We've already got the assembly generated from file2.cpp. It's in file2.s. Among all that unaccounted-for extra code that we detected there is a snippet which, once its names are demangled, you can recognise as the execution of a std::string copy-constructor without needing to understand all of the assembly.

You didn't need to have this stuff done. It was just careless to declare str as std::string rather than std::string const &.
But the compiler doesn't know you were careless and can't optimise on that basis. After preprocessing of file2.cpp it finds calls to those 5 external functions we noted earlier, including the one in the snippet. It doesn't know the definitions of those external functions and for all it can possibly tell, at any optimisation level, they are all stuff that you want done, so it assembles calls to them all, and that accounts for all the surplus code in file2.o v. file3.o.

Once again, lesson learned.
Let's fix file2.cpp to avoid needless copy-construction of std::strings, by passing str as std::string const &, save the result as file4.cpp, and compile that to file4.o.
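The fixed file is not reproduced in the text. A sketch of what file4.cpp might contain, assuming the only change is that both Output and OutputDebug now take their string by const reference (changing Output as well is an assumption made here):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

static bool OutputNewline = true;

static void Output(std::string const &str, bool newLine = true)
{
    if (!OutputNewline) {
        putchar('\n');
        OutputNewline = true;
    }
    if (str.length() == 0) { return; }
    fputs(str.c_str(), stdout);
    fflush(stdout);
    OutputNewline = (str[str.length() - 1] == '\n');
}

// Now BY CONST REFERENCE: with DEBUG undefined this body is genuinely
// empty and calling it requires no copy-construction.
static void OutputDebug(std::string const &str, bool newLine = true)
{
#if defined(DEBUG) && DEBUG == true
    Output(str, newLine);
#endif
}

void CallOutputDebug(std::string const &str, bool newLine = true)
{
    OutputDebug(str, newLine);
}
```

With nothing left to do at the call site, the optimiser has a real chance to reduce CallOutputDebug to a bare return, which is what the comparison that follows checks.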
Then repeat the readelf comparison with file4.o and file3.o.
Now the symbol-tables are identical, except of course that the filename values of the FILE entry are different, and the definition of CallOutputDebug is 5 bytes in size in both object files. That strongly suggests that the OutputDebug call has been erased from file4.o, this time, in the way you hoped it would. And if we compare the respective object files byte by byte we have proof of that.
There is only one difference, at byte 318, where file3.o has character (octal) 63 = ‘3’ and file4.o has character (octal) 64 = ‘4’. That is the difference between the values of the FILE entry: file3.cpp and file4.cpp. The object code is the same. You can generate the assembly listing file4.s if you are interested and compare it with file3.s.

[1] GNU binutils objdump is a fairly serviceable second-best to readelf and will be available for non-ELF object formats to which binutils has been ported.