Scenario:
$ cat lib.c
#include <stdio.h>
#define STR_(x) #x
#define STR(x) STR_(x)
#define CAT_(x,y) x##y
#define CAT(x,y) CAT_(x,y)
__attribute__((constructor))
void CAT(foo,)(void) { printf("foo" STR(N) " %p\n", (void *)CAT(foo,)); }
void CAT(bar,N)(void){ puts("bar" STR(N)); }
$ cat main.c
void barx(void);
void bary(void);
void barz(void);
int main(void)
{
barx();
bary();
barz();
}
$ cat build_run.sh
gcc lib.c -DN=x -c -fPIC -o libx.o && gcc libx.o -o libx.so -shared &&
gcc lib.c -DN=y -c -fPIC -o liby.o && gcc liby.o -o liby.so -shared &&
gcc lib.c -DN=z -c -fPIC -o libz.o && gcc libz.o -o libz.so -shared &&
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH &&
gcc -L. -o main main.c -lx -ly -lz &&
./main
$ bash build_run.sh
foox 0x7f0bf002e139
foox 0x7f0bf002e139
foox 0x7f0bf002e139
barx
bary
barz
Here we see that:
- All `.so` libraries have a `constructor`-attributed function with the same name, `foo`.
- Function `foo` is called 3 times from library X (which may be unexpected behavior) instead of 1 time from each of libraries X, Y, Z (which may be expected behavior).
As I understand it, the addresses of the `constructor`-attributed functions `foo` are placed (directly or indirectly) in the `.init_array` section. Hence, function names are expected to be irrelevant.
The core question: why is function `foo` called 3 times from library X instead of 1 time from each of libraries X, Y, Z?
Extra observations:
- If we change `CAT(foo,)` to `CAT(foo,N)` in `lib.c` and rerun `build_run.sh`, then we will see:
$ bash build_run.sh
fooz 0x7fc121dcc139
fooy 0x7fc121dd1139
foox 0x7fc121dd6139
barx
bary
barz
which is expected behavior.
- Running the original example (i.e. with `CAT(foo,)`) on Cygwin leads to function `foo` being called 1 time from each of libraries X, Y, Z (which may be expected behavior).
System and software info:
$ uname -a
Linux xxx 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
2 Answers
I think it is an example of symbol interposition.
When loading shared libraries, the dynamic linker on Linux resolves symbols (function names, variables) globally by default. If two or more shared libraries define the same symbol (e.g., `foo`), the definition from the first-loaded library is used for all subsequent references.
Potential fixes:
- Pass the `-Bsymbolic` linker option (`-Wl,-Bsymbolic` when linking through `gcc`) when linking each shared library, although I have never tested it myself.
- Making the function `static` should also be effective.
Cygwin uses DLLs instead, and Windows has a different mechanism. On Windows, each DLL has its own namespace for its functions and variables, meaning symbol names are resolved independently within each DLL rather than globally across all loaded libraries.
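A related option that this answer does not list is GCC's hidden visibility, which should have a similar effect to `static` on the constructor while keeping `foo` callable from other source files of the same library. The following is a minimal sketch under that assumption, not a tested fix for the exact setup in the question; the `hidden_ctor_calls` counter and the default value of `N` are hypothetical additions for illustration.

```c
#include <stdio.h>

#define STR_(x) #x
#define STR(x) STR_(x)
#ifndef N
#define N x   /* placeholder so the sketch also compiles without -DN=... */
#endif

/* Hypothetical counter, not part of the original lib.c; it only makes
   the number of constructor runs observable. */
static int hidden_ctor_calls = 0;

/* Hidden visibility keeps foo out of the library's dynamic symbol table,
   so a foo defined in another shared library cannot interpose it. */
__attribute__((constructor, visibility("hidden")))
void foo(void)
{
    hidden_ctor_calls++;
    printf("foo" STR(N) " %p\n", (void *)foo);
}
```

Built as in the question (for example `gcc lib.c -DN=y -fPIC -shared -o liby.so`), each library would then be expected to run its own `foo`.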
No. They are still relevant. When a dynamic library is loaded, the dynamic linker assigns each symbol an actual address. If two libraries define a function with the same name, the function from the first loaded library is used, and the function from the second loaded library is ignored.
This leads to the behavior you observed. The first library loads, registers `foo()`, and runs it. The second library, on load, sees that `foo()` is already registered in the space of this process, and runs that one instead of its own.
Since you intend `foo` to be a constructor, simply adding `static` to the `foo` definition will solve the issue. In this case, the function will not be visible outside the dynamic library and will therefore be exempt from the name clash between the libraries.
Of course, you will lose the ability to call `foo()` from outside `lib.c`. If that is still needed, then you must have different names for the `constructor` functions (which you do by using `CAT(foo,N)`).
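The `static` suggestion can be sketched as a variant of the constructor in `lib.c`. This is a minimal illustration, assuming GCC on Linux as in the question; the `foo_calls` counter and the default value of `N` are hypothetical additions that are not in the original file.

```c
#include <stdio.h>

#define STR_(x) #x
#define STR(x) STR_(x)
#ifndef N
#define N x   /* placeholder so the sketch also compiles without -DN=... */
#endif

/* Hypothetical counter, not part of the original lib.c; it only makes
   the number of constructor runs observable. */
static int foo_calls = 0;

/* static gives foo internal linkage: the name is not exported from the
   shared library, so the constructor entry can only refer to this copy. */
__attribute__((constructor))
static void foo(void)
{
    foo_calls++;
    printf("foo" STR(N) " %p\n", (void *)foo);
}
```

With this change, rebuilding the three libraries with `-DN=x`, `-DN=y`, and `-DN=z` should print one line per library, each with a different address, much like the `CAT(foo,N)` output shown in the question.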