I have the following two C snippets, both of which obviously cause a stack overflow:
a.c
int f(int i) {
    f(i);
}
int main() {
    f(1);
}
b.c
int f(int i) {
    f(i+1);
}
int main() {
    f(1);
}
After running both and looking at the output of coredumpctl list, the core dump sizes are very different:
Tue 2024-02-20 15:38:28 +0330 420696 1000 1000 SIGSEGV present /tmp/a 204.2K
Tue 2024-02-20 15:38:30 +0330 420710 1000 1000 SIGSEGV present /tmp/b 899.7K
The core dump of the second program (b.c) is more than four times the size of the first one. This seems very strange to me, as the two programs don’t have any noticeable difference. Can someone explain this behavior?
Edit
I used this command to compile both files:
$ gcc a.c -o a && gcc b.c -o b
The gcc version I used:
$ gcc --version
gcc (Debian 12.2.0-14) 12.2.0
Also, the assembly generated for a.c (using objdump -S):
0000000000001129 <f>:
1129: 55 push %rbp
112a: 48 89 e5 mov %rsp,%rbp
112d: 48 83 ec 10 sub $0x10,%rsp
1131: 89 7d fc mov %edi,-0x4(%rbp)
1134: 8b 45 fc mov -0x4(%rbp),%eax
1137: 89 c7 mov %eax,%edi
1139: e8 eb ff ff ff call 1129 <f>
113e: 90 nop
113f: c9 leave
1140: c3 ret
0000000000001141 <main>:
1141: 55 push %rbp
1142: 48 89 e5 mov %rsp,%rbp
1145: bf 01 00 00 00 mov $0x1,%edi
114a: e8 da ff ff ff call 1129 <f>
114f: b8 00 00 00 00 mov $0x0,%eax
1154: 5d pop %rbp
1155: c3 ret
And for b.c:
0000000000001129 <f>:
1129: 55 push %rbp
112a: 48 89 e5 mov %rsp,%rbp
112d: 48 83 ec 10 sub $0x10,%rsp
1131: 89 7d fc mov %edi,-0x4(%rbp)
1134: 8b 45 fc mov -0x4(%rbp),%eax
1137: 83 c0 01 add $0x1,%eax
113a: 89 c7 mov %eax,%edi
113c: e8 e8 ff ff ff call 1129 <f>
1141: 90 nop
1142: c9 leave
1143: c3 ret
0000000000001144 <main>:
1144: 55 push %rbp
1145: 48 89 e5 mov %rsp,%rbp
1148: bf 01 00 00 00 mov $0x1,%edi
114d: e8 d7 ff ff ff call 1129 <f>
1152: b8 00 00 00 00 mov $0x0,%eax
1157: 5d pop %rbp
1158: c3 ret
2 Answers
The difference in core dump size is due to compiler optimizations, specifically differences in how the compiler optimizes the recursive calls in the two programs.

In a.c, each recursive call is the same as the previous one, leading to a tail-recursive pattern. Many compilers are able to optimize tail-recursive functions by transforming them into iterative loops, which results in more efficient stack usage and a smaller core dump.

In b.c, by contrast, i is modified before the recursive call is made. That prevents the compiler from optimizing the recursion as a tail-recursive pattern, which eventually leads to larger stack usage and a larger core dump.

The default in systemd’s coredump.conf is Compress=yes, according to the man page. Presumably that’s with zstd or gzip.

Size depends not just on the amount of address space in use, but on how compressible the data is. Your a has a repeating pattern, the same i in every stack frame, so it will compress better. (The saved RBP will be different every time, but the return address is the same, and the unwritten 12 bytes of the 32-byte frame will be 0 below the first few frames that the _start / dynamic linker code might have dirtied before reaching main.)

b doesn’t: i+1 produces a different value in every stack frame, one that changes in a different way from the saved RBP.

And I don’t think zstd or gzip look for delta compression of changing patterns, just exact matches. Or if they do look for deltas, maybe having two changing values (saved RBP and the spilled i) throws that off. Both usually only change in the low byte, i changing by 1 and saved RBP changing by 32 (the size of each stack frame).