skip to Main Content

When trying to disassemble every instruction of a program as this SO answer suggests, I found out that gdb will never finish processing after stepping instruction into gettimeofday. Here is a minimal reproducible example:

In main.c:

#include <stdio.h>
#include <stdlib.h>

#include <sys/time.h>

int main()
{
    struct timeval tv;
    
    gettimeofday(&tv, NULL);

    return 0;
}

Compile with gcc main.c

To run: gdb -silent a.out

Reading symbols from a.out...
(gdb) set height 0
(gdb) b main
Breakpoint 1 at 0x1175: file main.c, line 7.
(gdb) set logging file ./log.txt
(gdb) set logging redirect on
(gdb) set logging on
Redirecting output to ./log.txt.
Copying debug output to ./log.txt.
(gdb) r
(gdb) while 1
 >si
 >end

And log.txt shows:

Starting program: /home/user/tst-gdb-step/a.out 

Breakpoint 1, main () at main.c:7
7   {
0x000055555555517e  7   {
0x0000555555555182  7   {
10      gettimeofday(&tv, NULL);
0x0000555555555188  10      gettimeofday(&tv, NULL);
0x000055555555518d  10      gettimeofday(&tv, NULL);
0x0000555555555190  10      gettimeofday(&tv, NULL);
0x0000555555555070 in gettimeofday@plt ()
0x0000555555555074 in gettimeofday@plt ()
0x00007ffff7fcd690 in gettimeofday ()
0x00007ffff7fcd691 in gettimeofday ()
0x00007ffff7fcd698 in gettimeofday ()
0x00007ffff7fcd69b in gettimeofday ()
0x00007ffff7fcd69d in gettimeofday ()
0x00007ffff7fcd69e in gettimeofday ()
0x00007ffff7fcd6a1 in gettimeofday ()
0x00007ffff7fcd6a7 in gettimeofday ()
0x00007ffff7fcd6aa in gettimeofday ()
0x00007ffff7fcd6ae in gettimeofday ()
0x00007ffff7fcd6b4 in gettimeofday ()
0x00007ffff7fcd6ba in gettimeofday ()
0x00007ffff7fcd6bd in gettimeofday ()
0x00007ffff7fcd6c3 in gettimeofday ()
0x00007ffff7fcd6c6 in gettimeofday ()
0x00007ffff7fcd6c8 in gettimeofday ()
0x00007ffff7fcd6cc in gettimeofday ()
0x00007ffff7fcd6cf in gettimeofday ()
0x00007ffff7fcd6d2 in gettimeofday ()
0x00007ffff7fcd6d8 in gettimeofday ()
0x00007ffff7fcd6df in gettimeofday ()
0x00007ffff7fcd6e6 in gettimeofday ()
0x00007ffff7fcd6ed in gettimeofday ()
0x00007ffff7fcd6f0 in gettimeofday ()
0x00007ffff7fcd6f2 in gettimeofday ()
0x00007ffff7fcd6f5 in gettimeofday ()
0x00007ffff7fcd6f9 in gettimeofday ()
0x00007ffff7fcd6fc in gettimeofday ()
0x00007ffff7fcd702 in gettimeofday ()
0x00007ffff7fcd709 in gettimeofday ()
0x00007ffff7fcd70c in gettimeofday ()
0x00007ffff7fcd70f in gettimeofday ()
0x00007ffff7fcd6a7 in gettimeofday ()
0x00007ffff7fcd6aa in gettimeofday ()
0x00007ffff7fcd6ae in gettimeofday ()
0x00007ffff7fcd6b4 in gettimeofday ()
0x00007ffff7fcd6ba in gettimeofday ()
0x00007ffff7fcd6bd in gettimeofday ()
0x00007ffff7fcd6c3 in gettimeofday ()
0x00007ffff7fcd6c6 in gettimeofday ()
0x00007ffff7fcd6c8 in gettimeofday ()
0x00007ffff7fcd6cc in gettimeofday ()
0x00007ffff7fcd6cf in gettimeofday ()
0x00007ffff7fcd6d2 in gettimeofday ()
0x00007ffff7fcd6d8 in gettimeofday ()
0x00007ffff7fcd6df in gettimeofday ()
...

After 3 whole minutes of exeuction, gdb writes nearly 500,000 lines to log.txt (and without any sign to stop), which is obviously not the normal case, since vdso is meant to be fast. The log also shows an infinite loop.

But if n instead of si is used, the program can exit without any problem.

Tools I’m using:

$ uname -a
Linux 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ gdb --version
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2

I’m wondering why does this happen. Thanks!

2

Answers


  1. Chosen as BEST ANSWER

    Here is the gdb script I use to bypass this problem:

    # load the program binary, symbol table, etc., and start it, requiring gdb > 8.1
    starti
    # print location for every instruction
    display/i $pc
    # tell gdb to output without caring window height
    set height 0
    
    while 1
        # run the functions to completion if they cause problems
        if ($pc == gettimeofday)
            fin
        end
        if ($pc == clock_gettime)
            fin
        end
        # optional:
        # newer x86 version of `memset` uses `rep` instructions
        # which will expand to a whole lot of lines. So ignore it too.
        if ($pc == memset)
            fin
        end
        # most importantly, step instruction on default
        si
    end
    

    This does not work correctly for all cases (e.g., branching on time-related events) and is obviously slow, but it does the job for me.


  2. This is a guess based on source diving: I believe single stepping through the guts of gettimeofday is slowing those guts down so much that they get stuck in an endless loop trying and failing to meet their internal accuracy requirement for the result.

    The code for gettimeofday on x86-64/Linux is in linux/lib/vdso/gettimeofday.c. The "guts" I’m talking about are the do_hres function. (I’ve linked to kernel 5.15.0 because that’s what you have, but this file doesn’t change often.) Notice the loop in this function. Inside the loop, it’s calculating the time as, essentially, (coarse counter maintained by kernel) + (hardware high-precision clock that overflows often). Every time the kernel updates the coarse counter, it also updates the "sequence number" being read in the loop condition. The designer expected this loop to cycle at most twice — the point is that if you enter the loop at just the wrong moment and read inconsistent data about the coarse counter, thus computing a nonsense time, you’ll retry and get it right the second time around.

    But stepping through this code instruction by instruction, even in an automated way, makes it so slow that the kernel always changes the coarse counter and the sequence number while the loop body is executing, and so you get stuck in the loop.

    (Do not underestimate just how slow it is to step through machine code instruction by instruction, compared to normal execution. You’re adding the time cost of two context switches and several commands in GDB’s not particularly speedy scripting language to every instruction. I wouldn’t be surprised if the context switches and attendant cache and TLB thrashing were enough to trigger this livelock all by themselves.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search