I'm running an OpenMP program (GCC with libgomp on CentOS 8.5). Inspecting it with strace, I found that the clone syscall was being invoked over and over (part of the log is below), which I believe means OpenMP threads were being constantly recreated: all the non-OpenMP threads are fixed in number and are all created at the very beginning of the main function.
However, when I wrote a simple OpenMP test program, libgomp appeared to create a thread pool during initialization and reuse it afterwards.
So my question is: under what circumstances does a libgomp worker thread terminate, so that libgomp has to recreate threads?
clone(child_stack=0x7f16bff89ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265184], tls=0x7f16bff8f700, child_tidptr=0x7f16bff8f9d0) = 3265184
sched_setaffinity(3265184, 16, [8]) = 0
futex(0x7f16bff8fd18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4012f184, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x4012f184, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clone(child_stack=0x7f16c1f8def0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265185], tls=0x7f16c1f93700, child_tidptr=0x7f16c1f939d0) = 3265185
sched_setaffinity(3265185, 16, [2]) = 0
futex(0x7f16c1f93d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c078aef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265186], tls=0x7f16c0790700, child_tidptr=0x7f16c07909d0) = 3265186
sched_setaffinity(3265186, 16, [4]) = 0
futex(0x7f16c0790d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16bff89ef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265187], tls=0x7f16bff8f700, child_tidptr=0x7f16bff8f9d0) = 3265187
sched_setaffinity(3265187, 16, [6]) = 0
futex(0x7f16bff8fd18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c178cef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265188], tls=0x7f16c1792700, child_tidptr=0x7f16c17929d0) = 3265188
sched_setaffinity(3265188, 16, [8]) = 0
futex(0x7f16c1792d18, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x40083f14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
clone(child_stack=0x7f16c1f8def0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265189], tls=0x7f16c1f93700, child_tidptr=0x7f16c1f939d0) = 3265189
sched_setaffinity(3265189, 16, [2]) = 0
futex(0x7f16c1f93d18, FUTEX_WAKE_PRIVATE, 1) = 1
clone(child_stack=0x7f16c178cef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[3265190], tls=0x7f16c1792700, child_tidptr=0x7f16c17929d0) = 3265190
sched_setaffinity(3265190, 16, [4]) = 0
Environment variables:
export PARALLEL_ENSEMBLE_THREADS=5
export GOMP_CPU_AFFINITY=7,2,4,6,8
2 Answers
This is more of a "sysadmin" answer, but you can use strace to print stack traces showing where a given syscall is invoked: use the -k command-line option for that. (By default strace also shows signals such as SIGCHLD, but that part isn't relevant for this answer.) Run your application tracing only clone() and following children with -f, e.g. strace -f -k -e trace=clone ./your_program; this should tell you where in libgomp's code the thread is created. I could reproduce the behavior:
Whenever the team size shrinks, libgomp appears to drop the surplus thread, so it has to create a new thread for the next, larger team.