For these two days, I have met a weird question.
STAR
from https://github.com/alexdobin/STAR is a program used to build suffix array indexes. I have been used this program for years. It works OK until recently.
For these days, when I run STAR
, it will always be killed.
root@localhost:STAR --runMode genomeGenerate --runThreadN 10 --limitGenomeGenerateRAM 31800833920 --genomeDir star_GRCh38 --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
.
.
.
Killed
root@localhost:STAR --runMode genomeGenerate --runThreadN 10 --genomeDir star_GRCh38 --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
Jun 03 10:15:08 ..... started STAR run
Jun 03 10:15:08 ... starting to generate Genome files
Jun 03 10:17:24 ... starting to sort Suffix Array. This may take a long time...
Jun 03 10:17:51 ... sorting Suffix Array chunks and saving them to disk...
Killed
A month ago, the same command with same inputs and same parameters runs OK. It does cost some memory, but not a lot.
I have tried 3 recently released version of this program, all failed. So I do not think it is the question of STAR
program but my sever configuration.
I also tried to run this program as both root
and normal user, no lucky for each.
I suspect there is a limitation of memory usage in my server.
But I do not know how the memory is limited? I wonder if some one can give me some hints.
Thanks!
Tong
The following is my debug process and system info.
Command dmesg -T| grep -E -i -B5 'killed process'
showing it is Out of memory
problem.
But before the STAR
program is killed, top
command showing only 5%
mem is occupied by this porgram.
[一 6 1 23:43:00 2020] [40479] 1002 40479 101523 18680 112 487 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] [40480] 1002 40480 101526 18681 112 486 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] [40481] 1002 40481 101529 18682 112 485 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] [40482] 1002 40482 101531 18673 111 493 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] Out of memory: Kill process 33822 (STAR) score 36 or sacrifice child
[一 6 1 23:43:00 2020] Killed process 33822 (STAR) total-vm:23885188kB, anon-rss:10895128kB, file-rss:4kB, shmem-rss:0kB
[三 6 3 10:02:13 2020] [12296] 1002 12296 101652 18681 113 486 0 /anaconda2/bin/
[三 6 3 10:02:13 2020] [12330] 1002 12330 101679 18855 112 486 0 /anaconda2/bin/
[三 6 3 10:02:13 2020] [12335] 1002 12335 101688 18682 112 486 0 /anaconda2/bin/
[三 6 3 10:02:13 2020] [12365] 1349 12365 30067 1262 11 0 0 bash
[三 6 3 10:02:13 2020] Out of memory: Kill process 7713 (STAR) score 40 or sacrifice child
[三 6 3 10:02:13 2020] Killed process 7713 (STAR) total-vm:19751792kB, anon-rss:12392428kB, file-rss:0kB, shmem-rss:0kB
--
[三 6月 3 10:42:17 2020] [ 4697] 1002 4697 101526 18681 112 486 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] [ 4698] 1002 4698 101529 18682 112 485 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] [ 4699] 1002 4699 101532 18680 112 487 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] [ 4701] 1002 4701 101534 18673 110 493 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] Out of memory: Kill process 21097 (STAR) score 38 or sacrifice child
[三 6月 3 10:42:17 2020] Killed process 21097 (STAR) total-vm:19769500kB, anon-rss:11622928kB, file-rss:884kB, shmem-rss:0kB
Command free -hl
showing I have enough memory.
total used free shared buff/cache available
Mem: 251G 10G 11G 227G 229G 12G
Low: 251G 240G 11G
High: 0B 0B 0B
Swap: 29G 29G 0B
Also as showed by ulimit -a
, no virtual memory limitation is set.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1030545
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1030545
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Here is the version of my Centos and Kernel (output by hostnamectl
):
hostnamectl
Static hostname: localhost.localdomain
Icon name: computer-server
Chassis: server
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-514.26.2.el7.x86_64
Architecture: x86-64
Here shows the content of cat /etc/security/limits.conf
:
#* soft core 0
#* hard rss 10000
#@student hard nproc 20
#@faculty soft nproc 20
#@faculty hard nproc 50
#ftp hard nproc 0
#@student - maxlogins 4
* soft nofile 65536
* hard nofile 65536
#@intern hard as 162400000
#@intern hard nproc 150
# End of file
As suggested, I have updated the output of df -h
:
Filesystem All Used Available (Used)% Mount
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 1.3M 126G 1% /dev/shm
tmpfs 126G 4.0G 122G 4% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/cl-root 528G 271G 257G 52% /
/dev/sda1 492M 246M 246M 51% /boot
tmpfs 26G 0 26G 0% /run/user/0
tmpfs 26G 0 26G 0% /run/user/1002
tmpfs 26G 0 26G 0% /run/user/1349
tmpfs 26G 0 26G 0% /run/user/1855
ls -a /dev/shm/
. ..
grep Shmem /proc/meminfo
Shmem: 238640272 kB
Several tmpfs
costs 126G
memory. I am googleing this, but still not sure what should be done?
That is the problem of shared memory due to program terminated abnormally.
ipcrm
is used to clear all shared memory and then STAR running
is fine.
$ ipcrm
.....
$ free -h
total used free shared buff/cache available
Mem: 251G 11G 226G 3.9G 14G 235G
Swap: 29G 382M 29G
2
Answers
It looks like the problem is with shared memory: you have 227G of memory eaten up by shared objects.
Shared memory files are persistent. Have a look in
/dev/shm
and any othertmpfs
mounts to see if there are large files that can be removed to free up more physical memory (RAM+swap).It probably has some memory leak. Even old programs may have residual bugs, and they could appear in some very particular cases.
Check with strace(1) or ltrace(1) and pmap(1). Learn also to query
/proc/
, see proc(5), top(1), htop(1). See LinuxAteMyRam and read about memory over-commitment and virtual address space and perhaps a textbook on operating systems.If you have access to the source code of your
STAR
, consider recompiling it with all warnings and debug info (with GCC, you would pass-Wall -Wextra -g
togcc
org++
) then use valgrind and/or some address sanitizer. If you don’t have legal access to the source code ofSTAR
, contact the entity (person or organization) which provided it to you.You could be interested in that draft report and by the Clang static analyzer or Frama-C (or coding your own GCC plugin).
I recommend using
valgrind
orgdb
and inspect your/proc/
to validate that optimistic hypothesis.