
I am trying to build a Singularity container to run a Python script on a CentOS 7-based cluster.
The container runs as expected on my host, which I also used to create the container, but fails on the cluster as soon as PyTorch is imported.

The problem can be reproduced with a container built from this minimal definition file:

debug.def:

Bootstrap: arch

%runscript
    exec /usr/bin/python3 -c 'import torch; print(torch.__version__)'

%post
    #--------------------------------------------------------------------------
    # Basic setup from
    # https://github.com/sylabs/singularity/blob/master/examples/arch/Singularity
    #--------------------------------------------------------------------------
    # Set time zone. Use whatever you prefer instead of UTC.
    ln -s /usr/share/zoneinfo/Europe/Berlin /etc/localtime

    # Set the package mirror server(s). This is only for the output image's
    # mirrorlist. `pacstrap' can only use your host's package mirrors.
    echo 'Server = https://mirrors.kernel.org/archlinux/$repo/os/$arch' > /etc/pacman.d/mirrorlist

    pacman -Sy --noconfirm gawk sed grep

    # Set locale. Use whatever you prefer instead of en_US.
    echo 'en_US.UTF-8 UTF-8' > /etc/locale.gen
    locale-gen
    echo 'LANG=en_US.UTF-8' > /etc/locale.conf

    pacman -S --noconfirm python python-pytorch

    pacman -S --noconfirm pacman-contrib
    paccache -r -k0

It is built with: sudo singularity build debug.sif debug.def
Both the container and my host run Arch Linux.

Executing the container on my host outputs the PyTorch version:

schellsn@host $ singularity run debug.sif
1.3.1

Running it on the cluster results in the following error:

schellsn@cluster tmp$ singularity run debug.sif
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libQt5Core.so.5: cannot open shared object file: No such file or directory

I don’t understand why the file is not found, since it is included in the container:

schellsn@cluster tmp$ singularity shell debug.sif 
Singularity debug.sif:~/tmp> ls -l /usr/lib | grep libQt5Core
-rw-r--r--  1 root root      1166 Nov 11 23:40 libQt5Core.prl
lrwxrwxrwx  1 root root        20 Nov 11 23:40 libQt5Core.so -> libQt5Core.so.5.13.2
lrwxrwxrwx  1 root root        20 Nov 11 23:40 libQt5Core.so.5 -> libQt5Core.so.5.13.2
lrwxrwxrwx  1 root root        20 Nov 11 23:40 libQt5Core.so.5.13 -> libQt5Core.so.5.13.2
-rwxr-xr-x  1 root root   5275240 Nov 11 23:40 libQt5Core.so.5.13.2

I assume that the corresponding directory is not part of the library search path during the import, and that the problem does not occur on my host because some environment setting leaks into the container.
I also tried the Sylabs Remote Builder, but it seems unable to build Arch containers (pacstrap not found in $PATH).
Building the container on one of the cluster nodes fails for the same reason: pacstrap and pacman are not available there.
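The search-path hypothesis can be checked without PyTorch. The minimal Python sketch below (the function names are illustrative, not from any tool) prints the LD_LIBRARY_PATH entries and asks the dynamic loader for libQt5Core.so.5 through ctypes, once by soname (which goes through the search path) and once by the absolute path from the ls listing (which bypasses it). If the absolute path fails as well even though the file exists, the search path is presumably not the cause.

```python
import ctypes
import os
import sys


def library_path_entries(value):
    """Split an LD_LIBRARY_PATH value into its entries; unset or empty gives no entries."""
    return value.split(":") if value else []


def try_load(soname):
    """Ask the dynamic loader for `soname`; return None on success, else the loader's message."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for entry in library_path_entries(os.environ.get("LD_LIBRARY_PATH")):
        print("LD_LIBRARY_PATH entry:", repr(entry))
    # First by soname (uses the search path), then by absolute path (bypasses it).
    for name in sys.argv[1:] or ["libQt5Core.so.5", "/usr/lib/libQt5Core.so.5"]:
        error = try_load(name)
        print(f"{name}: {'loaded' if error is None else error}")
```

Assuming the script is saved as probe.py (a hypothetical name) somewhere the container can see, it could be run with singularity exec debug.sif python3 probe.py on both the host and the cluster to compare the loader messages.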

I am at my wits’ end and would be very grateful for any hint to explain this behavior!
Why is the shared library not found and how could that possibly be fixed?

Update #1:

Here is the content of the LD_LIBRARY_PATH environment variable (in response to @tsnowlan).

Arch Linux Host:

schellsn@host tmp$ echo $LD_LIBRARY_PATH
:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64
schellsn@host tmp$ singularity shell evpt_debug.sif
Singularity evpt_debug.sif: ~/tmp> echo $LD_LIBRARY_PATH
:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64:/.singularity.d/libs

CentOS 7 cluster node:

schellsn@cluster tmp$ echo $LD_LIBRARY_PATH
schellsn@cluster tmp$ singularity shell debug.sif 
Singularity debug.sif:~/tmp> echo $LD_LIBRARY_PATH
/.singularity.d/libs

Update #2:

I set up a new, clean VM (also running Arch) and rebuilt the container there. This container shows the same problem: it runs on my host but not on the CentOS 7 cluster.


Answers


  1. Chosen as BEST ANSWER

    As a workaround, I now build the container from a def file that uses the library bootstrap agent with the Ubuntu 18.04 image instead of the Arch bootstrap agent. The resulting container runs on both my Arch host and the CentOS 7 cluster.


  2. I was having the same issue: CentOS 7 host and Arch Linux container (Python 3.8.1 / PyTorch 1.3.1). The following link seems to have fixed my issue for now.

    https://superuser.com/questions/1347723/arch-on-wsl-libqt5core-so-5-not-found-despite-being-installed

    EDIT: From the link, this command worked for me:

    sudo strip --remove-section=.note.ABI-tag /usr/lib64/libQt5Core.so.5
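A plausible explanation for why removing that section helps, offered as a hedged sketch rather than a confirmed diagnosis: a Singularity container runs on the host's kernel, and glibc's dynamic loader presumably skips a library whose GNU ABI tag (the .note.ABI-tag note) demands a newer kernel than the one running, reporting it with the same "No such file or directory" message. CentOS 7 ships a 3.10 kernel, so if the Arch build of libQt5Core.so.5 declares a higher minimum, it would load on the Arch host but not on the cluster. The standard-library Python script below (function names are illustrative) reads that tag by scanning the file's SHT_NOTE sections and compares it with platform.release().

```python
import platform
import re
import struct
import sys

SHT_NOTE = 7        # ELF section type that holds note records
NT_GNU_ABI_TAG = 1  # note type of the "GNU" ABI tag (minimum kernel version)
ABI_OS = {0: "Linux", 1: "GNU", 2: "Solaris", 3: "FreeBSD"}


def parse_notes(blob, endian):
    """Yield (name, type, desc) for each note record in a SHT_NOTE section."""
    offset = 0
    while offset + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", blob, offset)
        offset += 12
        name = blob[offset:offset + namesz].rstrip(b"\0")
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = blob[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # and so is the descriptor
        yield name, ntype, desc


def read_abi_tag(path):
    """Return (os_name, (major, minor, patch)) from the GNU ABI tag, or None if absent."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    endian = "<" if data[5] == 1 else ">"
    if data[4] == 2:  # ELF64: e_shoff at 0x28, e_shentsize/e_shnum at 0x3A
        shoff, = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 0x3A)
        sh_fmt = endian + "IIQQQQ"  # sh_name, sh_type, flags, addr, offset, size
    else:  # ELF32: e_shoff at 0x20, e_shentsize/e_shnum at 0x2E
        shoff, = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 0x2E)
        sh_fmt = endian + "IIIIII"
    for i in range(shnum):
        _, sh_type, _, _, sh_offset, sh_size = struct.unpack_from(
            sh_fmt, data, shoff + i * shentsize)
        if sh_type != SHT_NOTE:
            continue
        for name, ntype, desc in parse_notes(data[sh_offset:sh_offset + sh_size], endian):
            if name == b"GNU" and ntype == NT_GNU_ABI_TAG and len(desc) >= 16:
                os_id, major, minor, patch = struct.unpack_from(endian + "IIII", desc)
                return ABI_OS.get(os_id, str(os_id)), (major, minor, patch)
    return None


def kernel_version(release):
    """Turn a release string such as '3.10.0-1160.el7.x86_64' into (3, 10, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


if __name__ == "__main__":
    for lib in sys.argv[1:]:
        tag = read_abi_tag(lib)
        if tag is None:
            print(f"{lib}: no GNU ABI tag")
            continue
        os_name, required = tag
        running = kernel_version(platform.release())
        verdict = "OK" if running >= required else "kernel too old"
        print(f"{lib}: needs {os_name} >= {'.'.join(map(str, required))}, "
              f"running {platform.release()}: {verdict}")
```

Assuming the script is saved as abi_tag.py (a hypothetical name) where the container can see it, running singularity exec debug.sif python3 abi_tag.py /usr/lib/libQt5Core.so.5 on both machines should show whether the required version exceeds the cluster's kernel. If binutils is available, readelf -n on the library lists the same note.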
    