I have been running an NVIDIA Docker image for 13 days, and the container used to restart without any problems using the command
docker start -i <containerid>
But today, while I was downloading PyTorch inside the container, the download got stuck at 5% and stopped responding for a while.
I couldn't exit the container with either Ctrl+D or Ctrl+C, so I closed the terminal and ran docker start -i <containerid> again in a new terminal.
Ever since, this particular container has not responded to any command, be it start, restart, exec, or commit. Any command that uses this container ID or name just hangs, and I can only get out of it with Ctrl+C.
I cannot restart the Docker service, since that would kill all running containers.
I cannot even stop the container with
docker container stop <containerid>
Please help.
2 Answers
I had to restart the Docker daemon to revive my container. There was nothing else I could do to solve it. I used
sudo service docker restart
and then revived my container using docker run. I will try to build a Dockerfile out of it in order to avoid future mishaps.

You can make use of Docker's restart policy:
docker update --restart=always <container>
while being mindful of the caveats for the Docker version you are running.
Or explore the answer by @Yale Huang to a similar question: How to add a restart policy to a container that was already created
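
If it helps, here is a minimal sketch of applying the restart policy and then checking that it actually took effect. The function name set_restart_policy is hypothetical (it is not a Docker command); it only wraps docker update and docker inspect, and it assumes the Docker daemon is responsive enough to answer those two calls.

```shell
#!/bin/sh
# set_restart_policy is a hypothetical helper name, not part of the Docker CLI.
# Usage: set_restart_policy <container> [policy]
# The policy defaults to "always", matching the command in the answer above.
set_restart_policy() {
    container="$1"
    policy="${2:-always}"

    # Apply the restart policy to the existing container.
    docker update --restart="$policy" "$container" >/dev/null || return 1

    # Print the policy name Docker now reports for this container.
    docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$container"
}
```

Calling set_restart_policy <containerid> should print the policy name that Docker has stored for the container. Note that a restart policy only controls whether Docker starts the container again after it exits or after the daemon restarts; it is not a fix for a container that is hung but still running.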