When I try to create an Azure container instance for EJBCA-ce I get an error and cannot see any logs.
I expect the following result:
But I get the following error:
Failed to start container my-azure-container-resource-name, Error response: to create containerd task: failed to create container e9e48a_________ffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown
Some context:
I run the container on Azure Container Instances.
I tried:
- from ARM template
- from Azure Portal
- with file share mounted
- with database env variable
- without any env variables
It runs fine locally using the same env variable (database configuration).
It used to run with the same configuration a couple weeks ago.
Here are some logs I get when I attach the container group from az cli.
(count: 1) (last timestamp: 2020-11-03 16:04:32+00:00) pulling image "primekey/ejbca-ce:6.15.2.3"
(count: 1) (last timestamp: 2020-11-03 16:04:37+00:00) Successfully pulled image "primekey/ejbca-ce:6.15.2.3"
(count: 28) (last timestamp: 2020-11-03 16:27:52+00:00) Error: Failed to start container aci-pulsy-ccm-ejbca-snd, Error response: to create containerd task: failed to create container e9e48a06807fba124dc29633dab10f6229fdc5583a95eb2b79467fe7cdffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown
An extract of the Dockerfile from Docker Hub is shown below. I suspect the issue might be related to the USER 0 and USER 10001 instructions, which appear several times in the Dockerfile.
COPY dir:89ead00b20d79e0110fefa4ac30a827722309baa7d7d74bf99910b35c665d200 in /
/bin/sh -c rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
CMD ["/bin/bash"]
USER 0
COPY dir:893e424bc63d1872ee580dfed4125a0bef1fa452b8ae89aa267d83063ce36025 in /opt/primekey
COPY dir:756f0fe274b13cf418a2e3222e3f6c2e676b174f747ac059a95711db0097f283 in /licenses
USER 10001
CMD ["/opt/primekey/wildfly-14.0.1.Final/bin/standalone.sh" "-b" "0.0.0.0"]
MAINTAINER PrimeKey Solutions AB
ARG releaseTag
ARG releaseEdition
ARM template
{
  "type": "Microsoft.ContainerInstance/containerGroups",
  "apiVersion": "2019-12-01",
  "name": "[variables('ejbcaContainerGroupName')]",
  "location": "[parameters('location')]",
  "tags": "[variables('tags')]",
  "dependsOn": [
    "[resourceId('Microsoft.DBforMariaDB/servers', variables('ejbcaMariadbServerName'))]",
    "[resourceId('Microsoft.DBforMariaDB/servers/databases', variables('ejbcaMariadbServerName'), variables('ejbcaMariadbDatabaseName'))]"
  ],
  "properties": {
    "sku": "Standard",
    "containers": [
      {
        "name": "[variables('ejbcaContainerName')]",
        "properties": {
          "image": "primekey/ejbca-ce:6.15.2.3",
          "ports": [
            {
              "protocol": "TCP",
              "port": 443
            },
            {
              "protocol": "TCP",
              "port": 8443
            }
          ],
          "environmentVariables": [
            {
              "name": "DATABASE_USER",
              "value": "[concat(parameters('mariadbUser'),'@', variables('ejbcaMariadbServerName'))]"
            },
            {
              "name": "DATABASE_JDBC_URL",
              "value": "[variables('ejbcaEnvVariableJdbcUrl')]"
            },
            {
              "name": "DATABASE_PASSWORD",
              "secureValue": "[parameters('mariadbAdminPassword')]"
            }
          ],
          "resources": {
            "requests": {
              "memoryInGB": 1.5,
              "cpu": 2
            }
          },
          "volumeMounts": [
            {
              "name": "certificates",
              "mountPath": "/mnt/external/secrets"
            }
          ]
        }
      }
    ],
    "initContainers": [],
    "restartPolicy": "OnFailure",
    "ipAddress": {
      "ports": [
        {
          "protocol": "TCP",
          "port": 443
        },
        {
          "protocol": "TCP",
          "port": 8443
        }
      ],
      "type": "Public",
      "dnsNameLabel": "[parameters('ejbcaContainerGroupDNSLabel')]"
    },
    "osType": "Linux",
    "volumes": [
      {
        "name": "certificates",
        "azureFile": {
          "shareName": "[parameters('ejbcaCertsFileShareName')]",
          "storageAccountName": "[parameters('ejbcaStorageAccountName')]",
          "storageAccountKey": "[parameters('ejbcaStorageAccountKey')]"
        }
      }
    ]
  }
}
It runs fine on my local machine on Linux (Ubuntu 20.04):
docker run -it --rm -p 8080:8080 -p 8443:8443 -h localhost -e DATABASE_USER="mymaridbuser@my-db" -e DATABASE_JDBC_URL="jdbc:mariadb://my-azure-domain.mariadb.database.azure.com:3306/ejbca?useSSL=true" -e DATABASE_PASSWORD="my-pwd" primekey/ejbca-ce:6.15.2.3
2 Answers
The user with UID 10001 does not exist in your image. This does not prevent the USER instruction in your Dockerfile from working, nor does it make the image itself invalid, but it seems to cause issues with Azure Container Instances.

I cannot find documentation or any reference on why it doesn't work on Azure (will update if so), but adding the user to the image should solve the issue. Try adding something like this to your Dockerfile to create a user with UID 10001 (this must be done as root, i.e. with user 0):

Additional notes to see that user 10001 does not exist:

In the EJBCA-ce container image, I think they are trying to provide a user other than root to run the EJBCA server. According to the Docker documentation:

In the Dockerfile they reference two users: root, corresponding to UID 0, and another one with UID 10001.

Typically, in Linux and UNIX systems, UIDs are organized in different ranges. This depends largely on the specific operating system and user management practice, but it is very likely that the first user account created in a Linux system will be assigned UID 1001 or 10001, as in this case. Please see, for instance, the UID entry in Wikipedia or this article.

AFAIK, the USER indicated does not need to exist in your container for it to run correctly: in fact, if you run it locally, it starts without any problem.

The user with UID 10001 is actually set up in your container by the script that is run by the CMD defined in the Dockerfile, /opt/primekey/bin/start.sh, by this code fragment:

Please be aware that APPLICATION_NAME in this context takes the value ejbca, and that the user who runs this script, as indicated in the Dockerfile, is 10001. That will be the value returned by the command id -u in this code.

You can verify it if you run your container locally:
And open a bash session in it:

If you run whoami, it will tell you ejbca.

If you run id, it will give you the following output:

You can verify the user's existence in /etc/passwd as well:

The reason Pierre did not get this output is that he ran the container overriding the provided CMD and, as a consequence, without executing the start.sh script responsible for creating the user, as mentioned above.

For some reason, and this is where my knowledge fails me, when Azure tries to run your container it fails because the USER 10001 identified in the Dockerfile does not exist.

I think it could be related to the use of containerd instead of docker.

The error reported by Azure seems related to the Microsoft project opengcs.
They say about the project:
And:
The error you see in the console is raised by the spec.go file that you can find in their code base, when they try to establish the user on whose behalf the container process should be run:

This code is executed by this other code fragment (you can see the full function code here):

And the getUser function:

As you can see, these are exactly the errors that Azure is reporting to you.
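The kind of lookup behind that message can be illustrated with a small shell sketch. This is not the opengcs code (which is written in Go); it only mimics the idea of searching a passwd file for a UID and insisting on exactly one match. The function name find_user_by_uid, the temporary passwd file, and the home directory and shell in the appended entry are assumptions made for illustration, as is reading the '0' in the error as the number of matching entries.

```shell
#!/bin/sh
# Look up a UID in a passwd-style file and require exactly one matching entry.
find_user_by_uid() {
    passwd_file=$1
    uid=$2
    # Field 3 of each passwd line is the numeric UID.
    matches=$(awk -F: -v uid="$uid" '$3 == uid { n++ } END { print n + 0 }' "$passwd_file")
    if [ "$matches" -ne 1 ]; then
        echo "failed to find user by uid: $uid: expected exactly 1 user matched '$matches'" >&2
        return 1
    fi
    # Print the user name (field 1) of the single matching entry.
    awk -F: -v uid="$uid" '$3 == uid { print $1 }' "$passwd_file"
}

# Hypothetical passwd file standing in for the image's /etc/passwd.
passwd_file=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\n' > "$passwd_file"

# No entry for UID 10001 yet, so the lookup fails.
find_user_by_uid "$passwd_file" 10001 || echo "lookup failed"

# Append an entry for UID 10001 (name from whoami; home and shell are assumptions).
printf 'ejbca:x:10001:0:EJBCA user:/opt/primekey:/sbin/nologin\n' >> "$passwd_file"

# With the entry present, the lookup succeeds and prints the user name.
find_user_by_uid "$passwd_file" 10001
```

A runtime that resolves the user before any start script has run would hit the first case, which is consistent with the container starting locally but not on Azure.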
In summary, I think they are providing a Windows LCOW solution that conforms to the OCI Image Format Specification and is suitable for running containers with containerd.

As you indicated that it used to run with the same configuration a couple of weeks ago, my best guess is that they perhaps switched your containers from a pure Linux containerd runtime implementation to one based on Windows and the above-mentioned software, and this is why your containers are now failing.

A possible workaround could be to create a custom image based on the official one provided by PrimeKey and create the user 10001, as Pierre also pointed out.

To accomplish this task, first create a new custom Dockerfile. You can try, for instance:

Please note that you may need to define some of the environment variables from the official EJBCA image.
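A minimal sketch of such a custom Dockerfile, written out with a shell heredoc, could look like the following. It registers UID 10001 by appending a line to /etc/passwd rather than calling useradd, because it is not certain that useradd is available in the base image. The user name ejbca is what whoami reports locally; the home directory, the login shell and the image tag my-ejbca-ce are assumptions and may need adjusting.

```shell
#!/bin/sh
# Write a hypothetical Dockerfile for the custom image into a scratch directory.
build_dir=$(mktemp -d)
cat > "$build_dir/Dockerfile" <<'EOF'
FROM primekey/ejbca-ce:6.15.2.3

# Switch to root (UID 0) so that /etc/passwd can be modified.
USER 0

# Register UID 10001 in /etc/passwd. The home directory and shell are assumptions.
RUN echo "ejbca:x:10001:0:EJBCA user:/opt/primekey:/sbin/nologin" >> /etc/passwd

# Switch back to the unprivileged user that the official image runs as.
USER 10001
EOF

# Build it where Docker is available (the tag is a hypothetical name):
#   docker build -t my-ejbca-ce:6.15.2.3 "$build_dir"
echo "Dockerfile written to $build_dir/Dockerfile"
```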
With this Dockerfile you can build your image with docker, or with docker compose and an appropriate docker-compose.yaml file, something like:

Please customize it as you consider appropriate.
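As a rough sketch, a compose file for this setup could be generated as follows. The service name ejbca and the image tag my-ejbca-ce are hypothetical; the ports and the placeholder database values are taken from the docker run command in the question, and build: . assumes the custom Dockerfile sits next to the compose file.

```shell
#!/bin/sh
# Write a hypothetical docker-compose.yaml into a scratch directory.
compose_dir=$(mktemp -d)
cat > "$compose_dir/docker-compose.yaml" <<'EOF'
version: "3"
services:
  ejbca:                            # hypothetical service name
    build: .                        # directory holding the custom Dockerfile
    image: my-ejbca-ce:6.15.2.3     # hypothetical tag for the built image
    hostname: localhost
    ports:
      - "8080:8080"
      - "8443:8443"
    environment:
      DATABASE_USER: "mymaridbuser@my-db"
      DATABASE_JDBC_URL: "jdbc:mariadb://my-azure-domain.mariadb.database.azure.com:3306/ejbca?useSSL=true"
      DATABASE_PASSWORD: "my-pwd"
EOF

# Build and start it where Docker Compose is available:
#   (cd "$compose_dir" && docker-compose up --build)
echo "Compose file written to $compose_dir/docker-compose.yaml"
```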
With this setup the new container will still run properly in a local environment, in the same way as the original one; I hope that will also be the case in Azure.