
We have other ECS Services running which use images from our private ECR repo. However, Services in the same cluster that pull from Docker Hub are failing with the following error:

CannotPullContainerError: inspect image has been retried 5 time(s): httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io…: 4…

(The message itself is truncated at the end: it is literally "4…").

Since it’s getting a status code response, the task appears able to reach Docker Hub, so this doesn’t look like a network connectivity issue within our AWS configuration. Our ECS Task uses two images from public repos: one is a Redis image and the other is a Hasura image. I’m not sure how to see the status code itself, since it’s truncated in the AWS console.

When I hit the URL from the error in my browser this is the response:

{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":[{"Type":"repository","Class":"","Name":"hasura/graphql-engine","Action":"pull"}]}]}

I get a similar response with the Redis image. I didn’t think we needed any authentication to pull public images – we’ve run ECS Tasks in the past without requiring authentication to Docker Hub?
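
A note on the response above: the registry answers `UNAUTHORIZED` to any request that carries no bearer token, even for public images, so the browser result does not by itself show what the ECS agent received. One way to see the full status code is to repeat the request the way a registry client would. The sketch below is a minimal, untested-in-this-cluster example using Docker's token endpoint (`auth.docker.io`) and the registry host from the error; the function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request

AUTH_URL = "https://auth.docker.io/token"
REGISTRY_URL = "https://registry-1.docker.io"

# Manifest media types a registry client typically accepts.
MANIFEST_TYPES = ", ".join([
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def token_url(repository: str) -> str:
    """Build the URL that issues an anonymous pull token for a repository."""
    return f"{AUTH_URL}?service=registry.docker.io&scope=repository:{repository}:pull"


def parse_rate_limit(value: str) -> tuple:
    """Parse a header value such as '100;w=21600' into (pulls, window_seconds)."""
    count, _, window = value.partition(";w=")
    return int(count), int(window) if window else 0


def check_manifest(repository: str, tag: str) -> tuple:
    """Return (HTTP status, rate-limit headers) for a manifest HEAD request."""
    # Step 1: fetch an anonymous bearer token scoped to this repository.
    with urllib.request.urlopen(token_url(repository), timeout=10) as resp:
        token = json.load(resp)["token"]

    # Step 2: HEAD the manifest with that token and record the real status code.
    request = urllib.request.Request(
        f"{REGISTRY_URL}/v2/{repository}/manifests/{tag}",
        method="HEAD",
        headers={"Authorization": f"Bearer {token}", "Accept": MANIFEST_TYPES},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # 4xx responses (for example 429) arrive here; keep their status and headers.
        status, headers = err.code, err.headers

    limits = {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("ratelimit-")
    }
    return status, limits
```

Calling `check_manifest("hasura/graphql-engine", tag)` with the tag from the task definition returns the untruncated status code. To reflect what ECS sees, it would need to run from a host that leaves through the same NAT Gateway. If the response includes `ratelimit-limit` or `ratelimit-remaining` headers, `parse_rate_limit` splits them into a pull count and a window in seconds.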

For completeness, I’ve included below the checks we ran while troubleshooting this error. As mentioned, though, since we’re getting a response code from Docker Hub, these checks don’t look relevant.

AWS has this guide to troubleshoot ‘CannotPullContainer’ errors and for this particular error on Fargate there is this guide. Here are the things from the guide we have checked:

Confirm that your VPC networking configuration allows your Amazon ECS infrastructure to reach the image repository

This ECS Task was in a private subnet, and its route table had the following routes:

10.0.0.0/16 -> local (active)
0.0.0.0/0 -> NAT Gateway (active)

The NAT Gateway has status available and an Elastic IP address assigned.

Check the VPC DHCP Option Set

Looking at the VPC and going to the DHCP options set we can see Domain name servers is set to: ‘AmazonProvidedDNS’

Check the task execution role permissions
More details about configuring this are in this guide.

The same IAM role is used in the task definition for both the ‘task role’ and the ‘task execution role’. This role has the following default policy attached, as defined in the guide mentioned above:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:CreateLogStream",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetAuthorizationToken",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability"
            ],
            "Resource": "*"
        }
    ]
}
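
As a rough way to double-check a policy like the one above, the sketch below lists which of the expected actions are not explicitly allowed. It is a simplified illustration only: it assumes plain `Allow` statements and ignores wildcards, `Deny` statements, conditions, and resource scoping, and the helper names are hypothetical. Note that these ECR permissions apply to pulls from ECR; pulls from Docker Hub do not go through IAM.

```python
# Actions taken from the policy shown above.
EXPECTED_ACTIONS = {
    "logs:PutLogEvents",
    "logs:CreateLogStream",
    "ecr:GetDownloadUrlForLayer",
    "ecr:GetAuthorizationToken",
    "ecr:BatchGetImage",
    "ecr:BatchCheckLayerAvailability",
}


def allowed_actions(policy: dict) -> set:
    """Collect every action named in an 'Allow' statement of the policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        if isinstance(listed, str):  # a single action may appear as a string
            listed = [listed]
        actions.update(listed)
    return actions


def missing_actions(policy: dict, expected=EXPECTED_ACTIONS) -> list:
    """Return the expected actions that the policy does not explicitly allow."""
    return sorted(set(expected) - allowed_actions(policy))
```

Loading the policy with `json.loads` and passing it to `missing_actions` returns an empty list when every expected action is present.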

Check that the image exists

This is the image we’re trying to pull from Docker Hub. The image exists and I can pull it from my local machine without having to authenticate.

2 Answers


  1. I faced the same exact issue and spent a whole day looking for an error in my configuration.

    It turned out Docker Hub was rate limiting image pulls from my ECS task; see https://www.docker.com/increase-rate-limits.

    If you pull images from Docker Hub and face the same issue, either wait 6 hours or create a new cluster, which will give you a new IP address.

  2. For the Docker Hub rate limiting problem, another solution is to host a copy of the image in ECR. The image I wanted had a sample Dockerfile, which I was able to simply build. One caveat is that you will want to rebuild the image periodically to pick up future releases.

    Another option is ECR’s ‘pull through cache’ private repo, which seems to be a caching layer in ECR. I haven’t tried that approach myself.

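
For the pull through cache option mentioned in the second answer, the task definition would reference the image through the private ECR registry instead of Docker Hub. The sketch below shows how such an image URI could be derived. It is untested against a real cache rule, and it assumes a rule already exists with the repository prefix `docker-hub` (the prefix is whatever you chose when creating the rule); the account ID, region, and function name are placeholders. As far as I know, official Docker Hub images such as Redis live under the `library/` namespace, and AWS requires Docker Hub credentials stored in Secrets Manager for this kind of rule.

```python
def to_pull_through_uri(image: str, account_id: str, region: str,
                        prefix: str = "docker-hub") -> str:
    """Rewrite a Docker Hub image reference to an ECR pull through cache URI.

    Assumes `image` is a short Docker Hub reference such as 'redis:7',
    'hasura/graphql-engine' or 'redis@sha256:<digest>' (no registry host).
    """
    # Split off a digest ('@') or a tag (':'); default to the 'latest' tag.
    for separator in ("@", ":"):
        repository, found, reference = image.partition(separator)
        if found:
            break
    else:
        repository, found, reference = image, ":", "latest"

    # Official images have no namespace on Docker Hub; they map to 'library/'.
    if "/" not in repository:
        repository = f"library/{repository}"

    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{prefix}/{repository}{found}{reference}"
```

For example, `to_pull_through_uri("redis:7", "123456789012", "eu-west-1")` yields `123456789012.dkr.ecr.eu-west-1.amazonaws.com/docker-hub/library/redis:7`, which would then replace the Docker Hub image name in the container definition.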