
So, I’ve used Terraform and modules to deploy AWS resources step-by-step. After creating VPCs, RDS databases, etc., I uploaded my Docker image to ECR and then tried to use it when launching ECS with Fargate. Everything seems to work correctly, except the tasks keep failing to launch. This is the error I get:

ResourceInitializationError: unable to pull secrets or registry auth: The task cannot pull registry auth from Amazon ECR: There is a connection issue between the task and Amazon ECR. Check your task network configuration. RequestError: send request failed caused by: Post "https://api.ecr.us-east-2.amazonaws.com/": dial tcp 3.17.137.51:443: i/o timeout

The Docker image works on its own; I tested retrieving it from AWS and running it in a local container. The assigned DNS record used to not load anything at all, and now it gives a 503 error saying the service is temporarily unavailable. Does anyone have any tips?

Here is the code: https://github.com/sethbr11/proj2/blob/main/terraform/modules/fargate/main.tf. You can look at the rest of the repository if you want to see how modules interact with each other, or the deploy script to see how things get launched.

2 Answers


  1. You probably need to add some policies to your node role so that it has permission to retrieve the image, such as

    arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    

    This article goes through the whole setup, and it has this example:

    # role for nodegroup
    
    resource "aws_iam_role" "nodes" {
      name = "eks-node-group-nodes"
    
      assume_role_policy = jsonencode({
        Statement = [{
          Action = "sts:AssumeRole"
          Effect = "Allow"
          Principal = {
            Service = "ec2.amazonaws.com"
          }
        }]
        Version = "2012-10-17"
      })
    }
    
    # IAM policy attachment to nodegroup
    
    resource "aws_iam_role_policy_attachment" "nodes-AmazonEKSWorkerNodePolicy" {
      policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
      role       = aws_iam_role.nodes.name
    }
    
    resource "aws_iam_role_policy_attachment" "nodes-AmazonEKS_CNI_Policy" {
      policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
      role       = aws_iam_role.nodes.name
    }
    
    resource "aws_iam_role_policy_attachment" "nodes-AmazonEC2ContainerRegistryReadOnly" {
      policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      role       = aws_iam_role.nodes.name
    }
    
    
    # aws node group 
    
    resource "aws_eks_node_group" "private-nodes" {
      cluster_name    = aws_eks_cluster.demo.name
      node_group_name = "private-nodes"
      node_role_arn   = aws_iam_role.nodes.arn
    
      subnet_ids = [
        aws_subnet.private-us-east-1a.id,
        aws_subnet.private-us-east-1b.id
      ]
    
      capacity_type  = "ON_DEMAND"
      instance_types = ["t2.medium"]
    
      scaling_config {
        desired_size = 1
        max_size     = 10
        min_size     = 0
      }
    
      update_config {
        max_unavailable = 1
      }
    
      labels = {
        node = "kubenode02"
      }
    
      # taint {
      #   key    = "team"
      #   value  = "devops"
      #   effect = "NO_SCHEDULE"
      # }
    
      # launch_template {
      #   name    = aws_launch_template.eks-with-disks.name
      #   version = aws_launch_template.eks-with-disks.latest_version
      # }
    
      depends_on = [
        aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
        aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
        aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
      ]
    }
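
    Note that the example above is for an EKS node group. Since you are running ECS on Fargate, the role that actually pulls the image and registry auth is the task execution role. As a minimal sketch of the equivalent setup (resource and role names here are placeholders, adjust to your modules):

    ```hcl
    # Role assumed by the ECS agent to pull images from ECR and fetch secrets
    resource "aws_iam_role" "ecs_task_execution" {
      name = "ecs-task-execution-role"

      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Action = "sts:AssumeRole"
          Effect = "Allow"
          Principal = {
            Service = "ecs-tasks.amazonaws.com"
          }
        }]
      })
    }

    # AWS-managed policy covering ECR pulls and CloudWatch Logs
    resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
      policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
      role       = aws_iam_role.ecs_task_execution.name
    }
    ```

    The role is then referenced from the task definition via `execution_role_arn`. That said, your specific error is a network timeout rather than an auth failure, so check the networking points in the other answer too.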
    
  2. You are deploying your ECS task to both a public subnet and a private subnet. That alone is a big mess: your ECS task will randomly get a different networking configuration depending on which subnet ECS decides to place it in. You should use only private, or only public, subnets for the ECS deployment. If you change it to only public subnets, this issue may go away, since you also have public IP assignment enabled for the task.
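
    For example, the service’s network configuration might look something like this (the subnet, cluster, and security group references are placeholders for whatever your modules expose):

    ```hcl
    resource "aws_ecs_service" "app" {
      name            = "app"
      cluster         = aws_ecs_cluster.main.id          # placeholder reference
      task_definition = aws_ecs_task_definition.app.arn  # placeholder reference
      desired_count   = 1
      launch_type     = "FARGATE"

      network_configuration {
        # Use ONLY public subnets (or only private subnets behind a NAT
        # gateway) -- never a mix of both.
        subnets          = [aws_subnet.public_a.id, aws_subnet.public_b.id]
        security_groups  = [aws_security_group.ecs.id]
        assign_public_ip = true  # needed to reach ECR from a public subnet
      }
    }
    ```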

    Also, I see you are assigning network ACL rules to the subnets. You shouldn’t modify network ACL rules unless you know exactly what you are doing; the default network ACL rules plus your security group rules should be all you need. Your network ACL ingress rules don’t allow ephemeral ports at all, so you are effectively blocking the responses to any network requests your ECS service sends, which would explain the i/o timeout when pulling from ECR. I suggest removing all the aws_network_acl resources from your Terraform code.
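
    If you do decide to keep custom NACLs, the missing piece would be an ingress rule for the ephemeral port range. A sketch of what that could look like (the NACL reference and rule number are placeholders):

    ```hcl
    # Allow return traffic on ephemeral ports; NACLs are stateless, so
    # responses to outbound requests must be explicitly allowed inbound.
    resource "aws_network_acl_rule" "ingress_ephemeral" {
      network_acl_id = aws_network_acl.main.id  # placeholder reference
      rule_number    = 200
      egress         = false
      protocol       = "tcp"
      rule_action    = "allow"
      cidr_block     = "0.0.0.0/0"
      from_port      = 1024
      to_port        = 65535
    }
    ```

    But again, unless you have a specific requirement, dropping the custom NACLs and relying on security groups is the simpler fix.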
