I have some Terraform code with an aws_instance and a null_resource:

resource "aws_instance" "example" {
  ami           = data.aws_ami.server.id
  instance_type = "t2.medium"
  key_name      = aws_key_pair.deployer.key_name

  tags = {
    name = "example"
  }

  vpc_security_group_ids = [aws_security_group.main.id]
}

resource "null_resource" "example" {
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}

It kinda works, but sometimes there is a bug (probably when the instance is still in a pending state). When I rerun Terraform, it works as expected.

Question: How can I run local-exec only when the instance is running and accepting an SSH connection?

3 Answers


  1. Chosen as BEST ANSWER

    Here is an Ansible-specific solution for this problem. Add this play to your playbook (there is also a pre_tasks section if you use roles):

    - name: will wait till reachable
      hosts: all
      gather_facts: no # important: the host may not be reachable yet, so skip fact gathering
      tasks:
        - name: Wait for system to become reachable
          wait_for_connection:
    
        - name: Gather facts for the first time
          setup:
    

  2. The null_resource currently only waits until the aws_instance resource has been created, which in turn only waits until the AWS API reports that the instance is in the running state. There is a long gap between that point and the instance booting the OS and accepting SSH connections, which is what your local-exec provisioner actually needs.

    One way to handle this is to use the remote-exec provisioner on the instance first as that has the ability to wait for the instance to be ready. Changing your existing code to handle this would look like this:

    resource "aws_instance" "example" {
      ami           = data.aws_ami.server.id
      instance_type = "t2.medium"
      key_name      = aws_key_pair.deployer.key_name
    
      tags = {
        name = "example"
      }
    
      vpc_security_group_ids = [aws_security_group.main.id]
    }
    
    resource "null_resource" "example" {
      provisioner "remote-exec" {
        connection {
          host = aws_instance.example.public_dns
          user = "centos"
          file = file("files/id_rsa")
        }
    
        inline = ["echo 'connected!'"]
      }
    
      provisioner "local-exec" {
        command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns},  --user centos --private-key files/id_rsa playbook.yml"
      }
    }
    

    This will first attempt to connect to the instance’s public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.
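
    If the instance regularly takes longer to boot than the provisioner's default connection timeout (typically 5 minutes for SSH), the connection block also accepts a timeout argument. A minimal sketch of the same connection, assuming a hypothetical 10 minute limit:

    provisioner "remote-exec" {
      connection {
        host        = aws_instance.example.public_dns
        user        = "centos"
        private_key = file("files/id_rsa")
        timeout     = "10m" # assumption: allow a slow-booting instance more time
      }

      inline = ["echo 'connected!'"]
    }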

    Note that just being able to connect over SSH may not actually be enough for you to then provision the instance. If your Ansible script tries to interact with your package manager, you may find that it is still locked because the instance's user data script is running. If this is the case, you will need to remotely execute a script that waits for cloud-init to complete first. An example script looks like this:

    #!/bin/bash
    
    while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
      echo -e "33[1;36mWaiting for cloud-init..."
      sleep 1
    done
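
    For example, assuming that script is saved locally as files/wait-for-cloud-init.sh (a hypothetical path), the remote-exec provisioner's script argument will upload it to the instance and run it there, so the local-exec step only starts once cloud-init has finished. A minimal sketch of the null_resource with that change:

    resource "null_resource" "example" {
      provisioner "remote-exec" {
        connection {
          host        = aws_instance.example.public_dns
          user        = "centos"
          private_key = file("files/id_rsa")
        }

        # copies the local script to the instance and runs it,
        # blocking until cloud-init has written boot-finished
        script = "files/wait-for-cloud-init.sh"
      }

      provisioner "local-exec" {
        command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
      }
    }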
    
  3. For cases where instances are not externally exposed (about 90% of the time in most of my projects) and the SSM agent is installed on the target instance (newer AWS AMIs come pre-loaded with it), you can leverage SSM to probe the instance. Here's some sample code:

    #!/bin/bash
    # Usage: check-instance-state.sh <instance-id>
    instanceId=$1
    echo "Waiting for instance to bootstrap ..."
    tries=0
    responseCode=1
    while [[ $responseCode != 0 && $tries -le 10 ]]
    do
      echo "Try # $tries"
      cmdId=$(aws ssm send-command --document-name AWS-RunShellScript --instance-ids $instanceId --parameters commands="cat /tmp/job-done.txt # or some other validation logic" --query Command.CommandId --output text)
      sleep 5
      responseCode=$(aws ssm get-command-invocation --command-id $cmdId --instance-id $instanceId --query ResponseCode --output text)
      echo "ResponseCode: $responseCode"
      if [ $responseCode != 0 ]; then
        echo "Sleeping ..."
        sleep 60
      fi
      (( tries++ ))
    done
    echo "Wait time over. ResponseCode: $responseCode"
    

    Assuming you have the AWS CLI installed locally, you can make whatever acts on the instance depend on this null_resource. In my case, I was building an AMI.

    resource "null_resource" "wait_for_instance" {
      depends_on = [
        aws_instance.my_instance
      ]
      triggers = {
        always_run = "${timestamp()}"
      }
      provisioner "local-exec" {
        command = "${path.module}/scripts/check-instance-state.sh ${aws_instance.my_instance.id}"
      }
    }
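
    Building on that AMI use case, a downstream resource can then depend on the wait. A minimal sketch, assuming an aws_ami_from_instance resource with the hypothetical name my_ami:

    resource "aws_ami_from_instance" "my_ami" {
      name               = "my-ami" # hypothetical AMI name
      source_instance_id = aws_instance.my_instance.id

      # only snapshot the instance after the SSM probe reports bootstrapping finished
      depends_on = [null_resource.wait_for_instance]
    }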
    