I have some Terraform code with an aws_instance and a null_resource:

resource "aws_instance" "example" {
  ami           = data.aws_ami.server.id
  instance_type = "t2.medium"
  key_name      = aws_key_pair.deployer.key_name

  tags = {
    name = "example"
  }

  vpc_security_group_ids = [aws_security_group.main.id]
}

resource "null_resource" "example" {
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}

It kinda works, but sometimes there is a bug (probably when the instance is still in a pending state). When I rerun Terraform, it works as expected.

Question: How can I run local-exec only when the instance is running and accepting an SSH connection?

3 Answers


  1. Chosen as BEST ANSWER

    Here is an Ansible-specific solution for this problem. Add this play to your playbook (there is also a pre_tasks section if you use roles):

    - name: will wait till reachable
      hosts: all
      gather_facts: no # important: the host may not be reachable yet, so skip fact gathering
      tasks:
        - name: Wait for system to become reachable
          wait_for_connection:
    
        - name: Gather facts for the first time
          setup:
    

  2. The null_resource currently only waits until the aws_instance resource has been created, which in turn only waits until the AWS API reports that the instance is in the running state. There is a long gap between that point and the instance booting the OS and accepting SSH connections, which is what your local-exec provisioner actually needs.

    One way to handle this is to use the remote-exec provisioner on the instance first as that has the ability to wait for the instance to be ready. Changing your existing code to handle this would look like this:

    resource "aws_instance" "example" {
      ami           = data.aws_ami.server.id
      instance_type = "t2.medium"
      key_name      = aws_key_pair.deployer.key_name
    
      tags = {
        name = "example"
      }
    
      vpc_security_group_ids = [aws_security_group.main.id]
    }
    
    resource "null_resource" "example" {
      provisioner "remote-exec" {
        connection {
          host = aws_instance.example.public_dns
          user = "centos"
          file = file("files/id_rsa")
        }
    
        inline = ["echo 'connected!'"]
      }
    
      provisioner "local-exec" {
        command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns},  --user centos --private-key files/id_rsa playbook.yml"
      }
    }
    

    This will first attempt to connect to the instance’s public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.
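
    If the instance regularly takes longer to boot than the provisioner's default connection timeout (typically 5 minutes for SSH), the connection block also accepts a timeout argument. A minimal sketch of the same connection, assuming a hypothetical 10 minute limit:

    provisioner "remote-exec" {
      connection {
        host        = aws_instance.example.public_dns
        user        = "centos"
        private_key = file("files/id_rsa")
        timeout     = "10m" # assumption: allow a slow-booting instance more time
      }

      inline = ["echo 'connected!'"]
    }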

    Note that just being able to connect over SSH may not actually be enough for you to then provision the instance. If your Ansible script tries to interact with your package manager, you may find that it is still locked because the instance's user data script is running. If this is the case, you will need to remotely execute a script that waits for cloud-init to complete first. An example script looks like this:

    #!/bin/bash
    
    while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
      echo -e "33[1;36mWaiting for cloud-init..."
      sleep 1
    done
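
    For example, assuming that script is saved locally as files/wait-for-cloud-init.sh (a hypothetical path), the remote-exec provisioner's script argument will upload it to the instance and run it there, so the local-exec step only starts once cloud-init has finished. A minimal sketch of the null_resource with that change:

    resource "null_resource" "example" {
      provisioner "remote-exec" {
        connection {
          host        = aws_instance.example.public_dns
          user        = "centos"
          private_key = file("files/id_rsa")
        }

        # copies the local script to the instance and runs it,
        # blocking until cloud-init has written boot-finished
        script = "files/wait-for-cloud-init.sh"
      }

      provisioner "local-exec" {
        command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
      }
    }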
    
  3. For cases where instances are not externally exposed (about 90% of the time in most of my projects) and the SSM agent is installed on the target instance (newer AWS AMIs come pre-loaded with it), you can leverage SSM to probe the instance. Here's some sample code:

    #!/bin/bash
    # Usage: check-instance-state.sh <instance-id>
    instanceId=$1
    echo "Waiting for instance to bootstrap ..."
    tries=0
    responseCode=1
    while [[ $responseCode != 0 && $tries -le 10 ]]
    do
      echo "Try # $tries"
      cmdId=$(aws ssm send-command --document-name AWS-RunShellScript --instance-ids $instanceId --parameters commands="cat /tmp/job-done.txt # or some other validation logic" --query Command.CommandId --output text)
      sleep 5
      responseCode=$(aws ssm get-command-invocation --command-id $cmdId --instance-id $instanceId --query ResponseCode --output text)
      echo "ResponseCode: $responseCode"
      if [ $responseCode != 0 ]; then
        echo "Sleeping ..."
        sleep 60
      fi
      (( tries++ ))
    done
    echo "Wait time over. ResponseCode: $responseCode"
    

    Assuming you have the AWS CLI installed locally, you can make whatever acts on the instance depend on this null_resource. In my case, I was building an AMI.

    resource "null_resource" "wait_for_instance" {
      depends_on = [
        aws_instance.my_instance
      ]
      triggers = {
        always_run = "${timestamp()}"
      }
      provisioner "local-exec" {
        command = "${path.module}/scripts/check-instance-state.sh ${aws_instance.my_instance.id}"
      }
    }
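
    Building on that AMI use case, a downstream resource can then depend on the wait. A minimal sketch, assuming an aws_ami_from_instance resource with the hypothetical name my_ami:

    resource "aws_ami_from_instance" "my_ami" {
      name               = "my-ami" # hypothetical AMI name
      source_instance_id = aws_instance.my_instance.id

      # only snapshot the instance after the SSM probe reports bootstrapping finished
      depends_on = [null_resource.wait_for_instance]
    }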
    