I have some Terraform code with an aws_instance and a null_resource:
resource "aws_instance" "example" {
  ami           = data.aws_ami.server.id
  instance_type = "t2.medium"
  key_name      = aws_key_pair.deployer.key_name

  tags = {
    name = "example"
  }

  vpc_security_group_ids = [aws_security_group.main.id]
}

resource "null_resource" "example" {
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}
It kinda works, but sometimes it fails (probably when the instance is still in a pending state). When I rerun Terraform, it works as expected.
Question: How can I run local-exec only when the instance is running and accepting an SSH connection?
3 Answers
Here is an Ansible-specific solution for this problem. Add this code to your playbook (there is also a pre_tasks clause if you use roles).
The null_resource is currently only going to wait until the aws_instance resource has completed, which in turn only waits until the AWS API returns that it is in the Running state. There's a long gap from there to the instance starting the OS and then being able to accept SSH connections before your local-exec provisioner can connect.

One way to handle this is to use the remote-exec provisioner on the instance first, as that has the ability to wait for the instance to be ready. Changing your existing code to handle this would look like this:

This will first attempt to connect to the instance's public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.

Note that just being able to connect over SSH may not actually be enough for you to then provision the instance. If your Ansible script tries to interact with your package manager then you may find that it is locked from the instance's user data script running. If this is the case you will need to remotely execute a script that waits for cloud-init to be complete first. An example script looks like this:

For cases where instances are not externally exposed (about 90% of the time in most of my projects), and SSM agent is installed on the target instance (newer AWS AMIs come pre-loaded with it), you can leverage SSM to probe the instance. Here's some sample code:
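The answer's original snippet is not preserved here, so the following is only a minimal sketch of the SSM probing idea, not the author's exact code. It assumes the AWS CLI is installed and configured locally, that the instance has an instance profile allowing it to register with SSM, and that bash is available at /bin/bash. The resource name wait_for_ssm and the 60 x 10 second polling budget are illustrative choices.

```hcl
# Hypothetical helper resource: blocks until the SSM agent on the
# instance reports "Online", or fails after roughly 10 minutes.
resource "null_resource" "wait_for_ssm" {
  # Re-run the check whenever the instance is replaced.
  triggers = {
    instance_id = aws_instance.example.id
  }

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    command     = <<-EOT
      # Poll up to 60 times, 10 seconds apart.
      for attempt in $(seq 1 60); do
        status=$(aws ssm describe-instance-information \
          --filters "Key=InstanceIds,Values=${aws_instance.example.id}" \
          --query "InstanceInformationList[0].PingStatus" \
          --output text)
        if [ "$status" = "Online" ]; then
          exit 0
        fi
        sleep 10
      done
      echo "Timed out waiting for the SSM agent" >&2
      exit 1
    EOT
  }
}
```

Anything that must run only once the instance is reachable can then declare depends_on = [null_resource.wait_for_ssm].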
Assuming you have AWS CLI installed locally, you can have this null_resource required before you act on the instance. In my case, I was building an AMI.
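For the remote-exec approach described in the earlier answer, the code did not survive in this copy of the thread, so here is a rough sketch of what it could look like when applied to the null_resource from the question. It is an assumption about the arrangement, not the answerer's exact code: the connection block is placed at the resource level, and the second inline command waits for the /var/lib/cloud/instance/boot-finished marker that cloud-init writes when it completes.

```hcl
resource "null_resource" "example" {
  # SSH settings used by the remote-exec provisioner below.
  connection {
    type        = "ssh"
    host        = aws_instance.example.public_dns
    user        = "centos"
    private_key = file("files/id_rsa")
  }

  # Terraform retries the SSH connection until it succeeds or times out,
  # so this step only finishes once the instance accepts SSH.
  provisioner "remote-exec" {
    inline = [
      "echo 'connected!'",
      # Optional: wait for cloud-init (user data) to finish so the
      # package manager is no longer locked.
      "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done",
    ]
  }

  # Provisioners run in order, so Ansible starts only after the checks above.
  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
  }
}
```

If the instance has no user data script, the cloud-init wait line can probably be dropped and the echo alone is enough to gate the Ansible run.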