I am currently working on an ETL tool at work (basically Python and bash scripts managed with Airflow), and I am wondering whether I should put the EC2 instance that will run the ETL in a public or a private subnet. The instance needs internet access to retrieve data (basically over SSH, through on-premises instances we have) and should also be reachable over SSH.
However, I don’t know whether allowing outbound connections to the internet and restricting inbound connections to SSH is secure enough, or whether I should put the instance in a private subnet and tweak things so that I can still connect to it.
2 Answers
Your ETL instance should be in a private subnet behind a NAT gateway.
A NAT gateway gives the instances in your private subnet outbound internet connectivity while ensuring they are not reachable from the internet. To reach the internet, the private subnet routes its traffic to the NAT gateway, which sits in a public subnet, and that public subnet in turn has a route to an internet gateway.
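To make the routing concrete, here is a rough sketch of the two route tables modelled as plain Python dictionaries. It does not call AWS at all; the CIDR blocks and the `igw-EXAMPLE` / `nat-EXAMPLE` targets are made-up placeholders, and `resolve_target` is a hypothetical helper that just mimics the "most specific route wins" lookup a VPC route table performs.

```python
import ipaddress

# Hypothetical route tables: IDs and CIDR blocks are placeholders, not real resources.
PUBLIC_ROUTES = {
    "10.0.0.0/16": "local",        # traffic inside the VPC stays local
    "0.0.0.0/0": "igw-EXAMPLE",    # everything else goes to the internet gateway
}
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-EXAMPLE",    # everything else goes to the NAT gateway
}


def resolve_target(routes, destination):
    """Return the target of the most specific route that matches destination."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if addr in network:
            matches.append((network.prefixlen, target))
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The longest prefix (most specific route) wins.
    return max(matches)[1]


if __name__ == "__main__":
    for name, table in (("public", PUBLIC_ROUTES), ("private", PRIVATE_ROUTES)):
        print(name, "->", resolve_target(table, "203.0.113.10"))
```

The only difference between the two tables is the default route: the private subnet never points at the internet gateway directly, which is why nothing on the internet can open a connection to an instance in it.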
You should put your EC2 instance in a private subnet so that it cannot be reached directly from the internet, which reduces the risk of an attacker gaining access and stealing your data.
You can learn how to set up a NAT gateway here:
https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/
As the main purpose of your instance is not to deliver a public service, it would be more secure in a private subnet, going through a NAT gateway to fetch data from the internet.
That being said, a NAT gateway is expensive, so a common pattern is to use a public subnet with an internet gateway and block all incoming traffic (a security group with no inbound rules does exactly that). If you don’t want your instance exposed to DDoS, don’t even open SSH; use AWS Systems Manager Session Manager to get a shell on the instance instead.
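If you do decide to keep SSH open in the public-subnet setup, a minimal sketch of how the inbound rule could be limited to a single admin network might look like this. The function names, the `203.0.113.0/24` range and the `sg-EXAMPLE` group ID are placeholders I made up for illustration; `apply_rule` assumes boto3 is installed and AWS credentials are configured, and it is not called here.

```python
import ipaddress


def ssh_ingress_rule(admin_cidr):
    """Build one IpPermissions entry that allows SSH only from admin_cidr."""
    # ip_network raises ValueError if the CIDR is malformed or has host bits set.
    network = ipaddress.ip_network(admin_cidr)
    if network.prefixlen == 0:
        raise ValueError("refusing to open SSH to the whole internet")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [
            {"CidrIp": str(network), "Description": "SSH from admin network"}
        ],
    }


def apply_rule(group_id, rule):
    """Attach the rule to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[rule])


if __name__ == "__main__":
    rule = ssh_ingress_rule("203.0.113.0/24")  # placeholder admin network
    print(rule)
    # apply_rule("sg-EXAMPLE", rule)  # placeholder ID; uncomment with a real one
```

Since a security group only allows what you explicitly add, this one rule is all that is open inbound. With Session Manager you would skip the rule entirely and leave the group with no inbound rules at all.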