I have a PostgreSQL RDS instance in the same AWS account where I am trying to set up a Glue connection to it. I used the RDS option and selected my existing RDS instance, then set the network to the same VPC, subnet, and security groups that the RDS instance is hosted on.

When the connection was ready, I tested it, but I got an error about the subnet of the VPC not finding an S3 endpoint.

I know from reading the documentation that I have to create a VPC endpoint for S3 to work as a gateway, which I can do.

But why is an S3 endpoint needed for a JDBC connection? What does S3 have to do with reaching a database if we are already in the same network/subnet and have the correct security group to reach it? At what point in a JDBC connection does it need a gateway to S3?

2 Answers


  1. The documentation explains this:

    Many customers have legitimate privacy and security concerns about sending and receiving data across the public internet. Customers can address these concerns by using a virtual private network (VPN) to route all Amazon S3 network traffic through their own corporate network infrastructure. However, this approach can introduce bandwidth and availability challenges.

    VPC endpoints for Amazon S3 can alleviate these challenges. A VPC endpoint for Amazon S3 enables AWS Glue to use private IP addresses to access Amazon S3 with no exposure to the public internet. AWS Glue does not require public IP addresses, and you don’t need an internet gateway, a NAT device, or a virtual private gateway in your VPC. You use endpoint policies to control access to Amazon S3. Traffic between your VPC and the AWS service does not leave the Amazon network.

    When you create a VPC endpoint for Amazon S3, any requests to an Amazon S3 endpoint within the Region (for example, s3.us-west-2.amazonaws.com) are routed to a private Amazon S3 endpoint within the Amazon network. You don’t need to modify your applications running on Amazon EC2 instances in your VPC—the endpoint name remains the same, but the route to Amazon S3 stays entirely within the Amazon network, and does not access the public internet.

  2. When you set up a Glue job, under the Job details tab in the Advanced properties section, you have to specify things like Script path, Temporary path, etc., which point to S3 locations. So the Glue job needs to be able to access S3.

    When you associate a connection with your Glue job, the job runs within the VPC specified in the connection. So now your Glue job needs a way to reach S3 (which is external to your VPC) from within the VPC.

    From this page of the Glue Developer Guide:

    To access Amazon S3 from within your VPC, a VPC endpoint is required.

    Try adding an S3 gateway endpoint to your VPC.
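
As a rough sketch of that last step, assuming you use boto3 and that the VPC ID, region, and route table ID below are placeholders (they are hypothetical, not taken from the question), the two helpers first look for an existing S3 gateway endpoint in the VPC and create one only if none is found. The route table passed in should be the one associated with the subnet your Glue connection uses, and pagination of the describe call is ignored for brevity.

```python
def s3_service_name(region):
    """Return the S3 endpoint service name for a region."""
    return f"com.amazonaws.{region}.s3"


def find_s3_gateway_endpoints(ec2_client, vpc_id, region):
    """List gateway endpoints for S3 that already exist in the VPC."""
    response = ec2_client.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [s3_service_name(region)]},
        ]
    )
    return [
        endpoint
        for endpoint in response["VpcEndpoints"]
        if endpoint.get("VpcEndpointType") == "Gateway"
    ]


def ensure_s3_gateway_endpoint(ec2_client, vpc_id, region, route_table_ids):
    """Return the ID of an S3 gateway endpoint, creating one if needed."""
    existing = find_s3_gateway_endpoints(ec2_client, vpc_id, region)
    if existing:
        return existing[0]["VpcEndpointId"]

    response = ec2_client.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=s3_service_name(region),
        # Route tables that should get a route to S3 through the endpoint.
        RouteTableIds=list(route_table_ids),
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


# Example call (placeholder IDs, requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   ensure_s3_gateway_endpoint(ec2, "vpc-0123456789abcdef0", "us-west-2",
#                              ["rtb-0123456789abcdef0"])
```

If an endpoint already exists but the connection test still fails, it may be worth checking that the endpoint's `RouteTableIds` include the route table of the subnet chosen in the Glue connection.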
