skip to Main Content

I need to configure Athena to connect to a couple of data sources – MySQL RDS, Postgres RDS, and a custom federation Lambda.

enter image description here

Ideally, I’d like to configure these via Cloudformation so I can commit these configurations. I cannot seem seem find any resources to configure these data sources – https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_Athena.html

The AWS CLI doesn’t seem to support the creation of data sources as well. https://docs.aws.amazon.com/cli/latest/reference/athena/

Are there APIs or other ways to configure Athena data sources?

2

Answers


  1. you should use this Constuct CfnApplication

    new CfnApplication(this, "MYID", {
      location: {
        applicationId: "arn:aws:THECONNECTORYOUWANT",
        semanticVersion: "THEVERSION"
      },
      parameters: {
        AthenaCatalogName: "THENAMEFROMTHECATALOGS",
        SpillBucket: bucketName
      }
    })
    
    Login or Signup to reply.
  2. The magic you’re looking for is the AWS::Athena::DataCatalog resource. You can use that to create a new data catalog that points to your federated data source by providing a Type value of LAMBDA. Here’s a sample JSON template for creating a new federated data source:

    AWSTemplateFormatVersion: 2010-09-09
    Resources:
      Type: 'AWS::Athena::DataCatalog'
      Properties:
        Name: mysql
        Type: LAMBDA
        Description: Awesome MySQL Database
        Parameters:
          function: 'arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:lambdaname'
    
    

    Or the CLI

    aws athena create-data-catalog 
      --name excel_files 
      --type LAMBDA 
      --parameters function=arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:athena-excel
    

    The provided function is the ARN of the Lambda function for your data source, which also needs to get deployed.

    For the officially supported data sources in aws-athena-query-federation repo, they are available in the Serverless Application Repository (SAR) and can also be deployed using the AWS SAM CLI as they all typically have various additional roles and configuration required to run.

    If you want to use CloudFormation, you can grab the SAR Application ARN and parameters. These can be found either when you’re creating the data source or by searching the SAR. For MySql the ARN is arn:aws:serverlessrepo:us-east-1:292517598671:applications/AthenaMySQLConnector so the CloudFormation template would look something like this (plus all the additional parameters):

    Transform: 'AWS::Serverless-2016-10-31'
    Resources:
      AthenaMySQLConnector:
        Type: 'AWS::Serverless::Application'
        Properties:
          Location:
            ApplicationId: >-
              arn:aws:serverlessrepo:us-east-1:292517598671:applications/AthenaMySQLConnector
            SemanticVersion: 2023.37.2
          Parameters:
            DefaultConnectionString: YOUR_VALUE
            LambdaFunctionName: mysqlconnector
            SecretNamePrefix: YOUR_VALUE
            SecurityGroupIds: YOUR_VALUE
            SpillBucket: YOUR_VALUE
            SubnetIds: YOUR_VALUE
    
    

    So to sum up, create the Lambda then create the DataCatalog resource to point to the Lambda.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search