
Imagine you had a set of R scripts that form an ETL pipeline that you wanted to run as an AWS Glue job. AWS Glue supports Python and Scala.

Is it possible to call an R script as a Python subprocess (or via a bash script that wraps a set of R scripts) within an AWS Glue job running in a container with Python and R dependencies?

If so, please outline the steps required and key considerations.

2 Answers


  1. It is not possible

    While it is possible to run custom code in Glue, it is based on Spark, so only Scala and Python are supported. As for calling R from a Python subprocess, that does not seem to be an option, as mentioned in the documentation:

    Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.

    As @Isc commented, I would recommend using Docker with ECS to run batch ETL jobs using R.
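
If the job moves to a container as suggested, the wrapper described in the question can be a small Python script. This is a minimal sketch, assuming `Rscript` is on the PATH inside the image; the helper names `build_rscript_command` and `run_pipeline`, the `/scripts` directory, and the script list are illustrative assumptions, not part of any AWS API.

```python
import subprocess
from pathlib import Path


def build_rscript_command(script, args=()):
    """Build the argv list for running one R script with Rscript."""
    # --vanilla skips saved workspaces and user profiles for a clean run.
    return ["Rscript", "--vanilla", str(script), *map(str, args)]


def run_pipeline(scripts, script_dir="/scripts"):
    """Run each R script in order and stop at the first failure."""
    for name in scripts:
        cmd = build_rscript_command(Path(script_dir) / name)
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failing ETL step fails the whole container run.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(["rtest.R"])` as the container command would run the scripts sequentially. Passing the command as a list rather than a shell string avoids quoting problems with script arguments.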

  2. As Glue doesn’t natively support running R scripts, you can consider the following as an alternative:

    1. Customise your own Docker image
    2. Push the image to ECR
    3. Configure the compute resources and schedule using AWS Batch

    Example folder structure

    .
    ├── Dockerfile
    └── scripts
        └── rtest.R
    

    Example Dockerfile based on https://hub.docker.com/r/rocker/tidyverse

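    FROM rocker/tidyverse:4.2.2
    WORKDIR /scripts
    # Copy the R scripts into the image; the trailing slash marks the destination as a directory
    COPY scripts/ /scripts/
    RUN chmod 755 ./*
    # Install additional R libraries here, e.g.
    # RUN R -e "install.packages('data.table')"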
    

    Example commands to push the image to ECR

    aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
    
    docker build -t rdev .
    
    docker tag rdev:latest aws_account_id.dkr.ecr.region.amazonaws.com/dev:latest
    
    docker push aws_account_id.dkr.ecr.region.amazonaws.com/dev:latest
    

    Ref: https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html

    Then follow this guide to set up AWS Batch on Fargate and to create and run a job: https://docs.aws.amazon.com/batch/latest/userguide/getting-started-fargate.html
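
Once the job queue and job definition exist, jobs can also be submitted from code rather than the console. The following is a rough sketch using the boto3 `batch` client's `submit_job` call; the job name, queue name, and job definition name are placeholders you would replace with your own, and the `Rscript rtest.R` command assumes the image built from the Dockerfile above (working directory `/scripts`).

```python
def build_submit_job_request(job_name, job_queue, job_definition, script="rtest.R"):
    """Assemble the keyword arguments for the AWS Batch SubmitJob call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,            # placeholder: your job queue name or ARN
        "jobDefinition": job_definition,  # placeholder: your job definition name or ARN
        # Override the container command so the job runs the given R script.
        "containerOverrides": {"command": ["Rscript", script]},
    }


def submit_r_job(request, region_name):
    """Submit the job and return its ID. Requires boto3 and AWS credentials."""
    # Imported here so building the request does not need boto3 installed.
    import boto3

    client = boto3.client("batch", region_name=region_name)
    response = client.submit_job(**request)
    return response["jobId"]
```

Keeping the request builder separate from the network call makes it easy to check the payload locally before anything is sent to AWS.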
