I am trying to create an AWS Glue job trigger in Terraform that fires on the condition that a crawler, itself triggered by cron, has succeeded:

resource "aws_glue_trigger" "trigger" {
  name = "trigger"
  type = "CONDITIONAL"

  actions {
    job_name = aws_glue_job.job.name
  }

  predicate {
    conditions {
      crawler_name = aws_glue_crawler.crawler.name
      crawl_state  = "SUCCEEDED"
    }
  }
}

It applies cleanly, but in the job's Schedules property the trigger is listed with "Invalid expression" in the Cron column, while its status is Activated. Of course it won’t trigger because of that. What am I missing here?

2 Answers


  1. I'm not sure I understood the question correctly, but this is my Glue trigger configuration, which is meant to run at a scheduled time, and it does fire at that time.

    resource "aws_glue_trigger" "tr_one" {
      name          = "tr_one"
      schedule      = var.wf_schedule_time
      type          = "SCHEDULED"
      workflow_name = aws_glue_workflow.my_workflow.name
    
      actions {
        job_name = var.my_glue_job_1
      }
    }
    
    // Schedule time, in UTC, for running the Glue workflow
    wf_schedule_time = "cron(56 09 * * ? *)"
    

    Please note that the schedule must be given in UTC.
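
    The var.wf_schedule_time reference above needs a matching variable declaration, which the snippet does not show. A minimal sketch, assuming a plain string variable; the validation rule is an addition for illustration only and checks just the cron(...) wrapper, not the fields inside it:

    ```hcl
    # Hypothetical declaration for the variable referenced by the trigger.
    variable "wf_schedule_time" {
      type        = string
      description = "Glue trigger schedule as a cron expression, evaluated in UTC."

      validation {
        # Only checks the cron(...) wrapper; Glue itself validates the fields.
        condition     = can(regex("^cron\\(.+\\)$", var.wf_schedule_time))
        error_message = "The schedule must be wrapped in cron(...), for example cron(56 09 * * ? *)."
      }
    }
    ```

    With a declaration like this, a malformed value should be rejected at plan time rather than surfacing later in the Glue console.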

  2. I had the same problem. Unfortunately, I did not find an easy way to get rid of the ‘invalid expression’ message using aws_glue_trigger resources alone. I did find a workaround that uses a Glue workflow to achieve the same goal (triggering a Glue job after a crawler succeeds), although I am not sure it is the best way to do it.

    First, I created a Glue workflow:

    resource "aws_glue_workflow" "my_workflow" {
      name = "my-workflow"
    }
    

    Then I created a scheduled trigger for my crawler inside that workflow (and removed the schedule from the Glue crawler it references):

    resource "aws_glue_trigger" "crawler_scheduler" {
      name          = "crawler-trigger"
      workflow_name = aws_glue_workflow.my_workflow.name
      type          = "SCHEDULED"
      schedule      = "cron(15 12 * * ? *)"
      actions {
        crawler_name = "my-crawler"
      }
    }
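
    Removing the crawler's own schedule amounts to leaving the schedule argument out of the aws_glue_crawler resource, so that only the workflow trigger starts it. A rough sketch of what that could look like; the resource label, database name, IAM role reference, and S3 path are placeholders assumed for illustration, not values from the answer:

    ```hcl
    resource "aws_glue_crawler" "my_crawler" {
      name          = "my-crawler"
      database_name = "my_database"                 # placeholder database
      role          = aws_iam_role.glue_crawler.arn # hypothetical role defined elsewhere

      s3_target {
        path = "s3://my-bucket/raw/" # placeholder path
      }

      # No schedule argument here: the SCHEDULED workflow trigger starts the crawler.
    }
    ```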
    

    Lastly, I created the trigger for my Glue job, which should run after the crawler has succeeded. The important point is that both triggers belong to the same workflow, which is what links the crawler and the job.

    resource "aws_glue_trigger" "job_trigger" {
      name          = "ndjson_to_parquet-trigger"
      type          = "CONDITIONAL"
      workflow_name = aws_glue_workflow.my_workflow.name
    
      predicate {
        conditions {
          crawler_name = "my-crawler"
          crawl_state  = "SUCCEEDED"
        }
      }
    
      actions {
        job_name = "my-job"
      }
    }
    

    The Glue job still shows the ‘invalid expression’ message under the schedule label, but this time the job is triggered successfully just by running the scheduled trigger. In addition, you get a visualization of the run in Glue workflows.
