I have a simple (it just prints hello) Glue 2.0 job that runs in parallel, triggered from a Step Functions Map state. The Glue job's Maximum concurrency is set to 40, and so is the Map state's MaxConcurrency.
It runs fine if I kick off fewer than 20 parallel Glue jobs, but beyond that (the most I tried was 35 in parallel) I get intermittent errors like this:
Rate exceeded (Service: AWSGlue; Status Code: 400; Error Code:
ThrottlingException; Request ID: 0a350b23-2f75-4951-a643-20429799e8b5;
Proxy: null)
I’ve checked the service quotas documentation
https://docs.aws.amazon.com/general/latest/gr/glue.html and my account settings. The maximum of 200 should easily accommodate my 35 parallel jobs.
There are no other Glue jobs scheduled to run at the same time in my AWS account.
Should I just blindly request a quota increase and hope that fixes it, or is there anything I can do to get around this?
2 Answers
Thanks to luk2302 and Robert for the suggestions. Based on their advice, I arrived at a solution.
Add a retry to the Glue Task state. (I tried IntervalSeconds 1 and BackoffRate 1, but those values are too low and didn't work; with a BackoffRate of 1 the wait between attempts never grows.)
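For anyone who wants a starting point, here is a minimal sketch of what such a retry could look like, written as Python that builds the Task state and prints it as JSON. The values (IntervalSeconds 5, BackoffRate 2.0, MaxAttempts 5) and the job name `hello-job` are illustrative placeholders, not the values I ended up with. The error name `Glue.AWSGlueException` is also an assumption; check the error field of a failed execution in your own state machine and use whatever name appears there.

```python
import json

# Assumption: the error name Step Functions reports for the throttled call.
# Verify it against the "Error" field of a failed execution before relying on it.
RETRY_ERROR = "Glue.AWSGlueException"


def glue_task_with_retry(job_name, interval_seconds=5, max_attempts=5, backoff_rate=2.0):
    """Build a Task state that starts a Glue job and retries with backoff.

    All numeric defaults are illustrative placeholders, not tuned values.
    """
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "Retry": [
            {
                "ErrorEquals": [RETRY_ERROR],
                "IntervalSeconds": interval_seconds,
                "MaxAttempts": max_attempts,
                "BackoffRate": backoff_rate,
            }
        ],
        "End": True,
    }


def retry_delays(retrier):
    """Return the wait (in seconds) before each retry attempt."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** i
        for i in range(retrier["MaxAttempts"])
    ]


# "hello-job" is a hypothetical job name.
task = glue_task_with_retry("hello-job")
print(json.dumps(task, indent=2))
print(retry_delays(task["Retry"][0]))
```

The second helper just shows why BackoffRate matters: with a rate above 1 the waits grow on each attempt, whereas a rate of 1 retries at the same short interval every time.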
Hope this helps someone.
The quota you are hitting is not Glue's concurrent job run quota but the rate limit on the Start Job Run API. You are requesting too many job runs per second. If possible, just wait between Start Job Run calls.
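If you start the runs from your own code rather than from a state machine, the same idea can be sketched as a small backoff wrapper. This is only an illustration under assumptions: `ThrottledError`, `call_with_backoff`, and `make_flaky_start` are hypothetical names, and the fake start function stands in for a real call such as boto3's Glue client `start_job_run`, where you would catch the client's throttling error instead.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error raised by the real client."""


def call_with_backoff(start_fn, max_attempts=5, base_delay=1.0,
                      backoff_rate=2.0, sleep=time.sleep):
    """Call start_fn, waiting longer after each throttled attempt."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return start_fn()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay *= backoff_rate


def make_flaky_start(failures):
    """Return a fake start function that is throttled `failures` times first."""
    state = {"left": failures}

    def start():
        if state["left"] > 0:
            state["left"] -= 1
            raise ThrottledError("Rate exceeded")
        return "run-id-placeholder"

    return start


# Record the waits instead of actually sleeping, so the demo runs instantly.
waits = []
result = call_with_backoff(make_flaky_start(2), sleep=waits.append)
print(result, waits)
```

When many callers back off on the same schedule they tend to retry at the same moment, so adding some random jitter to each delay is usually worth considering.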