
I am using boto3 to work with Redshift tables, but I am running into a limitation with the boto3 ExecuteStatement operation. AWS appears to impose an API limit, so I can't issue large updates. I get the following exception:

Failed to query redshift for list of processed buckets, An error occurred (ValidationException) when calling the ExecuteStatement operation: Cannot process query string larger than 100kB

What is the best practice for making large updates? Am I using the right API call for the task, or do I need to break the updates into batches? Does anyone have an example?
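For reference, here is a minimal sketch of the kind of call that hits this limit (the client setup, table, and column names are hypothetical): a single ExecuteStatement whose SQL string embeds all of the update data inline.

    import boto3

    # Redshift Data API client (the service behind ExecuteStatement)
    client = boto3.client("redshift-data")

    # Hypothetical: many thousands of values inlined into one statement
    names = ", ".join("'bucket-%06d'" % i for i in range(50000))
    sql = ("UPDATE processed_buckets SET status = 'processed' "
           "WHERE bucket_name IN (%s);" % names)

    # The SQL string easily exceeds 100 kB, so the service rejects it with
    # ValidationException: Cannot process query string larger than 100kB
    client.execute_statement(
        ClusterIdentifier="my-cluster",  # hypothetical cluster name
        Database="dev",                  # hypothetical database
        DbUser="awsuser",                # hypothetical database user
        Sql=sql,
    )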

2 Answers


  1. You should not be performing large updates on Redshift – it is the wrong type of relational database for such use cases.

    I’ve spent so long explaining the same set of basics to so many people that I’ve written up an introduction to Redshift, here:

    https://www.redshiftresearchproject.org/white_papers/downloads/introduction_to_the_fundamentals_of_amazon_redshift.html

  2. Having a SQL statement over 100 kB implies that you are passing the data for the update in the SQL statement itself. This is bad practice in Redshift, because the data is processed through the leader node, compiled into executable code for the compute nodes, and only then sent to the compute nodes. This overloads the leader node and can slow the cluster down considerably.

    The data should instead be loaded directly to the compute nodes from S3 using the COPY command. This requires that your code write one or more S3 objects containing the update data and then issue Redshift Data API calls that use that data to update the table (see the sketch below).
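    A minimal sketch of that pattern, assuming a hypothetical staging bucket, IAM role, target table, and cluster configuration: write the update data to S3, COPY it into a temporary staging table, then apply a single UPDATE ... FROM the staging table, all through the Redshift Data API.

        import csv
        import io

        import boto3

        s3 = boto3.client("s3")
        rsd = boto3.client("redshift-data")

        # 1. Write the update data to an S3 object as CSV (hypothetical bucket/key).
        rows = [("bucket-000001", "processed"), ("bucket-000002", "processed")]
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        s3.put_object(Bucket="my-staging-bucket",
                      Key="updates/batch1.csv",
                      Body=buf.getvalue().encode("utf-8"))

        # 2. COPY from S3 into a temp staging table, then update the target table.
        #    Each SQL string stays far below the 100 kB limit because the data
        #    itself travels through S3, not through the query text.
        #    (The IAM role ARN below is a placeholder.)
        statements = [
            """CREATE TEMP TABLE stage_processed
               (bucket_name varchar(256), status varchar(32));""",
            """COPY stage_processed
               FROM 's3://my-staging-bucket/updates/batch1.csv'
               IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
               FORMAT AS CSV;""",
            """UPDATE processed_buckets
               SET status = stage_processed.status
               FROM stage_processed
               WHERE processed_buckets.bucket_name = stage_processed.bucket_name;""",
        ]

        # BatchExecuteStatement runs the statements in order as one transaction,
        # so the temp table is visible to the COPY and UPDATE that follow it.
        rsd.batch_execute_statement(
            ClusterIdentifier="my-cluster",  # hypothetical cluster name
            Database="dev",                  # hypothetical database
            DbUser="awsuser",                # hypothetical database user
            Sqls=statements,
        )

    Keeping all three statements in one batch also means a failure in any step rolls back the whole update.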
