I’m running a simple COPY command that used to work:
echo " COPY table_name
FROM 's3://bucket/<date>/'
iam_role 'arn:aws:iam::123:role/copy-iam'
format as json 's3://bucket/jupath.json'
gzip ACCEPTINVCHARS ' ' TRUNCATECOLUMNS TRIMBLANKS MAXERROR 3;
" | psql
And now I get:
INFO: Load into table 'table_name' completed, 53465077 record(s) loaded successfully.
ERROR: deadlock detected
DETAIL: Process 26999 waits for AccessExclusiveLock on relation 3176337 of database 108036; blocked by process 26835.
Process 26835 waits for ShareLock on transaction 24230722; blocked by process 26999.
The only change is moving from the dc2 instance type to ra3. I should add that this is the only command that touches this table, and only one process runs at a time.
2 Answers
Clearly a bug.
The table had been cloned from one Redshift cluster to another by running
SHOW TABLE table_name
which provided the table's DDL. After removing the "noise" from that DDL and recreating the table, the command completed as usual without errors.
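For illustration only: SHOW TABLE emits the full CREATE TABLE statement, including column encodings and the distribution and sort attributes, so the output looks something like this (the column names here are invented):

-- hypothetical output of SHOW TABLE table_name
CREATE TABLE public.table_name (
    event_id BIGINT ENCODE az64,
    payload VARCHAR(65535) ENCODE lzo
)
DISTSTYLE AUTO
SORTKEY (event_id);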
The key detail here is in the DETAIL line of the error message.
Relation 3176337, I assume, is the table in question – the target of the COPY. This should be confirmed by running something like:
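A minimal sketch, using the standard Postgres catalog tables that Redshift exposes (substitute the OID from your own error message):

SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.oid = 3176337;  -- the relation number from the DETAIL line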
I don’t expect any surprises here, but it is good to check. If it turns out to be a different table (object), then that is important to know.
Now for the meat. You have 2 processes listed in the error message – PID 26999 and PID 26835. A process is a unique connection to the database, i.e. a session. So these identify the 2 connections to the database that have deadlocked with each other. A good next step is to see what each of these sessions (processes, or PIDs) was doing. Like this:
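A sketch of one way to do this, using SVL_STATEMENTTEXT, a Redshift system view that records the text of every statement a session ran, including BEGIN and COMMIT (long statements are split across rows, ordered by the sequence column):

SELECT pid, xid, starttime, sequence, type, TRIM(text) AS statement
FROM svl_statementtext
WHERE pid IN (26999, 26835)  -- the two PIDs from the error message
--AND starttime > DATEADD(day, -2, GETDATE())
ORDER BY pid, starttime, sequence;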
The thing you might run into is that these logging tables "recycle" every few days, so the data from this exact failure might be lost.
The next part of the error is about the open transaction that is preventing 26835 from moving forward. This transaction (identified by an XID, or transaction ID) belongs to process 26999 and is blocking 26835 from progressing, but 26999 needs 26835 to complete some action before it can move – a deadlock. So seeing what is in this transaction will be helpful as well:
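Along the same lines, a sketch that pulls everything attached to that transaction (substitute your own XID):

SELECT xid, pid, starttime, sequence, type, TRIM(text) AS statement
FROM svl_statementtext
WHERE xid = 24230722  -- the XID from the error message
--AND starttime > DATEADD(day, -2, GETDATE())
ORDER BY starttime, sequence;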
Again, the data may have been lost due to time. I commented out the date-range WHERE clause of the last 2 queries to allow for looking back further in these tables. You should also be aware that PID and XID numbers are reused, so check the timestamps on the results to be sure that info from different sessions isn't being combined. You may need a new WHERE clause to focus in on just the event you care about.
Now you should have all the info you need to see why this deadlock is happening. Use the timestamps of the statements to see the order in which they are issued by each session (process). Remember that every transaction ends with a COMMIT (or ROLLBACK), and this changes the XID of the following statements in the session. A simple fix might be issuing a COMMIT in the "26999" process flow to close that transaction and let the other process advance. However, you need to understand whether such a commit will cause other issues.
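As a hypothetical sketch of that fix: if the "26999" session opens a transaction and leaves it hanging, closing it explicitly releases the locks before the other session needs them.

BEGIN;
-- ... whatever work session 26999 was doing against the table ...
COMMIT;  -- closes the open transaction, releases its locks,
         -- and gives the session's following statements a new XID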
If you can find all this info and need any help interpreting it, reach out.