Issue Summary:
Experiencing problems when transferring data from Oracle Cloud Infrastructure (OCI) to Databricks Delta Lake tables using Qlik Replicate.
Source: Oracle Cloud Infrastructure (OCI)
Destination: Databricks Delta Lake tables
Problem Statement:
Data duplication
Missing data
Details:
Logs from Qlik Replicate point to issues in Databricks.
Logs from Databricks, checked via "Query History," also show issues.
Unable to diagnose the exact errors causing data duplication and missing data.
Some network-related errors have been identified.
Uncertain about the next steps to mitigate the problem.
Sample errors on the Databricks side:
Error 1:
Query could not be scheduled: HTTP Response code: 500
Error 2:
[DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1722250928325,"userId":"5954939804701401","userName":"[email protected]","operation":"MERGE","operationParameters":{"predicate":["((ITEM#22092922 = seg1_repcol#22092878) AND (LOC#22092925 = seg2_repcol#22092879))"],"matchedPredicates":[{"actionType":"update"}],"statsOnLoad":false,"notMatchedBySourcePredicates":[],"notMatchedPredicates":[]},"readVersion":2917,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numTargetRowsCopied":"0","numTargetRowsDeleted":"0","numTargetFilesAdded":"1","numTargetBytesAdded":"1282377","numTargetBytesRemoved":"1005294","numTargetDeletionVectorsAdded":"158","numTargetRowsMatchedUpdated":"28916","executionTimeMs":"209330","numTargetRowsInserted":"0","numTargetRowsMatchedDeleted":"0","numTargetDeletionVectorsUpdated":"158","scanTimeMs":"83197","numTargetRowsUpdated":"28916","numOutputRows":"28916","numTargetDeletionVectorsRemoved":"158","numTargetRowsNotMatchedBySourceUpdated":"0","numTargetChangeFilesAdded":"0","numSourceRows":"29161","numTargetFilesRemoved":"1","numTargetRowsNotMatchedBySourceDeleted":"0","rewriteTimeMs":"84773"},"tags":{"noRowsCopied":"true","delta.rowTracking.preserved":"false","restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/15.2.x-photon-scala2.12","txnId":"eb912069-75fd-4f33-9829-a9cb191b4b7e"}
Sample errors on the Qlik side pointing to Databricks:
Error 1:
02728323: 2024-07-11T00:50:57 [TARGET_LOAD ]E: RetCode: SQL_ERROR SqlState: 08S01 NativeError: 124 Message: [Simba][Hardy] (124) A 503 response was returned but no Retry-After header was provided. Original error: Unknown [1022502] (ar_odbc_stmt.c:5090)
Error 2:
02602439: 2024-07-28T18:12:49 [TARGET_APPLY ]T: RetCode: SQL_ERROR SqlState: 08S01 NativeError: 115 Message: [Simba][Hardy] (115) Connection failed with error: SSL_read: Connection reset by peer [1022502] (ar_odbc_stmt.c:4737)
02602439: 2024-07-28T18:12:49 [TARGET_APPLY ]T: Network error encountered (ar_odbc_util.c:1242)
Error 3:
04171826: 2024-07-29T14:43:45 [TARGET_APPLY ]T: Failed (retcode -1) to execute statement
2 Answers
You can refer to the Databricks documentation on that exception.
Since you haven't partitioned the table, the error is caused by files being added to it by concurrent INSERT, DELETE, UPDATE, or MERGE operations on an unpartitioned table. Possible fixes are to partition the table so that concurrent operations touch disjoint partitions, or to use the WriteSerializable isolation level, which does not conflict with blind append operations. Try applying these changes to your table.
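For reference, both of those changes can be made with plain SQL against the Delta table. This is only a sketch: the table name `target_table` is a placeholder, and `LOC` is assumed as a partition column because it appears in the MERGE predicate of the conflicting commit above.

```sql
-- Set the isolation level explicitly so blind appends do not conflict
-- (WriteSerializable is the Delta default, but stating it makes the intent clear).
ALTER TABLE target_table
  SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable');

-- Alternatively, recreate the table partitioned so that concurrent
-- MERGE/INSERT operations land in disjoint partitions. Partition columns
-- cannot be added to an existing Delta table in place, hence the CTAS.
CREATE TABLE target_table_partitioned
  USING DELTA
  PARTITIONED BY (LOC)
  AS SELECT * FROM target_table;
```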
Have you managed to resolve the issue? We are experiencing the same with Qlik Replicate. I'd be happy to help you try to get more info on this issue.
Are there any other processes in Databricks editing the target tables? Is autoscaling enabled on the cluster? Autoscaling definitely causes the 503 response error, so make sure you disable it.
Also disable autoCompact on the cluster; it is known to cause the ConcurrentAppendException conflict as well.
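If you want to tick that off explicitly, auto compaction can be turned off at either the table or the cluster level. A sketch, with `target_table` as a placeholder name:

```sql
-- Cluster-level alternative: add this line to the cluster's Spark config
--   spark.databricks.delta.autoCompact.enabled false

-- Table-level: disable auto compaction for the replication target only
ALTER TABLE target_table
  SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'false');
```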
Have you noticed this occurring on the same tables over and over or is it random tables each time?
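One stopgap while you investigate: all of the errors above (the 500/503 responses, the SSL connection reset, and the ConcurrentAppendException, which itself says "Please try the operation again") are transient and safe to retry. A minimal retry wrapper with exponential backoff, sketched in Python; the function and marker names are illustrative, not part of Qlik or Databricks:

```python
import time

# Substrings of the transient errors seen in the logs above. In a real apply
# job you would match on the ODBC SQLSTATE (08S01) or the exception type.
TRANSIENT_MARKERS = ("503", "Connection reset by peer", "ConcurrentAppendException")

def apply_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            transient = any(marker in str(exc) for marker in TRANSIENT_MARKERS)
            if not transient or attempt == max_attempts:
                raise  # non-transient, or out of retries: surface the error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Wrapping the MERGE (or whatever statement the apply step issues) in such a loop at least prevents one transient network blip from aborting the whole task.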
I hope this helps. Please let me know whether you have resolved it, because we have looked through all of these options and ticked them off.
Cheers,
Joe