I have the following JSON structure and am parsing it into 3 tables (appointment, dock, warehouse). When all the JSON data is valid, all records can be parsed at once very quickly. When there is a data issue (with even just 1 record), none of the records get parsed, because the process runs inside a transaction and the transaction gets rolled back. I'm using OPENJSON to insert into each of the tables.
Here's the statement for the appointment table. The other 2 are very similar, just using an INSERT instead.
UPDATE a
SET a.start = ss.start
FROM dbo.Appointment a
CROSS APPLY
    OPENJSON(a.payload, '$')
    WITH (id UNIQUEIDENTIFIER,
          start DATETIME) ss
INSERT dbo.Appointment_Dock
(
...
)
SELECT ...
FROM dbo.Appointment a
CROSS APPLY
OPENJSON(a.payload, '$.dock')
WITH (id UNIQUEIDENTIFIER,
...) ss
I can use another process to handle 1 record at a time, keep the valid records, and flag the bad ones. The issue is that processing 1 record at a time is much slower.
The current error being raised is related to converting incoming data to the appropriate type. I have manually set the bad data value for testing purposes.
Conversion failed when converting the nvarchar value 'error' to data type datetime.
{
  "data": [
    {
      "id": "4940b7b9-7b30-42e7-8a0f-63f4095c75f4",
      "start": "2023-11-17T20:00:00.000Z",
      "dock": {
        "id": "ca009877-6559-4fb2-afa7-d8c34e9cd819",
        "name": "Door# 1",
        "warehouseId": "30e9ad49-4faf-4d40-805c-9dc75e67d873",
        "warehouse": {
          "id": "30e9ad49-4faf-4d40-805c-9dc75e67d873",
          "name": "Warehouse A"
        }
      }
    },
    {
      "id": "4940b7b9-7b30-42e7-8a0f-63f4095c75f4",
      "start": "2023-11-18T20:00:00.000Z",
      "dock": {
        "id": "ca009877-6559-4fb2-afa7-d8c34e9cd819",
        "name": "Door# 1",
        "warehouseId": "30e9ad49-4faf-4d40-805c-9dc75e67d873",
        "warehouse": {
          "id": "30e9ad49-4faf-4d40-805c-9dc75e67d873",
          "name": "Warehouse A"
        }
      }
    }
  ],
  "entity": "Appointment",
  "action": "read"
}
Is there a way in SQL to process the JSON records in a batch and handle any errors? Handling errors would mean the valid records get their values stored, and the problem records would just get a field set with the error message.
I could write all the logic in C#, but I was thinking it would be better/quicker not to transfer all the data over the network just to parse it into other tables.
Would this be better served with CLR?
Other approaches?
2 Answers
The typical strategy here is pre-processing the data to clean it so it can be reliably passed to the database in bulk. This might be a C# application or PowerShell script, but it might also be something like SQL Server Integration Services.
Since you're already ingesting JSON data for pre-processing, it's also common for that step to output the data in a format that is more natural for SQL Server, especially one suited to the SqlBulkCopy or BULK INSERT features. Using these will typically more than make up for any performance lost to the extra data-processing step.
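If the cleaned output is written to a flat file, for example, the load itself can be a plain BULK INSERT into a staging table. This is only a sketch; the staging table, file path, and format options are all hypothetical:

-- Hypothetical staging table matching the cleaned appointment rows.
CREATE TABLE dbo.Appointment_Staging
(
    id    UNIQUEIDENTIFIER NOT NULL,
    start DATETIME         NOT NULL
);

-- Load the pre-cleaned file in one bulk operation; path and options are illustrative.
BULK INSERT dbo.Appointment_Staging
FROM 'C:\import\appointments_clean.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);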
If you must do it in SQL, and the JSON has a standard structure, I'd suggest first getting it all into a temporary table as nvarchars, then extracting the data from there using TRY_CAST and/or TRY_CONVERT to get the data you need. The advantage of this approach is that it is much easier to deal with data in a table (even if it is temporary) than in JSON.
For example, assuming your data is in the Appointment table with Appointment_ID and payload columns, you could use something like the following to get your initial data set.
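A minimal sketch of that step, assuming (as in your statements above) that each payload holds a single appointment object; the temp table and column names are only placeholders:

DROP TABLE IF EXISTS #AppointmentData;

-- Shred each payload into one row, keeping every value as nvarchar so that a bad
-- value cannot abort the whole batch.
SELECT a.Appointment_ID,
       j.appointment_id,
       j.start,
       j.dock_id,
       j.dock_name,
       j.warehouse_id,
       j.warehouse_name
INTO   #AppointmentData
FROM   dbo.Appointment a
CROSS APPLY OPENJSON(a.payload, '$')
     WITH (appointment_id NVARCHAR(50)  '$.id',
           start          NVARCHAR(50)  '$.start',
           dock_id        NVARCHAR(50)  '$.dock.id',
           dock_name      NVARCHAR(200) '$.dock.name',
           warehouse_id   NVARCHAR(50)  '$.dock.warehouseId',
           warehouse_name NVARCHAR(200) '$.dock.warehouse.name') j;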
Then use this temporary table to do the three sets of updates.
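For the appointment update, for instance, something like this sketch keeps the valid rows moving; the join column is an assumption, and DATETIMEOFFSET is used here because of the trailing Z in the sample dates (a value like 'error' simply becomes NULL):

-- Only touch rows whose start value converts cleanly; bad values are skipped here
-- and can be flagged in a separate pass.
UPDATE a
SET    a.start = CONVERT(DATETIME, TRY_CONVERT(DATETIMEOFFSET, t.start))
FROM   dbo.Appointment a
JOIN   #AppointmentData t
  ON   t.Appointment_ID = a.Appointment_ID
WHERE  TRY_CONVERT(DATETIMEOFFSET, t.start) IS NOT NULL;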
In production, I'd usually first create the temp table with two columns for each value: an 'extract' column and the final, typed one. I'd also have a column called 'Error_flag' that you set to 1 any time you find an error. That way, you can save the errors in a log for later review.
Below is an example table structure for the temp table.
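Something along these lines, with the names and sizes purely illustrative (the Error_message column is an extra for capturing the reason, per your requirement to store the error on the problem record):

CREATE TABLE #AppointmentData
(
    Appointment_ID       INT              NOT NULL,
    -- raw values exactly as extracted from the JSON
    start_extract        NVARCHAR(50)     NULL,
    dock_id_extract      NVARCHAR(50)     NULL,
    warehouse_id_extract NVARCHAR(50)     NULL,
    -- typed values populated via TRY_CAST / TRY_CONVERT
    start                DATETIME         NULL,
    dock_id              UNIQUEIDENTIFIER NULL,
    warehouse_id         UNIQUEIDENTIFIER NULL,
    -- set to 1 whenever a conversion fails, with the reason kept for review
    Error_flag           BIT              NOT NULL DEFAULT (0),
    Error_message        NVARCHAR(400)    NULL
);

The extract columns are filled straight from OPENJSON, the typed columns via TRY_CONVERT, and Error_flag/Error_message are set wherever a conversion comes back NULL even though the extract value is present; the flagged rows then go to your error log while the clean rows continue into the three target tables.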