I have a deeply nested json array. The array contains a list of json objects, I want to remove any of the objects that are matched based on a field within the object.
{
"name": "John smith",
"items": {
"unknown-key-1": {
"file-array": [
{
"file-id": "file-1"
},
{
"file-id": "file-2"
},
{
"file-id": "file-3"
}
]
},
"unknown-key-2": {
"file-array": [
{
"file-id": "file-1"
},
{
"file-id": "file-2"
}
]
}
}
}
So for example, say I want to remove all objects where the file-id
is "file-1". My UPDATE statement would remove all jsonb objects matching that file-id
. After the UPDATE my jsonb column would look like this:
{
"name": "John smith",
"items": {
"unknown-key-1": {
"file-array": [
{
"file-id": "file-2"
},
{
"file-id": "file-3"
}
]
},
"unknown-key-2": {
"file-array": [
{
"file-id": "file-2"
}
]
}
}
}
I can achieve this when the array is at the top level of the json, or, when the array is nested inside an object and the key is known. But in this case the keys are dynamically generated (i.e "unknown-key-1", "unknown-key-2")
I understand that under proper circumstances I would normalise the data as this is an anti-pattern, however I have no choice. Also I want to do this using an UPDATE statement instead of a Postgres function.
2
Answers
Modifying the
jsonb
isn’t as tricky as applying that as an update:for each row it lists its
items
, and for each of those, it lists what’s underfile-array
. This produces the item names and file indices that can be filtered inwhere
and used to subtract the path using#-
. Indexes resulting fromwith ordinality
are 1-based, whilejsonb
arrays are 0-based, hence the-1
.The problem with using that directly as an update batch is that it results in single, separate, atomic changes based on each file to be removed from each value, and each row would apply only one of those small changes. In your example, only the file under
unknown-key-2
orunknown-key-1
would get thefile-1
removed, not both. See the notes underupdate
doc:You’d have to either keep running the same update until you no longer see rows being affected, or you need to squash the updates so that each row gets only one cumulative change to apply: demo at db<>fiddle
The first CTE finds what path needs to be removed from which row, second one keeps applying those changes iteratively, one on top of another and the third one with
distinct on
just passes the final value from the latest round with all changes applied, to the outerupdate
.WITH
needs to beRECURSIVE
for the second one to be able to self-reference, but only that one does it. More tests@Zegarek’s answer is useful when you know exactly what specific key names you need to look for in your entire JSON object, and also when you know the general structure and location of the target objects. However, in general cases, you will need to be able to handle any key names and any object structure and composition. The query below first finds the paths to every item in the JSON, and then uses those paths to search for the objects targeted for removal and then performs the removal in a recursive
cte
See fiddle