
I have a deeply nested JSON structure stored in a jsonb column. It contains arrays of JSON objects, and I want to remove any object that matches on a field within the object.

{
  "name": "John smith",
  "items": {
    "unknown-key-1": {
      "file-array": [
        {
          "file-id": "file-1"
        },
        {
          "file-id": "file-2"
        },
        {
          "file-id": "file-3"
        }
      ]
    },
    "unknown-key-2": {
      "file-array": [
        {
          "file-id": "file-1"
        },
        {
          "file-id": "file-2"
        }
      ]
    }
  }
}

So, for example, say I want to remove all objects whose file-id is "file-1". The UPDATE statement should remove every object matching that file-id, so that afterwards my jsonb column looks like this:

{
  "name": "John smith",
  "items": {
    "unknown-key-1": {
      "file-array": [
        {
          "file-id": "file-2"
        },
        {
          "file-id": "file-3"
        }
      ]
    },
    "unknown-key-2": {
      "file-array": [
        {
          "file-id": "file-2"
        }
      ]
    }
  }
}

I can achieve this when the array is at the top level of the JSON, or when the array is nested inside an object whose key is known. In this case, however, the keys are dynamically generated (e.g. "unknown-key-1", "unknown-key-2"), so I can't hard-code them into a path.

I understand that ideally I would normalise the data, since storing it like this is an anti-pattern, but I have no choice. Also, I want to do this with an UPDATE statement rather than a Postgres function.
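For reference, here is roughly what I use for the known-key case (a sketch assuming a table test with the jsonb column jdata; known-key stands in for a fixed key):

```sql
-- Works only when the key under "items" is fixed: rebuild that one array,
-- keeping elements whose file-id is not 'file-1'.
-- coalesce() covers the case where every element is removed.
update test
set jdata = jsonb_set(
    jdata,
    '{items,known-key,file-array}',
    coalesce(
        (select jsonb_agg(elem)
         from jsonb_array_elements(jdata #> '{items,known-key,file-array}') elem
         where elem ->> 'file-id' <> 'file-1'),
        '[]'::jsonb));
```

This falls apart as soon as the keys under items are dynamic.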

2 Answers


  1. Modifying the jsonb is the easy part; applying that change as an UPDATE is what's tricky:

    select jdata #- array['items', item, 'file-array', file_index - 1]::text[]
    from test,
         jsonb_each(jdata -> 'items') a(item, contents),
         jsonb_array_elements(contents -> 'file-array') with ordinality b(file_obj, file_index)
    where file_obj ->> 'file-id' = 'file-1';
    

    For each row, it lists the row's items, and for each of those, it lists what's under file-array. This produces the item names and file indices that can be filtered in the WHERE clause and used to subtract the path with #-. Indexes produced by WITH ORDINALITY are 1-based, while jsonb arrays are 0-based, hence the -1.
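    Both building blocks are easy to try in isolation:

```sql
-- #- removes the element at a text[] path; jsonb array indexes are 0-based
select '{"a": [{"x": 1}, {"x": 2}]}'::jsonb #- '{a,0}'::text[];
-- → {"a": [{"x": 2}]}

-- with ordinality numbers elements from 1, hence the -1 in the query above
select file_obj, file_index
from jsonb_array_elements('[{"x": 1}, {"x": 2}]'::jsonb)
     with ordinality b(file_obj, file_index);
-- → rows ({"x": 1}, 1) and ({"x": 2}, 2)
```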

    The problem with using that directly in an UPDATE is that it produces a separate, single-path change for each file to be removed from each value, and each row would apply only one of those small changes. In your example, file-1 would be removed from only one of unknown-key-1 or unknown-key-2, not both. See the notes in the UPDATE documentation:

    When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn’t join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.

    You’d have to either keep running the same UPDATE until no more rows are affected, or squash the updates so that each row gets only one cumulative change to apply: demo at db<>fiddle

    with recursive required_changes as (
        select id,
               jdata,
               item,
               file_index - 1 as file_index,
               row_number() over (partition by id) as update_round
        from test,
             jsonb_each(jdata -> 'items') a(item, contents),
             jsonb_array_elements(contents -> 'file-array')
               with ordinality b(file_obj, file_index)
        where file_obj ->> 'file-id' = 'file-1')
    , iteratively_merged_changes as (
        select id,
               jdata #- array['items', item, 'file-array', file_index]::text[] as jdata,
               2 as next_round
        from required_changes
        where update_round = 1
        union
        select a.id,
               a.jdata #- array['items', item, 'file-array', file_index]::text[],
               a.next_round + 1
        from iteratively_merged_changes a
        join required_changes b
          on a.id = b.id
         and b.update_round = a.next_round)
    , final_batch as (
        select distinct on (id) id, jdata
        from iteratively_merged_changes
        order by id, next_round desc)
    update test t
    set jdata = f.jdata
    from final_batch f
    where t.id = f.id
    returning t.*;
    

    The first CTE finds which path needs to be removed from which row; the second keeps applying those changes iteratively, one on top of another; and the third uses DISTINCT ON to pass each row's final value from the latest round, with all changes applied, to the outer UPDATE.

    WITH needs to be RECURSIVE for the second CTE to be able to reference itself, even though only that one does so. More tests
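    For completeness, the "keep re-running the same update" option can be sketched like this. Each execution removes at most one matching file per row, but each pass recomputes the indices from the current value, so repeating it until 0 rows are affected converges:

```sql
update test t
set jdata = t.jdata #- array['items', s.item, 'file-array', s.file_index - 1]::text[]
from (select id, item, file_index
      from test,
           jsonb_each(jdata -> 'items') a(item, contents),
           jsonb_array_elements(contents -> 'file-array')
             with ordinality b(file_obj, file_index)
      where file_obj ->> 'file-id' = 'file-1') s
where t.id = s.id;
```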

  2. @Zegarek’s answer works when you know exactly which key names to look for and the general structure and location of the target objects. In the general case, however, you need to handle arbitrary key names, structure, and composition. The query below first finds the paths to every item in the JSON, then uses those paths to locate the objects targeted for removal, and finally performs the removal in a recursive CTE:

    with recursive cte(id, obj, arr, path) as (
       -- get all paths to every item in the object, regardless of key name and structure
       select t.id, case when jsonb_typeof(t.js) = 'object' then t.js else '{"null":null}'::jsonb end, 
          case when jsonb_typeof(t.js) = 'array' then t.js else '[null]'::jsonb end, ''
       from tbl t
       union all
       select c.id, case when jsonb_typeof(p_obj.value) = 'object' then p_obj.value 
          when jsonb_typeof(p_arr.value) = 'object' then p_arr.value else 
             (case when jsonb_typeof(p_obj.value) in ('array', 'object') or  
                 jsonb_typeof(p_arr.value) in ('array', 'object') then '{"null":null}'::jsonb else '{}'::jsonb end) end,
          case when jsonb_typeof(p_obj.value) = 'array' then p_obj.value 
            when jsonb_typeof(p_arr.value) = 'array' then p_arr.value else 
             (case when jsonb_typeof(p_obj.value) in ('array', 'object') or  
                 jsonb_typeof(p_arr.value) in ('array', 'object') then '[null]'::jsonb else '[]'::jsonb end) end,
          c.path || case when c.path != '' then ',' else '' end || case when p_arr.value#>>'{}' is not null then cast(p_arr.r as text) else p_obj.key end
       from cte c cross join jsonb_each(c.obj) p_obj cross join lateral (
          select row_number() over (order by 1) - 1 r, k.value 
          from jsonb_array_elements(c.arr) k) p_arr 
    ),
    paths as (
       -- find the paths generated in the cte above that contain a key 'file-id' with a corresponding value of 'file-1' 
       select c.id, ('{'||c.path||'}')::text[] path from cte c join tbl t on t.id = c.id 
       where ((t.js#>('{'||c.path||'}')::text[]) -> 'file-id')#>>'{}' = 'file-1' 
    ),
    load_update as (
       select row_number() over (partition by p.id) r, p.id, t.js, p.path from paths p join tbl t on p.id = t.id
    ),
    perform_update as (
       -- make the updates in-place
       select u.r, u.id, u.js #- u.path js from load_update u where u.r = 1
       union all
       select u.r, u.id, p.js #- u.path js 
       from perform_update p join load_update u on u.r = p.r + 1 and p.id = u.id
    )
    update tbl set js = p.js from perform_update p 
    where tbl.id = p.id and p.r = (select max(p1.r) from load_update p1 where p1.id = p.id)
    

    See fiddle
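    The jsonb_typeof branching that drives the traversal is straightforward to check on its own:

```sql
select jsonb_typeof('{"a": 1}'::jsonb) as obj,
       jsonb_typeof('[1, 2]'::jsonb)   as arr,
       jsonb_typeof('"x"'::jsonb)      as str;
-- → object | array | string
```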
