
I have a table in PostgreSQL with a column that holds a JSON object:

create table test (
    rec_id int generated by default as identity,
    usr_id int unique,
    format text[],
    syn_data jsonb,
    generation_date timestamptz);

insert into test values 
( 2,
  144,
  '{audio,image,text}',
  '{ "7458": { "syn_idx": [724966,125940,727242],
            "score": 5.0,
            "custom_score": 1.0},
     "1742": { "syn_idx": [119294,119321],
            "score": 5.0,
            "custom_score": 1.0},
     "38521": { "syn_idx": [654145],
            "score": 5.0,
            "custom_score": 1.0},
     "4154": { "syn_idx": [617595,348300],
            "score": 5.0,
            "custom_score": 1.0}}',
  '2023-07-19 06:29:51.584859');

The keys in the JSON are random and unordered.
I need to create a function that dynamically extracts the first N records from the syn_data column in a SQL query.
For each usr_id there are several rows with this kind of JSON data.

How can I do it for each row?

I tried the jsonb_each() function combined with a window function, but jsonb_each() returns the keys in arbitrary order, which is not what I expected.

For example, when I pass 2 as the parameter, I'd like to get the first 2 records from the syn_data column:

{
  "7458": {
    "syn_idx": [724966, 125940, 727242],
    "score": 5.0,
    "custom_score": 1.0
  },
  "1742": {
    "syn_idx": [119294, 119321],
    "score": 5.0,
    "custom_score": 1.0
  }
}

I use PostgreSQL 14.1.


Answers


  1. The JSON in the syn_data column is a dictionary. You're looking for the first few elements, which have the keys "1", "2", "3" and so on. You can generate those keys using generate_series, and then use the -> operator to retrieve them from the JSON column:

    select  syn_data->(num::text)
    from    test
    cross join
            generate_series(1, 2) ser(num)
    

    Example at DB Fiddle
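
    If you need N as a runtime parameter, this approach could be wrapped in a SQL function. A minimal sketch, assuming the keys really are the sequential strings "1", "2", … as above (the function name first_n_entries is hypothetical):

    -- Hypothetical helper: returns the first n values of the jsonb object,
    -- assuming its keys are "1", "2", "3", ...
    create function first_n_entries(data jsonb, n int)
    returns setof jsonb
    language sql stable
    as $$
        select data->(num::text)
        from generate_series(1, n) ser(num);
    $$;

    -- Usage:
    -- select rec_id, first_n_entries(syn_data, 2) from test;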

  2. If you want to cut out all syn_data entries outside a target range, you can use jsonb - text[] subtraction. Demo at db<>fiddle:

    select rec_id
          ,syn_data - translate(
                        jsonb_path_query_array( syn_data
                                               ,'$.keyvalue().key
                                                  ?(@.double()<$min ||
                                                    @.double()>$max)'
                                               ,jsonb_build_object('min',0,
                                                                   'max',2))::text
                        ,'[]'
                        ,'{}')::text[]
    from test;
    
    rec_id | ?column?
    -------+---------
         2 | {"1": {"score": 5.0, "syn_idx": [724966, 125940, 727242], "custom_score": 1.0}, "2": {"score": 5.0, "syn_idx": [119294, 119321], "custom_score": 1.0}}

    This translate()s the square brackets of the jsonb array returned by jsonb_path_query_array() into curly braces, which allows the result to be cast to text[], usable with the - operator. You can also extract the keys and then aggregate them into a text[], as shown in the demo.
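
    That keys-then-aggregate variant could look roughly like this sketch; the coalesce guards the case where no keys fall outside the range, since jsonb - NULL yields NULL:

    select rec_id
          ,syn_data - coalesce(
               (select array_agg(k)
                from jsonb_object_keys(syn_data) as _(k)
                where k::int not between 0 and 2)  -- keys to remove
              ,'{}'::text[])
    from test;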

    You could also remove all keys except the first N arbitrary entries (without "interpreting" their keys) using an OFFSET clause:

    select rec_id
          ,syn_data - remove_these
    from test, 
    lateral (select array_agg(remove_these::text) as remove_these
             from (select jsonb_object_keys(syn_data) as remove_these 
                   offset 2) as g1
            ) as g2;
    

    Or, the other way around, similar to what @Charlieface suggested: extract N (key, value) pairs using LIMIT and reconstruct a jsonb from them with jsonb_object_agg(k,v):

    select rec_id, jsonb_object_agg(k,v)
    from test, 
    lateral (select * from jsonb_each(syn_data) limit 2) as _(k,v)
    group by rec_id;
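
    The question asked for a function, so this approach could be wrapped up with N as a parameter. A sketch, with the hypothetical name syn_data_head:

    -- Hypothetical wrapper: returns the jsonb object reduced to n entries.
    -- Note: jsonb_each() yields pairs in an unspecified order, so without
    -- an ORDER BY the choice of entries is arbitrary.
    create function syn_data_head(data jsonb, n int)
    returns jsonb
    language sql stable
    as $$
        select jsonb_object_agg(k, v)
        from (select * from jsonb_each(data) limit n) as _(k, v);
    $$;

    -- Usage:
    -- select rec_id, syn_data_head(syn_data, 2) from test;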
    

    A regular WHERE clause can replace the JSON path filter expression, working in tandem with the LIMIT whenever you need at most 2 entries out of a matching range:

    select rec_id, jsonb_object_agg(k,v)
    from test, 
    lateral (select * from jsonb_each(syn_data) as _(k,v)
             where k::int between 0 and 2
             limit 2) as j
    group by rec_id;
    