
I have a table in PostgreSQL with a column that holds a JSON object:

create table test (
    rec_id int generated by default as identity,
    usr_id int unique,
    format text[],
    syn_data jsonb,
    generation_date timestamptz);

insert into test values 
( 2,
  144,
  '{audio,image,text}',
  '{ "7458": { "syn_idx": [724966,125940,727242],
            "score": 5.0,
            "custom_score": 1.0},
     "1742": { "syn_idx": [119294,119321],
            "score": 5.0,
            "custom_score": 1.0},
     "38521": { "syn_idx": [654145],
            "score": 5.0,
            "custom_score": 1.0},
     "4154": { "syn_idx": [617595,348300],
            "score": 5.0,
            "custom_score": 1.0}}',
  '2023-07-19 06:29:51.584859');

The keys in the JSON are random and unordered.
I need to create a function that dynamically extracts the first N records from the syn_data column in a SQL query.
For each usr_id there are several rows with this kind of JSON data.

How can I do it for each row?

I tried the jsonb_each() function combined with a window function, but jsonb_each() returns the keys in arbitrary order, which is not what I expected.

For example, when I pass 2 as the parameter, I'd like to get the first 2 records from the syn_data column:

{
  "7458": {
    "syn_idx": [724966, 125940, 727242],
    "score": 5.0,
    "custom_score": 1.0
  },
  "1742": {
    "syn_idx": [119294, 119321],
    "score": 5.0,
    "custom_score": 1.0
  }
}

I use PostgreSQL 14.1.


Answers


  1. The JSON in the syn_data column is a dictionary. You're looking for the first few elements, which have the keys "1", "2", "3" and so on. You can generate those keys using generate_series, and then use the -> operator to retrieve them from the JSON column:

    select  syn_data->(num::text)
    from    test
    cross join
            generate_series(1, 2) ser(num)
    

    Example at DB Fiddle
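
    If you need N as a runtime parameter, this approach could be wrapped in a SQL function. A minimal sketch, assuming the keys really are the sequential strings "1", "2", … as above (the function name first_n_entries is hypothetical):

    -- Hypothetical helper: returns the first n values of the jsonb object,
    -- assuming its keys are "1", "2", "3", ...
    create function first_n_entries(data jsonb, n int)
    returns setof jsonb
    language sql stable
    as $$
        select data->(num::text)
        from generate_series(1, n) ser(num);
    $$;

    -- Usage:
    -- select rec_id, first_n_entries(syn_data, 2) from test;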

  2. If you want to cut out all syn_data entries outside a target range, you can use jsonb - text[] subtraction. Demo at db<>fiddle:

    select rec_id
          ,syn_data - translate(
                        jsonb_path_query_array( syn_data
                                               ,'$.keyvalue().key
                                                  ?(@.double()<$min ||
                                                    @.double()>$max)'
                                               ,jsonb_build_object('min',0,
                                                                   'max',2))::text
                        ,'[]'
                        ,'{}')::text[]
    from test;
    
    rec_id | ?column?
    -------+---------
         2 | {"1": {"score": 5.0, "syn_idx": [724966, 125940, 727242], "custom_score": 1.0}, "2": {"score": 5.0, "syn_idx": [119294, 119321], "custom_score": 1.0}}

    This translate()s the square brackets of the jsonb array returned by jsonb_path_query_array() into curly braces, which allows the result to be cast to text[], usable with the - operator. You can also extract the keys and then aggregate them into a text[], as shown in the demo.
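
    That keys-then-aggregate variant could look roughly like this sketch; the coalesce guards the case where no keys fall outside the range, since jsonb - NULL yields NULL:

    select rec_id
          ,syn_data - coalesce(
               (select array_agg(k)
                from jsonb_object_keys(syn_data) as _(k)
                where k::int not between 0 and 2)  -- keys to remove
              ,'{}'::text[])
    from test;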

    You could also remove all keys except the first N arbitrary entries (without "interpreting" their keys) using an OFFSET clause:

    select rec_id
          ,syn_data - remove_these
    from test, 
    lateral (select array_agg(remove_these::text) as remove_these
             from (select jsonb_object_keys(syn_data) as remove_these 
                   offset 2) as g1
            ) as g2;
    

    Or, the other way around, similar to what @Charlieface suggested: extract N (key, value) pairs using LIMIT and reconstruct a jsonb from them with jsonb_object_agg(k,v):

    select rec_id, jsonb_object_agg(k,v)
    from test, 
    lateral (select * from jsonb_each(syn_data) limit 2) as _(k,v)
    group by rec_id;
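
    The question asked for a function, so this approach could be wrapped up with N as a parameter. A sketch, with the hypothetical name syn_data_head:

    -- Hypothetical wrapper: returns the jsonb object reduced to n entries.
    -- Note: jsonb_each() yields pairs in an unspecified order, so without
    -- an ORDER BY the choice of entries is arbitrary.
    create function syn_data_head(data jsonb, n int)
    returns jsonb
    language sql stable
    as $$
        select jsonb_object_agg(k, v)
        from (select * from jsonb_each(data) limit n) as _(k, v);
    $$;

    -- Usage:
    -- select rec_id, syn_data_head(syn_data, 2) from test;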
    

    A regular WHERE clause can replace the JSON path filter expression, working in tandem with the LIMIT whenever you need at most 2 entries out of a matching range:

    select rec_id, jsonb_object_agg(k,v)
    from test, 
    lateral (select * from jsonb_each(syn_data) as _(k,v)
             where k::int between 0 and 2
             limit 2) as j
    group by rec_id;
    