skip to Main Content

So I work with Postgres SQL, and I have a jsonb column with the following structure:

{
   "Store1":[
      {
         "price":5.99,
         "seller":"seller"
      },
      {
         "price":56.43,
         "seller":"seller"
      }
   ],
   "Store2":[
      {
         "price":45.65,
         "seller":"seller"
      },
      {
         "price":44.66,
         "seller":"seller"
      }
   ]
}

I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question:
For each product, is one of the prices in this JSON is bigger/equal/smaller than X?

Basically filter the product to include only the ones who have at least one price that satisfies a mathematical condition.
How can I do it efficiently? What’s the best way in Postgres to iterate a JSON like this, with a relatively complex inner structure?

Also, if I could control the way the data is structured (to an extent, I can), what changes can I do to make this query more efficient?

Thanks!

2

Answers


  1. Use a json path expression:

    WHERE col @@ '$.*[*].price < 20'
    

    or

    WHERE col @? '$.*[*] ? (@.price < 20)'
    

    If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically

    WHERE col @@ format('$.*[*].price < %s', $1)::jsonpath
    WHERE col @? format('$.*[*] ? (@.price < %s)', $1)::jsonpath
    

    or you can use the respective function and pass variables as an object:

    WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
    WHERE jsonb_path_exists(col, format('$.*[*] ? (@.price < $limit)', jsonb_build_object('limit', $1))
    

    I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:

    • if a comparison operator needs to work with multiple values, it generally functions as an ANY
    • @@ does not work with ? (@ …) filter expressions since they don’t return a boolean,
      @? does not work with predicates since they always return a value (even if it’s false)
    Login or Signup to reply.
  2. What changes can I do to make this query more efficient?

    As @jjanes commented on my other answer, the jsonpath match col @@ '$.*[*].price < $limit' isn’t going to be fast and needs to do full table scan, at least for < and >. To make a useful index, a different approach is required. An index can only have a single value to compare with, not any number. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit) to (SELECT MIN(prices_of(col))) < $limit.

    With this idea it is possible to build an expression index on the result of a custom immutable function:

    CREATE FUNCTION min_price(data jsonb) RETURNS float
      LANGUAGE SQL
      IMMUTABLE
      RETURNS NULL ON NULL INPUT
      RETURN (
        SELECT min((offer ->> 'price')::float)
        FROM jsonb_each(data) AS entries(name, store),
        LATERAL jsonb_array_elements(store) AS elements(offer)
      );
    
    CREATE INDEX example_min_data_price_idx ON example (min_price(data));
    

    which you can use as

    SELECT * FROM example WHERE min_price(data) < 20;
    

    Looking for rows with a price larger than a certain number requires a separate index on max_price(data). If you want to use the index in a JOIN with more conditions, consider making it a multi-column index.

    Looking for row with a price equalling a certain number can be optimised by indexing the jsonb column and using a jsonpath:

    CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
    SELECT * FROM example WHERE data @@ '$.*[*].price == 20';
    SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';
    

    Unfortunately you can’t use jsonb_path_ops here since that doesn’t support the wildcard.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search