skip to Main Content

I have a Postgres table that looks like this:

CREATE TABLE products(
   id string,
   attributes jsonb,
   PRIMARY KEY(id)
);

An example of attributes field could be like:

 {
      "endDate": [
        "2024-02-20T21:00:00.000Z"
      ],
      "countries": [
        "US","IT","ES"
      ],
      "type": [
        "RETAIL"
      ],
      "startDate": [
        "2024-02-13T08:00:00.000Z"
      ],
      "categories": [
        "ELECTRONICS"
      ],
      "currency": [
        "USD"
      ],
      "status": [
        "ACTIVE"
      ]
    }

I need to create filter where as an input I might get a list of countries e.g. ["US","MX"] for which each product having one of these countries would match criteria and/or where startDate might be after certain provided date.
I’m also interested in performance of such query, since this table might be really large.

I’ve tried sto filter products by countries using this query:

SELECT *
FROM products
WHERE
  (attributes @> '{ "countries": ["US","MX","JP"] }')

But this query would list only products which have all 3 provided countries. I need to have at least one country match, and in this example, it has "US" so it should be match.

2

Answers


  1. It would be trivial on a normalised structure, but until you find time and budget to change it, you can use jsonb JSONPath expressions with @@ predicate check operator: demo

    select * from products
    where attributes 
    @@ '  exists($.countries[*] 
                 ?(  @=="US"
                   ||@=="MX" ))
        && exists($.startDate?
                  (@>"2024-02-12T07:00:00.000Z"))'
    ;
    
    id attributes
    1 {"type": ["RETAIL"], "status": ["ACTIVE"], "endDate": ["2024-02-20T21:00:00.000Z"], "currency": ["USD"], "countries": ["US", "IT", "ES"], "startDate": ["2024-02-13T08:00:00.000Z"], "categories": ["ELECTRONICS"]}

    If you pair it with a GIN index:

    create index on products using gin(attributes jsonb_path_ops);
    

    You’ll see @@ speed up significantly thanks to index scans:

    Bitmap Heap Scan on public.products (cost=116.51..2432.07 rows=13325 width=396) (actual time=4.505..56.491 rows=9513 loops=1)
      Output: id, attributes
      Recheck Cond: (products.attributes @@ ‘(exists ($."countries"[*]?(@ == "US" || @ == "MX")) && exists ($."startDate"?(@ > "2024-02-12T07:00:00.000Z")))’::jsonpath)
      Rows Removed by Index Recheck: 28518
      Heap Blocks: exact=2149
      -> Bitmap Index Scan on products_attributes_idx (cost=0.00..113.18 rows=13325 width=0) (actual time=4.238..4.239 rows=38031 loops=1)
            Index Cond: (products.attributes @@ ‘(exists ($."countries"[*]?(@ == "US" || @ == "MX")) && exists ($."startDate"?(@ > "2024-02-12T07:00:00.000Z")))’::jsonpath)
    Planning Time: 0.374 ms
    Execution Time: 56.930 ms

    That’s a test on 40k random sample resembling yours. It’s also visible that not the entire JSONPath is supported by the index, hence the re-check:

    For these operators [@@, @?], a GIN index extracts clauses of the form accessors_chain = constant out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include .key, [*], and [index] accessors. The jsonb_ops operator class also supports .* and .** accessors, but the jsonb_path_ops operator class does not.

    … which means date comparison with > won’t qualify, but your searches limited to specific data points should all fit the criteria.

    If you’re dealing with non-uniform timestamp formats, you might want to add a .datetime() method in there.

    Login or Signup to reply.
  2. You can use any of the following solutions:

    • Retrieve the JSON array and use array overlap comparison ?|
    SELECT *
    FROM products
    WHERE
      attributes -> 'countries' ?| array['US', 'MX', 'JP']
    
    • Use a JsonPath (you need to specify lots of conditions)
    SELECT *
    FROM products
    WHERE
      attributes @? '$.countries[*] ? (@ == "US" || @ == "MX" || @ == "JP")'
    
    • Use a subquery with an INTERSECT against the array.
    SELECT *
    FROM products
    WHERE EXISTS (
      SELECT value
      FROM jsonb_array_elements_text(attributes -> 'countries')
      INTERSECT
      SELECT *
      FROM UNNEST(array['US', 'MX', 'JP'])
    )
    

    db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search