skip to Main Content

Suppose the following:

create schema bv;
create table bv.user(id bigint primary key);
create table bv.user_photo (
  id bigint primary key,
  url varchar(255) not null,
  user_id bigint references bv.user(id)
);

insert into bv.user values (100), (101);
insert into bv.user_photo values
  (1, 'https://1.com', 100),
  (3, 'https://3.com', 100),
  (4, 'https://4.com', 101),
  (2, 'https://2.com', 100);

I’d like to query for and build an object for every user, and include only the latest image in the result.

Here’s what I have:

select
  json_build_object(
    'id', u.id,
    'latest_image', up.url
  ) user
from bv.user u
left join bv.user_photo up
  on u.id = up.user_id

However this returns,

[
  {"id" : 100, "url" : "https://2.com"},
  {"id" : 100, "url" : "https://3.com"},
  {"id" : 100, "url" : "https://1.com"},
  {"id" : 101, "url" : "https://4.com"}
]

However, the expected result is:

[
  {"id" : 100, "url" : "https://3.com"},
  {"id" : 101, "url" : "https://4.com"}
]

I’ve tried using distinct,

select distinct on(u.id)
  json_build_object(
    'id', u.id,
    'url', up.url
  ) user
from bv.user u
left join bv.user_photo up
  on u.id = up.user_id
order by u.id, up.id DESC

But my question is whether or not this is the correct approach? I feel like I shouldn’t be using distinct in such a situation.

2

Answers


  1. This is the correct approach and you by all means should be using distinct on in situations like this. Plain distinct – no, distinct on – yes:

    SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal.

    One thing I’m missing here is that you’re expecting a single array of those objects, but your query returns each object in a separate row. You need a json_agg() to collect them: demo

    select json_agg("user" order by id) as "user_array"
    from(select distinct on(u.id) u.id,
          json_build_object('id', u.id,
                            'url', up.url) as "user"
         from bv.user u 
         left join bv.user_photo up
                on u.id = up.user_id
         order by u.id, up.id DESC) subquery;
    
    user_array
    [ {"id" : 100, "url" : "https://3.com"}, {"id" : 101, "url" : "https://4.com"} ]
    Login or Signup to reply.
  2. With few photos per user, and while you return all (or most) users, DISTINCT ON is the best approach.
    But it’s typically faster to get distinct photos before you join:

    SELECT json_build_object('id', u.id, 'url', p.url) AS "user"
    FROM   users u
    LEFT   JOIN (
       SELECT DISTINCT ON (user_id)
              user_id, url
       FROM   user_photo
       ORDER  BY user_id, id DESC
       ) p ON p.user_id = u.id;
    

    For many photos per user, an emulated index-skip scan is (much) faster. See:

    To produce one summary array, you can skip json_build_object(). json_agg() can aggregate the row directly:

    SELECT json_agg(sub) AS "users"
    FROM  (
       SELECT u.id, p.url
       FROM   users u
       LEFT   JOIN (
          SELECT DISTINCT ON (user_id)
                 user_id, url
          FROM   user_photo
          ORDER  BY user_id, id DESC
          ) p ON p.user_id = u.id
       ) sub;
    

    fiddle

    Aside: Don’t use the reserved word "user" as identifier. It works while schema-qualified, but fails without.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search