skip to Main Content

Suppose the following:

create table my_schema.user(
  id serial primary key
);
create table my_schema.conversation(
  id serial primary key
);
create table my_schema.conversation_message(
  id serial primary key,
  conversation_id bigint references my_schema.conversation(id) not null,
  from_user_id bigint references my_schema.user(id) not null,
  to_user_id bigint references my_schema.user(id) not null,
  created_at timestamp not null
);
insert into my_schema.user values (1), (2), (3);
insert into my_schema.conversation values (50), (51);
insert into my_schema.conversation_message values
  (100, 50, 1, 3, current_timestamp(0)),
  (101, 51, 1, 2, current_timestamp(0)),
  (102, 51, 2, 1, current_timestamp(0)),
  (103, 51, 2, 1, current_timestamp(0)),
  (104, 51, 1, 2, current_timestamp(0));

I need the following result:

listing_conversation_id | latest_message
------------------------+------------+-----------------
                     50 | {"id":100,"created_at":"2023-04-13T19:23:06","to_user_id":3,"from_user_id":1,"conversation_id":50}
                     51 | {"id":104,"created_at":"2023-04-13T19:23:06","to_user_id":2,"from_user_id":1,"conversation_id":51}

I’ve been able to retrieve all conversation_messages linked to each conversation with the following query:

select
  c.id as listing_conversation_id,
  to_jsonb(array_agg(cm.*)) as messages
  -- to_json(
  --   select *
  --   from my_schema.conversation_message
  --   limit 1
  -- ) as latest_message
from my_schema.conversation c
join my_schema.conversation_message cm
  on cm.conversation_id = c.id
join my_schema.user u1
  on u1.id = cm.from_user_id
join my_schema.user u2
  on u2.id = cm.to_user_id
where u1.id = 1
or u2.id = 1
group by c.id;

But latest_message obviously can’t be an array.

How can I get this to work? I’m catching syntax errors whenever I try to nest queries, as I’ve tried with above in the commented out part of the query. Also, is this approach fine or should I consider a different model?

Fiddle

2

Answers


  1. Use DISTINCT ON to get the last message per conversation – the ordering is defined in the order by clause.

    Query (leaving the not relevant tables out)

    with latest_message as ( 
    select distinct on (conversation_id) * from  conversation_message order by conversation_id, created_at desc
    )
    select latest_message.conversation_id listing_conversation_id, to_jsonb(latest_message.*)
    from latest_message;
    
    listing_conversation_id|to_jsonb                                                                                                   |
    -----------------------+-----------------------------------------------------------------------------------------------------------+
                         50|{"id": 100, "created_at": "2023-04-13T21:15:08", "to_user_id": 3, "from_user_id": 1, "conversation_id": 50}|
                         51|{"id": 101, "created_at": "2023-04-13T21:15:08", "to_user_id": 2, "from_user_id": 1, "conversation_id": 51}|
    
    Login or Signup to reply.
  2. Assuming that conversation_message.id is being used as a proxy for recency, the following gives the specified result (this was for the post as originally titled, which specified the "latest message between two users"):

    SELECT
      t.*
    FROM (SELECT DISTINCT ON (least(u1.id, u2.id), greatest(u1.id, u2.id))
                 c.id AS listing_conversation_id,
                 to_jsonb(cm.*) AS latest_message
            FROM
              my_schema.conversation c
              JOIN my_schema.conversation_message cm ON cm.conversation_id = c.id
              JOIN my_schema.user u1 ON u1.id = cm.from_user_id
              JOIN my_schema.user u2 ON u2.id = cm.to_user_id
           WHERE
             u1.id = 1 OR u2.id = 1
           ORDER BY
             least(u1.id, u2.id),
             greatest(u1.id, u2.id),
             cm.id DESC) t
    ORDER BY
      t.listing_conversation_id;
    

    (As pointed out by Marmite Bomber, the joins to my_schema.user and conversation are unnecessary. They were left on the assumption that the OP had included them because other columns from those tables are to be included in the final query.)

    DISTINCT ON (least(u1.id, u2.id), greatest(u1.id, u2.id)) is necessary because the original version of the question specified pairs of users, not conversations, and there could be more than one conversation between the same two users. The outer query sorts by listing_conversation_id to insure the results are in the order matching the OP’s example.

    In general, generated ID columns should carry no meaning beyond identifying a specific row. Although it might be unlikely to occur over the lifetime of an application, it’s possible for sequences to cycle, such that more recently created rows could have IDs less than older rows. Using ID values as proxies for time also creates issues if there is ever a need to merge data from another source. For conversation_message, created_at would be a better choice for determining recency.

    Related to that last point, instead of using current_timestamp in the setup script, clock_timestamp() could be used so that each message would have a different value for created_at.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search