Suppose the following:
create table my_schema.user(
id serial primary key
);
create table my_schema.conversation(
id serial primary key
);
create table my_schema.conversation_message(
id serial primary key,
conversation_id bigint references my_schema.conversation(id) not null,
from_user_id bigint references my_schema.user(id) not null,
to_user_id bigint references my_schema.user(id) not null,
created_at timestamp not null
);
insert into my_schema.user values (1), (2), (3);
insert into my_schema.conversation values (50), (51);
insert into my_schema.conversation_message values
(100, 50, 1, 3, current_timestamp(0)),
(101, 51, 1, 2, current_timestamp(0)),
(102, 51, 2, 1, current_timestamp(0)),
(103, 51, 2, 1, current_timestamp(0)),
(104, 51, 1, 2, current_timestamp(0));
I need the following result:
listing_conversation_id | latest_message
------------------------+------------+-----------------
50 | {"id":100,"created_at":"2023-04-13T19:23:06","to_user_id":3,"from_user_id":1,"conversation_id":50}
51 | {"id":104,"created_at":"2023-04-13T19:23:06","to_user_id":2,"from_user_id":1,"conversation_id":51}
I’ve been able to retrieve all conversation_message
s linked to each conversation
with the following query:
select
c.id as listing_conversation_id,
to_jsonb(array_agg(cm.*)) as messages
-- to_json(
-- select *
-- from my_schema.conversation_message
-- limit 1
-- ) as latest_message
from my_schema.conversation c
join my_schema.conversation_message cm
on cm.conversation_id = c.id
join my_schema.user u1
on u1.id = cm.from_user_id
join my_schema.user u2
on u2.id = cm.to_user_id
where u1.id = 1
or u2.id = 1
group by c.id;
But latest_message
obviously can’t be an array.
How can I get this to work? I’m catching syntax errors whenever I try to nest queries, as I’ve tried with above in the commented out part of the query. Also, is this approach fine or should I consider a different model?
2
Answers
Use
DISTINCT ON
to get the last message per conversation – the ordering is defined in theorder by
clause.Query (leaving the not relevant tables out)
Assuming that
conversation_message.id
is being used as a proxy for recency, the following gives the specified result (this was for the post as originally titled, which specified the "latest message between two users"):(As pointed out by Marmite Bomber, the joins to
my_schema.user
andconversation
are unnecessary. They were left on the assumption that the OP had included them because other columns from those tables are to be included in the final query.)DISTINCT ON (least(u1.id, u2.id), greatest(u1.id, u2.id))
is necessary because the original version of the question specified pairs of users, not conversations, and there could be more than one conversation between the same two users. The outer query sorts bylisting_conversation_id
to insure the results are in the order matching the OP’s example.In general, generated ID columns should carry no meaning beyond identifying a specific row. Although it might be unlikely to occur over the lifetime of an application, it’s possible for sequences to cycle, such that more recently created rows could have IDs less than older rows. Using ID values as proxies for time also creates issues if there is ever a need to merge data from another source. For
conversation_message
,created_at
would be a better choice for determining recency.Related to that last point, instead of using
current_timestamp
in the setup script,clock_timestamp()
could be used so that each message would have a different value forcreated_at
.