Postgresql - List each message from table A with it's earliest response in table B, chronologically

Mitya
May 31, 2024
219 views
2 votes
3 Answers

I have three tables: conversations, user_messages and system_messages. Their basic structure (irrelevant columns removed for brevity) is:

create table conversations(id int generated by default as identity primary key);

create table user_messages(
   id int generated by default as identity primary key
  ,conversation_id int references conversations(id)
  ,content text
  ,created_at timestamp);

create table system_messages(
   id int generated by default as identity primary key
  ,conversation_id int references conversations(id)
  ,content text
  ,created_at timestamp);

I am trying to generate an export of "system said X" and "user responded with Y" rows for a given conversation. The issue I’m having is finding a way to ensure that the "user responded with Y" message corresponds to the chronologically next message in the conversation, following the system message. So output would look something like:

system_said	user_said
Hi!	Hello
How’s it going?	I’m alright, you?
I’m good!	Nice.

This works, but with a poor and inefficient query, using a sub-query that references outer values.

SELECT
    content AS system_said, (
        SELECT content
        FROM user_messages
        WHERE conversation_id = sm.conversation_id AND created_at > sm.created_at
        ORDER BY created_at
        LIMIT 1
    ) AS user_said
FROM system_messages sm
WHERE sm.conversation_id = ?

This is where I’m stuck. I’m trying to rewrite the above more efficiently using a proper join, but I can’t force the join to give me the chronologically next message, so I’m getting mixed up pairs like:

system_said user_said

Hi! Nice.

How’s it going? Hello

I’m good! I’m alright, you?

Here’s the query:
```
SELECT
    DISTINCT ON (sm) sm.id AS message,
    sm.content AS system_said,
    um.content AS user_said
FROM system_messages sm
JOIN (
    SELECT conversation_id, created_at, content
    FROM user_messages
    ORDER BY created_at
) um ON um.conversation_id = sm.conversation_id AND um.created_at > sm.created_at
ORDER BY sm
WHERE c.id = ?
```
(I found I had to add in DISTINCT ON () otherwise I got duplicate rows.)

system_said	user_said
Hi!	Nice.
How’s it going?	Hello
I’m good!	I’m alright, you?

Is there a way to this?

Answers

Use a CTE to rank user messages within each conversation based on their creation time and then use a JOIN to pair each system message with the next user message using the row number. Try this:

WITH ranked_user_messages AS (
    SELECT
        id,
        conversation_id,
        content,
        created_at,
        ROW_NUMBER() OVER (PARTITION BY conversation_id ORDER BY created_at) AS rn
    FROM user_messages
)
SELECT
    sm.content AS system_said,
    um.content AS user_said
FROM system_messages sm
LEFT JOIN LATERAL (
    SELECT content
    FROM user_messages um
    WHERE um.conversation_id = sm.conversation_id
      AND um.created_at > sm.created_at
    ORDER BY um.created_at
    LIMIT 1
) um ON true
WHERE sm.conversation_id = ?
ORDER BY sm.created_at;

I would try:

Schema (PostgreSQL v15)

create table system_messages (
  id serial primary key,
  conversation_id int,
  content text,
  created_at timestamp
);

create table user_messages (
  id serial primary key,
  conversation_id int,
  content text,
  created_at timestamp
);

insert into system_messages(conversation_id, content, created_at) values
   (1, 'Hi!', '2024-05-31 08:00:00'),
   (1, 'How''s it going?', '2024-05-31 08:01:00'),
   (1, 'I''m good!', '2024-05-31 08:02:00');

insert into user_messages(conversation_id, content, created_at) values
   (1, 'Hello', '2024-05-31 08:00:30'),
   (1, 'I''m alright, you?', '2024-05-31 08:01:30'),
   (1, 'Nice.', '2024-05-31 08:02:30');

Query #1

with s_m (r_n, system_said) as (
  select row_number() over (order by created_at),
         content
  from system_messages
  where conversation_id = 1
),
u_m (r_n, player_said) as (
  select row_number() over (order by created_at),
         content
  from user_messages
  where conversation_id = 1
)
select s_m.system_said, u_m.player_said from
s_m join u_m on s_m.r_n = u_m.r_n
order by s_m.r_n;

system_said	player_said
Hi!	Hello
How’s it going?	I’m alright, you?
I’m good!	Nice.

View on DB Fiddle

- Zegarek
- May 31, 2024 at 2:57 pm
- 0 votes
0
Your idea with distinct on sounds good, but there are a few things you could improve:
1. You don’t need the subquery, you can join directly.
2. Distinct on(sm) checks the entire record FROM system_messages sm. Since it has a primary key, distinct on(sm.id) is sufficient.
3. The order by you use with distinct on could use more than just the unique column. If you want the soonest response, add its timestamp: order by sm.id,um.created_at.
In a demo at db<>fiddle with 20k conversations and 200k messages in them, this takes 40ms:
```
SELECT
    DISTINCT ON (sm.id) sm.id AS message,
    sm.content AS system_said,
    um.content AS user_said
FROM system_messages sm
JOIN user_messages um 
   ON um.conversation_id = sm.conversation_id 
  AND um.created_at > sm.created_at
  AND sm.conversation_id = 1
ORDER BY sm.id,um.created_at;
```
You can also add covering indexes to speed things up with index-only scans, which push all this way below 1ms exec time:
```
create index on user_messages(conversation_id,created_at)
   include(content,id);
create index on system_messages(conversation_id,created_at)
   include(content,id);
```
With those in place, it looks like your initial idea wins performance wise, whether you push down the correlated scalar subquery down to be cross joined with, as a lateral suqbuery doing the exact same thing, or not: _{demo2 at db<>fiddle}
```
SELECT sm.content AS system_said,
       um.content AS user_said
FROM system_messages sm
CROSS JOIN LATERAL (
    SELECT content
    FROM   user_messages um
    WHERE um.conversation_id = sm.conversation_id 
      AND um.created_at > sm.created_at
    ORDER BY um.created_at
    LIMIT 1 ) um
WHERE sm.conversation_id = 999
ORDER BY sm.created_at;
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – List each message from table A with it's earliest response in table B, chronologically

Answers