I want to select only those identifiers from a table which are contained in another subquery result and only top-2 rows must be used.
For example you can run the query on PG:
select a.rid
from(
values (41), (42), (43), (44)
) a(rid)
where a.rid in
(
select b.x
from(
values (43), (44), (42), (41)
) b(x)
order by b.x
fetch first 2 rows only
);
It should return
41
42
It works. However if I use the following query with CTE’s
select rid
from(
values (41), (42), (43), (44)
) s(rid)
where rid in
(
with recursive messages as (
select
rid as "root_id",
parent.id,
parent.sent_date
from
purchase_message parent
where
parent.id = rid
union
select
rid as "root_id",
child.id,
child.sent_date
from
purchase_message child
join messages on
child.parent_message_id = messages.id
),
ordered_root_ids as (
select
m."root_id",
max(m."sent_date") over (partition by m."root_id") as "maxSentDate"
from
messages m
order by "maxSentDate" desc
)
select "root_id" from ordered_root_ids
fetch first 2 rows only
)
it doesn’t work. I get all the results
41, 42, 43, 44
Why doesn’t it return only 2 results?
According to my expectations it must be only 2 results, and it must be 43 and 44 because they are the latest messages.
I don’t know how to provide all the related tables because there are too many. I hope they aren’t necessary. Does CTE or recursive CTE or nested query affect this? Or combination of all of them? How to modify the above query so that it sorts data in the subquery, takes 2 top rows and then they are used in the in
clause of the outer query?
2
Answers
Your query seems to be corrupted, I cant execute it in your edition
It looks like it would be like this
And it works fine
It would be better if you post your table data to understand what’s wrong
It doesn’t. There’s no way for
limit 2
andfetch first 2 rows only
to return more than just the two rows. Still, you’re getting 2 rows for each value that you’re checking, so it’s more than enough. This:Will run the parenthesized subexpression for each
rid
independently, like alateral
subquery, because you referenced the outerrid
inside there. As a result, for each value you check, you’re getting up to two matches. I’m guessing you expected to only get the 2 matches once, and that thein()
would compare each value to those two.You can move your inputs to a CTE, then reference all its
rid
s at once from your subquery, have them all descend down the hierarchy, get sorted together then filtered down together to the top 2. Demo at db<>fiddle:I also had to change your logic that selects the top two
root_id
based on the most recentsent_date
in their hierarchy. Once it’s allowed to operate on all inputrid
‘s, your code would keep picking 2 of the samerid
:because of your use of the window function, you’d get all rows of each hierarchy, each repeating their
max(sent_date)
. As long as the hierarchy with the most recentsent_date
has 2 or more rows in it, you will get those two, because they both have the samemax(sent_date)
and the samerid
.