I have to tables that I want to join, order by two timestamps and get as result the distinct values (for several columns). But it doesn’t work.
See examples below:
CREATE TABLE t1(myid int, myyear int, mycol int, mdate timestamp);
INSERT INTO t1 VALUES
(11833,2022,1059,'2022-11-03 22:02:00'),(11834,2022,1059,'2022-11-17 19:56:41'),(11832,2021,1058,'2021-11-16 16:38:21'),(11839,2021,1057,'2021-11-10 18:08:09'),(11847,2021,1055,'2022-05-31 12:13:11'),(11847,2021,1055,'2022-05-31 12:13:11'),(11850,2021,1049,'2021-09-29 16:11:31'),(11853,2021,1046,'2022-01-24 11:44:41'),(11855,2021,1045,'2022-01-24 11:38:05'),(11865,2021,1044,'2022-01-24 11:23:51'),(11856,2021,1043,'2022-01-24 11:00:24'),(11840,2021,1042,'2021-11-30 12:28:13'),(11831,2021,1042,'2021-11-30 12:22:30'),(11846,2022,1042,'2022-11-02 15:06:00'),(11829,2022,1036,'2022-11-02 02:37:00'),(11826,2021,1035,'2021-09-24 13:07:48'),(11825,2021,1034,'2021-10-06 08:22:23'),(11830,2022,1033,'2022-11-03 21:18:00'),(11827,2022,1033,'2022-11-15 21:46:04'),(11828,2022,1032,'2022-11-08 16:44:08'),(11824,2022,1031,'2022-10-25 18:09:03'),(11823,2022,1031,'2022-11-02 03:10:00'),(11822,2022,1030,'2022-10-24 14:59:25')
;
CREATE TABLE t2(myid int, name varchar,idate timestamp);
INSERT INTO t2 VALUES
(11833,'Name1684','2023-01-10 15:52:55'),(11834,'Name1727','2023-01-10 15:52:55'),(11832,'Name609','2023-01-10 15:52:54'),(11839,'Name608','2023-01-10 15:52:59'),(11847,'Name606','2023-01-10 15:53:03'),(11847,'Name607','2023-01-10 15:53:03'),(11850,'Name605','2023-01-10 15:53:04'),(11853,'Name604','2023-01-10 15:53:05'),(11855,'Name603','2023-01-10 15:53:06'),(11865,'Name602','2023-01-10 15:53:10'),(11856,'Name601','2023-01-10 15:53:07'),(11840,'Name600','2023-01-10 15:52:59'),(11831,'Name1726','2023-01-10 15:52:53'),(11846,'Name1683','2023-01-10 15:53:03'),(11829,'Name1682','2023-01-10 15:52:52'),(11826,'Name599','2023-01-10 15:52:50'),(11825,'Name598','2023-01-10 15:52:49'),(11830,'Name1681','2023-01-10 15:52:52'),(11827,'Name1725','2023-01-10 15:52:51'),(11828,'Name1680','2023-01-10 15:52:51'),(11824,'Name1678','2023-01-10 15:52:48'),(11823,'Name1679','2023-01-10 15:52:48'),(11822,'Name1677','2023-01-10 15:52:47')
;
Show example which is not working before order and distinct:
Select
*
from t1
join t2
on t1.myid=t2.myid where t1.mycol =1059
=> Gives me this result:
myid | myyear | mycol | mdate | myid | name | idate |
---|---|---|---|---|---|---|
11833 | 2022 | 1059 | 2022-11-03 22:02:00 | 11833 | Name1684 | 2023-01-10 15:52:55 |
11834 | 2022 | 1059 | 2022-11-17 19:56:41 | 11834 | Name1727 | 2023-01-10 15:52:55 |
I want to order first by column mdate, then by idate (both to see the youngest dates) and then see only distinct values of (myyear and mycol)
CREATE TABLE expectedresult(myid int, myyear int,mycol int, mdate timestamp,name varchar,idate timestamp);
INSERT INTO expectedresult VALUES
(11834,2022,1059,'2022-11-17 19:56:41','Name1727','2023-01-10 15:52:55')
myid | myyear | mycol | mdate | name | idate |
---|---|---|---|---|---|
11834 | 2022 | 1059 | 2022-11-17 19:56:41 | Name1727 | 2023-01-10 15:52:55 |
This is what I have tried:
create table t3 as(
select distinct on (subq1.myyear,subq1.mycol)
*
from(
Select
t1.myid,
t1.myyear,
t1.mycol,
t1.mdate,
t2.name,
t2.idate
from t1
join t2
on t1.myid=t2.myid
order by t1.mdate desc, t2.idate desc) subq1)
But it "distincts" the wrong row(because a younger mdate is available):
select * from t3 where mycol =1059
myid | myyear | mycol | mdate | name | idate |
---|---|---|---|---|---|
11833 | 2022 | 1059 | 2022-11-03 22:02:00 | Name1684 | 2023-01-10 15:52:55 |
here also as fiddle:
https://dbfiddle.uk/eS5FoBeq
Best
2
Answers
or rewrite your query as:
if you
distinct on (x,y)
then you order by should be order byx,y,z
x, y is the columns that you want to get the unique row.In a group set (x,y), there are many rows, but you only want one, then you need order by z to get the only one row in a group set (x,y) in a deterministic way, otherwise, it will get a random row in a group set(x,y).
In general I try to avoid using distinct.
You can use row number to identify the number of elements with the same "myyear" and "mycol" and order them by newest date and then select the first value (rn =1).