I have two tables (see below) in a postgreSQL database.
Table1
tracks all changes made to each ID (i.e. multiple rows per ID):
create table table1(id,record,name,data1,data2,data3)as values
(1,1,'Dave', 1.2,55,1)
,(1,2,'David', 1.2,55,1)
,(2,1,'Sam', .8,30,2)
,(3,1,'Jenn', .9,28,2)
,(4,1,'Arthur',1.1,77,1)
,(5,1,'Jim', .7,42,2)
,(5,2,'Jimmy', .7,42,2)
,(5,3,'James', .7,42,2)
,(6,1,'Sue', 1.3,32,2)
,(6,2,'Susan', 1.3,32,2);
Table2
contains only one row per ID:
create table table2(id,record,name,data1,data2,data3)as values
(1,1,'Dave', 1.2,55,1)
,(2,1,'Sam', .8,30,2)
,(3,1,'Jenn', .9,28,2)
,(4,1,'Arthur',1.1,77,1)
,(5,2,'Jimmy', .7,42,2)
,(6,1,'Sue', 1.3,32,2);
I need an SQL query to update Table2
based on the highest Record value for each ID in Table1
and also updating all other fields as well (data1
, data2
, data3
in this example).
These tables are just examples but my actual data has many columns, so I would like to use some sort of wild card to include all fields in the update.
Relatively new to SQL in postgreSQL but here is my attempt at getting the most recent record version returned, but not sure how to update Table2 from there. I have had a few attempts but nothing work so probably not worth posting that code.
select * from
Table1 t1
where
(ID,Record) in
(
select
ID,
MAX(Record)
from
Table1 t1
group by
ID
)
ExpectedResult:
id | record | name | data1 | data2 | data3 |
---|---|---|---|---|---|
1 | 2 | David | 1.2 | 55 | 1 |
2 | 1 | Sam | 0.8 | 30 | 2 |
3 | 1 | Jenn | 0.9 | 28 | 2 |
4 | 1 | Arthur | 1.1 | 77 | 1 |
5 | 3 | James | 0.7 | 42 | 2 |
6 | 2 | Susan | 1.3 | 32 | 2 |
2
Answers
We get the maximum record for each ID in the CTE (hgh) and use that in the associated UPDATE statement to update the respective columns:
The best way to achieve this is with a cte that makes use of
distinct on
. Something along the following lines:What
distinct on
gives you, is the ability to limit the resultset to just one line per column (or columns) that you are interested in. Theorder by
, which must start with the columns in thedistinct on
, determines from the remaining data, what row is returned. Ordering by record descending is therefore the same as getting that row which has theMAX(record)
for that id.The
update
simply joins the cte to the target table and updates the values accordingly.I am afraid that I know of no shortcut, which will mean that you do not need to specify each column!
For future reference, please supply sample data by way of insert statements. It saves us having to do the work to test.