Here is the structure of the 2 tables:
This is the "main" table :
CREATE TEMP TABLE final_table (
idx INTEGER,
pids INTEGER[],
stats INTEGER[]
)
INSERT INTO final_table
VALUES
(2733111, '{43255890, 8653548}'::int[], '{4, 5}'::int[]),
(2733112, '{54387564}'::int[] , '{6}'::int[] ),
(2733113, '{}'::int[] , '{}'::int[] )
This is the data from temp table :
CREATE TEMP TABLE aggreg (
idx INTEGER,
pid INTEGER,
count INTEGER
)
VALUES
(2733111, 21854997, 2),
(2733111, 21854923, 10),
(2733112, 12345689, 3),
(2733113, 98765348, 11),
(2733111, 43255890, 4),
(2733112, 54387564, 6);
The goal is to UPDATE the first table (final_table) with the data from the temporary table.
The difficulty lies in the fact that, in a classic update, the same line will be called several times and therefore will only have the last result in update. It is thus necessary to pass by a sub SELECT
in the FROM
of the UPDATE
and it is there that I am stuck.
I can’t seem to find the right syntax in the SELECT sub to do this.
Additionally, in the real table some rows may already contain data in the pids
and stats
columns. The purpose of this UPDATE
is to add only the pid
and their corresponding count
.
The last query I tested (and didn’t work) is the following:
UPDATE final_table AS ft
SET
stats = ag2.stats || (ag2.stats - ft.stats),
pids = ag2.pids || (ag2.pids - ft.pids)
FROM (
SELECT
ag.idx,
array_agg(ag.pid) AS pids,
array_agg(ag."count") AS stats
FROM
aggreg AS ag
INNER JOIN
final_table AS ft
ON
ag.idx = ft.idx
GROUP BY
ag.idx
) AS ag2
WHERE
ag2.idx = ft.idx
Also I use the intarray
extension to do the array - array
operation.
Based on the example tables, I expect the following result:
idx | pids | stats |
---|---|---|
2733111 | {43255890, 8653548, 21854997, 21854923} | {4, 5, 2, 10} |
2733112 | {54387564, 12345689} | {6, 3} |
2733113 | {98765348} | {11} |
Thanks in advance for your help!
2
Answers
Instead of using a
JOIN
operation between the two tables and using it inside the UPDATE statement, you can craft your arrays from the "aggreg" table and use it within theFROM ... WHERE ...
clauses of the UPDATE statement as follows.In order to deal with existing elements inside the arrays, you should unnest your previous data first, merge it with your aggreg data, and update your "final_table" accordingly.
Output:
Check the demo here.
Note: The
ORDER BY
clause inside theARRAY_AGG
function will ensure that correspondence between arrays is kept. If you need to have the exact original order, the query may become a little more complex and heavy, as that would require to involve a rowid (to be computed withROW_NUMBER
) and a further subquery.The following query (addressing the original question’s requirements) adds only new
pids
and their associatedstats
fromaggreg
foridx
values that are already infinal_table
:The original question and subsequent comments don’t mention what should be done with
idx
values inaggreg
that don’t exist infinal_table
. The following query assumes that new rows should be inserted for this case. The query depends onfinal_table
having a unique index onidx
to identify conflicts.This query is fundamentally the same as lemon’s except for the added ability to insert new rows and a join in
pid_counts
that restricts the rows selected fromfinal_table
to only those with matchingidx
values inaggreg
.