How to update multiple times the same rows in one update (parallel arrays) from a sub select - Postgresql

patricey0
April 27, 2023
269 views
0 votes
2 Answers

Here is the structure of the 2 tables:

This is the "main" table :

CREATE TEMP TABLE final_table (
    idx INTEGER,
    pids INTEGER[],
    stats INTEGER[]
)
INSERT INTO final_table
VALUES
    (2733111, '{43255890, 8653548}'::int[], '{4, 5}'::int[]),   
    (2733112, '{54387564}'::int[]         , '{6}'::int[]   ),   
    (2733113, '{}'::int[]                 , '{}'::int[]    )

This is the data from temp table :

CREATE TEMP TABLE aggreg (
    idx INTEGER,
    pid INTEGER,
    count INTEGER
)
VALUES
    (2733111, 21854997,  2),    
    (2733111, 21854923, 10),    
    (2733112, 12345689,  3),    
    (2733113, 98765348, 11),    
    (2733111, 43255890,  4),    
    (2733112, 54387564,  6);

The goal is to UPDATE the first table (final_table) with the data from the temporary table.

The difficulty lies in the fact that, in a classic update, the same line will be called several times and therefore will only have the last result in update. It is thus necessary to pass by a sub SELECT in the FROM of the UPDATE and it is there that I am stuck.

I can’t seem to find the right syntax in the SELECT sub to do this.

Additionally, in the real table some rows may already contain data in the pids and stats columns. The purpose of this UPDATE is to add only the pid and their corresponding count.

The last query I tested (and didn’t work) is the following:

UPDATE final_table AS ft
    SET
        stats = ag2.stats || (ag2.stats - ft.stats),
        pids = ag2.pids || (ag2.pids - ft.pids)
FROM (
    SELECT
        ag.idx,
        array_agg(ag.pid) AS pids,
        array_agg(ag."count") AS stats
    FROM
            aggreg AS ag
    INNER JOIN
            final_table AS ft
    ON
            ag.idx = ft.idx
    GROUP BY
            ag.idx
) AS ag2
WHERE
    ag2.idx = ft.idx

Also I use the intarray extension to do the array - array operation.

Based on the example tables, I expect the following result:

idx	pids	stats
2733111	{43255890, 8653548, 21854997, 21854923}	{4, 5, 2, 10}
2733112	{54387564, 12345689}	{6, 3}
2733113	{98765348}	{11}

Thanks in advance for your help!

Answers

- lemon
- April 27, 2023 at 3:35 pm
- 0 votes
0
Instead of using a JOIN operation between the two tables and using it inside the UPDATE statement, you can craft your arrays from the "aggreg" table and use it within the FROM ... WHERE ... clauses of the UPDATE statement as follows.

In order to deal with existing elements inside the arrays, you should unnest your previous data first, merge it with your aggreg data, and update your "final_table" accordingly.
```
WITH full_data AS (
    SELECT * FROM aggreg
  
    UNION
  
    SELECT ft.idx,
           t.pids, 
           t.stats
    FROM final_table ft
    CROSS JOIN UNNEST(ft.pids, ft.stats) AS t(pids, stats)
), aggreg_data AS (
    SELECT idx, 
           ARRAY_AGG(pid ORDER BY pid)   AS pids, 
           ARRAY_AGG(count ORDER BY pid) AS counts
    FROM full_data
    GROUP BY idx
)
UPDATE final_table
SET pids = cte.pids,
    stats = cte.counts
FROM aggreg_data cte
WHERE final_table.idx = cte.idx;
```
Output:

idx pids stats

2733111 [21854997,21854923,8653548,43255890] [2,10,5,4]

2733112 [12345689,54387564] [3,6]

2733113 [98765348] [11]

Check the demo here.

Note: The ORDER BY clause inside the ARRAY_AGG function will ensure that correspondence between arrays is kept. If you need to have the exact original order, the query may become a little more complex and heavy, as that would require to involve a rowid (to be computed with ROW_NUMBER) and a further subquery.
Login or Signup to reply.

idx	pids	stats
2733111	[21854997,21854923,8653548,43255890]	[2,10,5,4]
2733112	[12345689,54387564]	[3,6]
2733113	[98765348]	[11]

The following query (addressing the original question’s requirements) adds only new pids and their associated stats from aggreg for idx values that are already in final_table:

UPDATE final_table t
  SET
    pids = t.pids || s.pids,
    stats = t.stats || s.stats
  FROM (SELECT a.idx,
               array_agg(a.pid) pids,
               array_agg(a.count) stats
          FROM aggreg a
          JOIN final_table f
            ON (a.idx = f.idx AND NOT ARRAY[a.pid] <@ f.pids)
         GROUP BY a.idx) s
  WHERE t.idx = s.idx;

The original question and subsequent comments don’t mention what should be done with idx values in aggreg that don’t exist in final_table. The following query assumes that new rows should be inserted for this case. The query depends on final_table having a unique index on idx to identify conflicts.

WITH pid_counts AS (
  SELECT f.idx, t.pid, t.count
    FROM (SELECT DISTINCT idx
            FROM aggreg) i
    JOIN final_table f ON i.idx = f.idx
    CROSS JOIN UNNEST(f.pids, f.stats) t(pid, count)
  UNION ALL
  SELECT idx, pid, count
    FROM aggreg
),
summed_counts AS (
  SELECT idx, pid, SUM(count) AS count
    FROM pid_counts
   GROUP BY idx, pid
),
stats AS (
  SELECT idx,
         ARRAY_AGG(pid ORDER BY pid) AS pids,
         ARRAY_AGG(count ORDER BY pid) AS stats
    FROM summed_counts
   GROUP BY idx
)
  INSERT INTO final_table(idx, pids, stats)
    SELECT idx, pids, stats
      FROM stats
  ON CONFLICT (idx) DO UPDATE
    SET pids = excluded.pids,
        stats = excluded.stats;

This query is fundamentally the same as lemon’s except for the added ability to insert new rows and a join in pid_counts that restricts the rows selected from final_table to only those with matching idx values in aggreg.

Please signup or login to give your own answer.

Click here to cancel reply.

How to update multiple times the same rows in one update (parallel arrays) from a sub select – Postgresql

Answers