skip to Main Content

I have a database with the following structure

table_a
| id       | username |
| -------- | -------- |
| 1        | 'a.a'    |
| 2        | 'a.a'    |

table_b

| id       | table_a_id |
| -------- | ---------- |
| 1        | 1          |
| 2        | 3          |

table_c

| id       | table_a_id |
| -------- | ---------- |
| 1        | 2          |
| 2        | 3          |

I want to automattically find the dependences of table_a in the database and change the ids of the second user to the one of the first user so that the end result would look like this:

table_a
| id       | username |
| -------- | -------- |
| 1        | 'a.a'    |
| 2        | 'a.a'    |

table_b

| id       | table_a_id |
| -------- | ---------- |
| 1        | 1          |
| 2        | 3          |

table_c

| id       | table_a_id |
| -------- | ---------- |
| 1        | 1          |
| 2        | 3          |

I can do it by listing the known tables, but I haven’t been able to automatically find all the dependencies in the database.

2

Answers


  1. Something like this:

    update table_b
    set table_b.table_a_id = min_a.id
    from table_a current
    join table_a min_a
    on current.username = min_a.username
    left join table_a earlier
    on min_a.username = earlier.username and earlier.id < min_a.id
    where earlier.id is null and table_b.table_a_id = current.id and current.id > min_a.id
    

    Explanation:

    • we use an update-join
    • we update table_b
    • from table_a current that is the current match of table_b in table_a
    • join with table_a min_a which represents the record in table_a having the minimum id with the same username
    • left joined with table_a earlier to guarantee that min_a is indeed the minimum match
    • so earlier.id is to be null
    • and we avoid doing an update if the current match is already the minimum
    Login or Signup to reply.
  2. The PostgreSQL system catalogs can be used to identify referential dependencies and to generate code to delete redundant values from the parent table after updating the references in the child tables.

    The following script creates and populates tables to demonstrate addressing redundant references:

    CREATE TABLE table_a (
      id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      username TEXT
    );
    
    CREATE TABLE table_b (
      id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      table_a_id INTEGER REFERENCES table_a(id)
    );
    
    CREATE TABLE table_c (
      id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      table_a_id INTEGER REFERENCES table_a(id)
    );
    
    INSERT INTO table_a(username)
    VALUES ('a.a'), ('a.a'), ('a.b');
    
    INSERT INTO table_b(table_a_id)
    VALUES (1), (3);
    
    INSERT INTO table_c(table_a_id)
    VALUES (2), (3);
    

    The resulting table contents are:

    Table A:

    id username
    1 a.a
    2 a.a
    3 a.b

    Table B:

    id table_a_id
    1 1
    2 3

    Table C:

    id table_a_id
    1 2
    2 3

    The following DO block uses the system catalogs to generate a statement for each parent table. The generated statement employs chained CTEs to identify redundant parent rows, update the child table references to the redundant rows, and then delete the redundant rows. (This code addresses cases where each foreign key references a single column and none of the child tables has a unique index that will be violated when references are updated.)

    DO LANGUAGE plpgsql
    $BLOCK$
    DECLARE
      rec record;
    BEGIN
      FOR rec IN
        WITH
          cte AS (
            SELECT
              c.confrelid AS parent_table_oid,
              c.confkey[1] AS parent_column_num,
              -- generate table expressions to update the dependent child tables
              FORMAT(
                'u%1$s AS (UPDATE %2$I.%3$I n SET %4$I = r.kept_id FROM r WHERE n.%4$I = ANY(r.redundant_ids))',
                ROW_NUMBER() OVER (PARTITION BY c.confrelid ORDER BY child_tables.oid),
                child_ns.nspname,
                child_tables.relname,
                child_columns.attname
              ) AS update_cte
            FROM
              pg_constraint c
              JOIN pg_class child_tables ON child_tables.oid = c.conrelid
              JOIN pg_namespace child_ns ON child_ns.oid = child_tables.relnamespace
              JOIN pg_attribute child_columns ON child_columns.attrelid = c.conrelid
              AND child_columns.attnum = c.conkey[1]
            WHERE
              c.confrelid = 'table_a'::regclass
          )
        SELECT
          -- generate a table expression to identify redundant rows in the parent table
          FORMAT(
            'WITH r AS (SELECT MIN(%1$I) AS kept_id, (ARRAY_AGG(%1$I ORDER BY %1$I))[2:] AS redundant_ids FROM %2$I.%3$I GROUP BY username HAVING COUNT(*) > 1),',
            parent_columns.attname,
            parent_ns.nspname,
            parent_tables.relname
          ) ||
          -- concatenate associated child update table expressions
          STRING_AGG(cte.update_cte, ',') ||
          -- append the delete statement to remove redundant rows from the parent table
          FORMAT(
            'DELETE FROM %1$I.%2$I USING r WHERE %2$I.%3$I = ANY(r.redundant_ids)',
            parent_ns.nspname,
            parent_tables.relname,
            parent_columns.attname
          ) AS statement
        FROM
          cte
          JOIN pg_class parent_tables ON parent_tables.oid = cte.parent_table_oid
          JOIN pg_attribute parent_columns ON parent_columns.attrelid = cte.parent_table_oid
          AND parent_columns.attnum = cte.parent_column_num
          JOIN pg_namespace parent_ns ON parent_ns.oid = parent_tables.relnamespace
        GROUP BY
          parent_ns.nspname,
          parent_tables.relname,
          parent_columns.attname LOOP
      EXECUTE rec.statement;
      END loop;
    END;
    $BLOCK$;
    

    Below is a formatted example of the generated SQL:

    WITH
      r AS (
        SELECT MIN(id) AS kept_id, (ARRAY_AGG(id ORDER BY id))[2:] AS redundant_ids
          FROM public.table_a
          GROUP BY username
          HAVING COUNT(*) > 1
      ),
      u1 AS (
        UPDATE public.table_b n
          SET table_a_id = r.kept_id
          FROM r
          WHERE n.table_a_id = ANY (r.redundant_ids)
      ),
      u2 AS (
        UPDATE public.table_c n
          SET table_a_id = r.kept_id
          FROM r
          WHERE n.table_a_id = ANY (r.redundant_ids)
      )
    DELETE FROM public.table_a
      USING r
      WHERE table_a.id = ANY (r.redundant_ids);
    

    After running the script, the demonstration tables have the following content:

    Table A:

    id username
    1 a.a
    3 a.b

    Table B:

    id table_a_id
    1 1
    2 3

    Table C:

    id table_a_id
    1 1
    2 3
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search