skip to Main Content

In PostgreSQL, I want to run two tests:

  • if there is a table in a specific schema which is not in the required_tables table
  • if there is a table in the required_tables table which is not in the specific schema
FIRST QUERY

For the first query I do as follows:

select t1.table_name as t1_table_name, t2.table_name as t2_extra_tables_in_schema 
from required_tables t1 
right join information_schema.tables t2 
on t1.table_name = t2.table_name 
where t2.table_schema='Schema_X' 
and t1.table_name IS NULL 

This gives me the required result – i.e. a table in the schema which is not in the required_tables table.

SECOND_QUERY

However, for the second query, when I try the following (the equivalent to the first query but with a left join this time):

select t1.table_name as t1_tables_missing_from_schema, t2.table_name
from required_tables t1 
left join information_schema.tables t2 
on t1.table_name = t2.table_name 
where t2.table_schema='Schema_X' 
and t2.table_name IS NULL

I always get zero results, even though I know that there is a table in required_tables which is not in the schema.

So, having read that "where" clauses can mess up joins (though I don’t understand why), I tried the following:

select t1.table_name as t1_tables_missing_from_schema, t2.table_name 
from required_tables t1 
left join information_schema.tables t2 
on t1.table_name = t2.table_name 
where t2.table_name IS NULL

This indeed gives me the name of the table which is in required_tables, but not in the schema.

However, lets say the table in question was in a different schema – in that case, I would not get any result from my previous query as the table would be found in that other schema, which I do not want to check.

How do I get around this issue and use the where t2.table_schema=’Schema_X’ in my second query ?

Additionally, if there was a way to get both missing results in a single query (maybe somehow with a full outer join is my guess), that would interest me, too.

2

Answers


  1. You can just more-or-less write it out as you would describe the problem

    Setup:

    CREATE TEMP TABLE required_tables (table_name name);
    INSERT INTO required_tables VALUES ('aaa'), ('bbb');
    CREATE SCHEMA q;
    CREATE TABLE q.aaa (i int);
    CREATE TABLE q.ccc (i int);
    

    The query:

    WITH extra_tables AS (
      SELECT t.table_name
      FROM information_schema.tables t 
      WHERE table_schema='q'
      EXCEPT
      SELECT table_name
      FROM required_tables
    )
    , missing_tables AS (
      SELECT rt.table_name 
      FROM required_tables rt
      EXCEPT
      SELECT table_name fROM information_schema.tables WHERE table_schema='q'
    )
    SELECT 'extra' AS reason, et.table_name
    FROM extra_tables et
    UNION ALL
    SELECT 'missing', mt.table_name
    FROM missing_tables mt;
    

    The results:

    ┌─────────┬────────────┐
    │ reason  │ table_name │
    ├─────────┼────────────┤
    │ extra   │ ccc        │
    │ missing │ bbb        │
    └─────────┴────────────┘
    (2 rows)
    
    Login or Signup to reply.
  2. Like you already guessed, a FULL [OUTER] JOIN is the fastest way:

    SELECT CASE WHEN r.tablename IS NULL THEN 'added' ELSE 'missing' END AS kind
         , tablename
    FROM   required_tables r
    FULL   JOIN (               -- !
       SELECT tablename
       FROM   pg_tables
       WHERE  schemaname = 'schema_x'
       ) t USING (tablename)    -- for convenience
    WHERE (r.tablename IS NULL OR t.tablename IS NULL)
    ORDER  BY kind, tablename;  -- optional
    

    fiddle

    Notably, put the WHERE condition in the subquery.
    Related:

    (For the FULL JOIN we need to filter before the join, so we cannot put that into join condition, either.)

    I chose the system view pg_catalog.pg_tables over information_schema.tables for two reasons:

    • It is substantially smaller and faster.
    • It actually lists all tables. (!)

    Explained in detail here:

    Assuming this table definition for convenience:

    CREATE TABLE required_tables (tablename name PRIMARY KEY);  -- !
    
    • tablename conveniently matches the column name of pg_tables.tablename.
      I added alternative queries to the fiddle for when the column name does not match.

    • PK rules out null and duplicates, so we don’t need to care of these corner cases in the query.

    And I chose a legal, lower-case, unquoted name for ‘schema_x’. Mixed-case Postgres identifiers make me uncomfortable.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search