Postgresql - How to find missing rows on two sides of a join?

didjek
July 4, 2024
233 views
0 votes
2 Answers

In PostgreSQL, I want to run two tests:

if there is a table in a specific schema which is not in the required_tables table
if there is a table in the required_tables table which is not in the specific schema

FIRST QUERY

For the first query I do as follows:

select t1.table_name as t1_table_name, t2.table_name as t2_extra_tables_in_schema 
from required_tables t1 
right join information_schema.tables t2 
on t1.table_name = t2.table_name 
where t2.table_schema='Schema_X' 
and t1.table_name IS NULL

This gives me the required result – i.e. a table in the schema which is not in the required_tables table.

SECOND_QUERY

However, for the second query, when I try the following (the equivalent to the first query but with a left join this time):

select t1.table_name as t1_tables_missing_from_schema, t2.table_name
from required_tables t1 
left join information_schema.tables t2 
on t1.table_name = t2.table_name 
where t2.table_schema='Schema_X' 
and t2.table_name IS NULL

I always get zero results, even though I know that there is a table in required_tables which is not in the schema.

So, having read that "where" clauses can mess up joins (though I don’t understand why), I tried the following:

select t1.table_name as t1_tables_missing_from_schema, t2.table_name 
from required_tables t1 
left join information_schema.tables t2 
on t1.table_name = t2.table_name 
where t2.table_name IS NULL

This indeed gives me the name of the table which is in required_tables, but not in the schema.

However, lets say the table in question was in a different schema – in that case, I would not get any result from my previous query as the table would be found in that other schema, which I do not want to check.

How do I get around this issue and use the where t2.table_schema=’Schema_X’ in my second query ?

Additionally, if there was a way to get both missing results in a single query (maybe somehow with a full outer join is my guess), that would interest me, too.

Tags: postgresql sql

Answers

You can just more-or-less write it out as you would describe the problem

Setup:

CREATE TEMP TABLE required_tables (table_name name);
INSERT INTO required_tables VALUES ('aaa'), ('bbb');
CREATE SCHEMA q;
CREATE TABLE q.aaa (i int);
CREATE TABLE q.ccc (i int);

The query:

WITH extra_tables AS (
  SELECT t.table_name
  FROM information_schema.tables t 
  WHERE table_schema='q'
  EXCEPT
  SELECT table_name
  FROM required_tables
)
, missing_tables AS (
  SELECT rt.table_name 
  FROM required_tables rt
  EXCEPT
  SELECT table_name fROM information_schema.tables WHERE table_schema='q'
)
SELECT 'extra' AS reason, et.table_name
FROM extra_tables et
UNION ALL
SELECT 'missing', mt.table_name
FROM missing_tables mt;

The results:

┌─────────┬────────────┐
│ reason  │ table_name │
├─────────┼────────────┤
│ extra   │ ccc        │
│ missing │ bbb        │
└─────────┴────────────┘
(2 rows)

- ErwinBrandstetter
- July 4, 2024 at 8:28 am
- 0 votes
0
Like you already guessed, a FULL [OUTER] JOIN is the fastest way:
```
SELECT CASE WHEN r.tablename IS NULL THEN 'added' ELSE 'missing' END AS kind
     , tablename
FROM   required_tables r
FULL   JOIN (               -- !
   SELECT tablename
   FROM   pg_tables
   WHERE  schemaname = 'schema_x'
   ) t USING (tablename)    -- for convenience
WHERE (r.tablename IS NULL OR t.tablename IS NULL)
ORDER  BY kind, tablename;  -- optional
```
fiddle

Notably, put the WHERE condition in the subquery.
Related:
- Postgres LEFT JOIN with WHERE condition
(For the FULL JOIN we need to filter before the join, so we cannot put that into join condition, either.)

I chose the system view pg_catalog.pg_tables over information_schema.tables for two reasons:
- It is substantially smaller and faster.
- It actually lists all tables. (!)
Explained in detail here:
- How to check if a table exists in a given schema
Assuming this table definition for convenience:
```
CREATE TABLE required_tables (tablename name PRIMARY KEY);  -- !
```
- tablename conveniently matches the column name of pg_tables.tablename.
  I added alternative queries to the fiddle for when the column name does not match.
- PK rules out null and duplicates, so we don’t need to care of these corner cases in the query.
And I chose a legal, lower-case, unquoted name for ‘schema_x’. Mixed-case Postgres identifiers make me uncomfortable.
- Are PostgreSQL column names case-sensitive?
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to find missing rows on two sides of a join?

FIRST QUERY

SECOND_QUERY

Answers