In PostgreSQL, I want to run two tests:
- if there is a table in a specific schema which is not in the
required_tables
table - if there is a table in the
required_tables
table which is not in the specific schema
FIRST QUERY
For the first query I do as follows:
select t1.table_name as t1_table_name, t2.table_name as t2_extra_tables_in_schema
from required_tables t1
right join information_schema.tables t2
on t1.table_name = t2.table_name
where t2.table_schema='Schema_X'
and t1.table_name IS NULL
This gives me the required result – i.e. a table in the schema which is not in the required_tables table.
SECOND_QUERY
However, for the second query, when I try the following (the equivalent to the first query but with a left join this time):
select t1.table_name as t1_tables_missing_from_schema, t2.table_name
from required_tables t1
left join information_schema.tables t2
on t1.table_name = t2.table_name
where t2.table_schema='Schema_X'
and t2.table_name IS NULL
I always get zero results, even though I know that there is a table in required_tables which is not in the schema.
So, having read that "where" clauses can mess up joins (though I don’t understand why), I tried the following:
select t1.table_name as t1_tables_missing_from_schema, t2.table_name
from required_tables t1
left join information_schema.tables t2
on t1.table_name = t2.table_name
where t2.table_name IS NULL
This indeed gives me the name of the table which is in required_tables, but not in the schema.
However, lets say the table in question was in a different schema – in that case, I would not get any result from my previous query as the table would be found in that other schema, which I do not want to check.
How do I get around this issue and use the where t2.table_schema=’Schema_X’ in my second query ?
Additionally, if there was a way to get both missing results in a single query (maybe somehow with a full outer join is my guess), that would interest me, too.
2
Answers
You can just more-or-less write it out as you would describe the problem
Setup:
The query:
The results:
Like you already guessed, a
FULL [OUTER] JOIN
is the fastest way:fiddle
Notably, put the
WHERE
condition in the subquery.Related:
(For the
FULL JOIN
we need to filter before the join, so we cannot put that into join condition, either.)I chose the system view
pg_catalog.pg_tables
overinformation_schema.tables
for two reasons:Explained in detail here:
Assuming this table definition for convenience:
tablename
conveniently matches the column name ofpg_tables.tablename
.I added alternative queries to the fiddle for when the column name does not match.
PK rules out null and duplicates, so we don’t need to care of these corner cases in the query.
And I chose a legal, lower-case, unquoted name for ‘schema_x’. Mixed-case Postgres identifiers make me uncomfortable.