Counting tables that don't contain a column type - Postgresql

ConfusedByRegex
April 18, 2023
264 views
2 votes
3 Answers

I am very much new to SQL and have been given an exercise for homework that sounds when translated like this:
"For each column type find the number of tables that do not contain this type of column".
I know how to find count of tables that contain at least one column type:

SELECT Data_Type, COUNT(Table_schema || '.' || table_name)
FROM Information_schema.Columns
GROUP BY Data_Type;

But trying to figure out how to do the opposite of this has left me scratching my head for hours. I have tried to use WHERE:

WHERE Data_Type IS NULL

But it gives same results as the query without this line. I have seen solutions when counting tables with specifically named columns (using ‘join’ and ‘not in’) but I don’t think this would work for this task. I think NOT EXISTS would probably work but not sure how to implement it. Thanks in advance!

Tags: postgresql sql

Answers

You can do something like this:

with dtypes as (
    select distinct Data_Type
    from Information_schema.Columns
  )
select
  data_type,
  (
    select count(*)
    from Information_schema.tables t
    where not exists (
        select 1
        FROM Information_schema.Columns c2
        WHERE
          c2.Table_schema = t.Table_schema
          and c2.Table_name = t.Table_name
          and c2.Data_Type = d.Data_Type
      )
    AND t.table_type = 'BASE TABLE'
    -- AND t.Table_schema = 'MYSCHEMA' -- in case you want to count tables only from single schema.
  )
from dtypes d

Here CTE dtypes contains all types present in database as column types.

Then for every type we find number of tables, that do not have at least one column of this type.

Demo here

- JohnH
- April 18, 2023 at 5:41 pm
- 0 votes
0
Here is another approach to solving the problem:
```
WITH tc AS (
  SELECT
    t.table_schema,
    t.table_name,
    COUNT(*) OVER (PARTITION BY NULL) AS total_table_cnt
  FROM
    information_schema.tables t
  WHERE
    t.table_type = 'BASE TABLE'
)
SELECT
  c.data_type,
  MAX(tc.total_table_cnt) - COUNT(DISTINCT (c.table_schema, c.table_name)) AS table_cnt
FROM
  tc
  JOIN information_schema.columns c
    ON (tc.table_schema, tc.table_name) = (c.table_schema, c.table_name)
GROUP BY
  c.data_type
ORDER BY
  c.data_type;
```
This isn’t presented with any claims of superiority over other approaches, but just as a demonstration of additional techniques.

The query uses a common table expression (CTE) to determine the set of tables. The expression COUNT(*) OVER (PARTITION BY NULL) gets a count of the members at the same time, thus avoiding an additional subquery.

In the main query, MAX(tc.total_table_cnt) wraps the total table count in an aggregate function so that it’s value can be used in a grouped query. MIN could also have been used since the value is the same for all rows in tc.

COUNT(DISTINCT (c.table_schema, c.table_name)) gets the count of tables having at least one column of the group’s data type.

Row constructors, (tc.table_schema, tc.table_name) = (c.table_schema, c.table_name), are used for the join criteria. There’s no significant advantage in this case over using separate equality comparisons for each column, but row constructors can be very useful when multiple columns are needed with the IN operator or subqueries.
Login or Signup to reply.

- ErwinBrandstetter
- April 18, 2023 at 6:54 pm
- 0 votes
0
The question really needs to be defined more closely.
- Which tables and which types to include?
- Include internal data types and system tables?
- Is varchar(25) distinct from varchar?
- What about partitioned tables?
- Materialized views?
- Generated columns?
- User types?
  …
In my hands, this query is around 50x faster than proposed queries based on the information schema:
```
SELECT col.atttypid::regtype AS data_type
     , tbl.tbls_total - col.tbls_with_type AS tbls_without_that_type
FROM  (
   SELECT a.atttypid::regtype, count(DISTINCT a.attrelid) AS tbls_with_type
   FROM   pg_attribute a
   JOIN   pg_class c ON c.oid = a.attrelid
   WHERE  c.relnamespace = 'public'::regnamespace  -- only user schema(s)
   AND    c.relkind = 'r'
   AND    a.attnum > 0
   AND    NOT a.attisdropped
   GROUP  BY 1
   ) col
CROSS JOIN (
   SELECT  count(*) AS tbls_total
   FROM    pg_class c
   WHERE   c.relnamespace = 'public'::regnamespace  -- only user schema(s)
   AND     relkind = 'r'
   ) tbl
ORDER  BY 2 DESC, 1
```
The basic approach is to count existing tables for each data type, and then subtract that from the total count of tables. Simpler.

I excluded system columns and limited the query to the public schema. Adapt to your needs.

There are various differences!

Notably, the query on the information_schema lumps all "ARRAY" types together, while my query treats text[], int[], etc. as distinct types. (Much more sensible IMO.)

Also, the information schema does not include partitioned tables (properly). You may or may not want that.

About information schema vs. system catalogs:
- Get column names and data types of a query, table or view
Related:
- How to check if a table exists in a given schema
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Counting tables that don't contain a column type – Postgresql

Answers