skip to Main Content

I am very much new to SQL and have been given an exercise for homework that sounds when translated like this:
"For each column type find the number of tables that do not contain this type of column".
I know how to find count of tables that contain at least one column type:

SELECT Data_Type, COUNT(Table_schema || '.' || table_name)
FROM Information_schema.Columns
GROUP BY Data_Type;

But trying to figure out how to do the opposite of this has left me scratching my head for hours. I have tried to use WHERE:

WHERE Data_Type IS NULL

But it gives same results as the query without this line. I have seen solutions when counting tables with specifically named columns (using ‘join’ and ‘not in’) but I don’t think this would work for this task. I think NOT EXISTS would probably work but not sure how to implement it. Thanks in advance!

3

Answers


  1. You can do something like this:

    with dtypes as (
        select distinct Data_Type
        from Information_schema.Columns
      )
    select
      data_type,
      (
        select count(*)
        from Information_schema.tables t
        where not exists (
            select 1
            FROM Information_schema.Columns c2
            WHERE
              c2.Table_schema = t.Table_schema
              and c2.Table_name = t.Table_name
              and c2.Data_Type = d.Data_Type
          )
        AND t.table_type = 'BASE TABLE'
        -- AND t.Table_schema = 'MYSCHEMA' -- in case you want to count tables only from single schema.
      )
    from dtypes d
    

    Here CTE dtypes contains all types present in database as column types.

    Then for every type we find number of tables, that do not have at least one column of this type.

    Demo here

    Login or Signup to reply.
  2. Here is another approach to solving the problem:

    WITH tc AS (
      SELECT
        t.table_schema,
        t.table_name,
        COUNT(*) OVER (PARTITION BY NULL) AS total_table_cnt
      FROM
        information_schema.tables t
      WHERE
        t.table_type = 'BASE TABLE'
    )
    SELECT
      c.data_type,
      MAX(tc.total_table_cnt) - COUNT(DISTINCT (c.table_schema, c.table_name)) AS table_cnt
    FROM
      tc
      JOIN information_schema.columns c
        ON (tc.table_schema, tc.table_name) = (c.table_schema, c.table_name)
    GROUP BY
      c.data_type
    ORDER BY
      c.data_type;
    

    This isn’t presented with any claims of superiority over other approaches, but just as a demonstration of additional techniques.

    The query uses a common table expression (CTE) to determine the set of tables. The expression COUNT(*) OVER (PARTITION BY NULL) gets a count of the members at the same time, thus avoiding an additional subquery.

    In the main query, MAX(tc.total_table_cnt) wraps the total table count in an aggregate function so that it’s value can be used in a grouped query. MIN could also have been used since the value is the same for all rows in tc.

    COUNT(DISTINCT (c.table_schema, c.table_name)) gets the count of tables having at least one column of the group’s data type.

    Row constructors, (tc.table_schema, tc.table_name) = (c.table_schema, c.table_name), are used for the join criteria. There’s no significant advantage in this case over using separate equality comparisons for each column, but row constructors can be very useful when multiple columns are needed with the IN operator or subqueries.

    Login or Signup to reply.
  3. The question really needs to be defined more closely.

    • Which tables and which types to include?
    • Include internal data types and system tables?
    • Is varchar(25) distinct from varchar?
    • What about partitioned tables?
    • Materialized views?
    • Generated columns?
    • User types?

    In my hands, this query is around 50x faster than proposed queries based on the information schema:

    SELECT col.atttypid::regtype AS data_type
         , tbl.tbls_total - col.tbls_with_type AS tbls_without_that_type
    FROM  (
       SELECT a.atttypid::regtype, count(DISTINCT a.attrelid) AS tbls_with_type
       FROM   pg_attribute a
       JOIN   pg_class c ON c.oid = a.attrelid
       WHERE  c.relnamespace = 'public'::regnamespace  -- only user schema(s)
       AND    c.relkind = 'r'
       AND    a.attnum > 0
       AND    NOT a.attisdropped
       GROUP  BY 1
       ) col
    CROSS JOIN (
       SELECT  count(*) AS tbls_total
       FROM    pg_class c
       WHERE   c.relnamespace = 'public'::regnamespace  -- only user schema(s)
       AND     relkind = 'r'
       ) tbl
    ORDER  BY 2 DESC, 1
    

    The basic approach is to count existing tables for each data type, and then subtract that from the total count of tables. Simpler.

    I excluded system columns and limited the query to the public schema. Adapt to your needs.

    There are various differences!

    Notably, the query on the information_schema lumps all "ARRAY" types together, while my query treats text[], int[], etc. as distinct types. (Much more sensible IMO.)

    Also, the information schema does not include partitioned tables (properly). You may or may not want that.

    About information schema vs. system catalogs:

    Related:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search