I am very much new to SQL and have been given an exercise for homework that sounds when translated like this:
"For each column type find the number of tables that do not contain this type of column".
I know how to find count of tables that contain at least one column type:
SELECT Data_Type, COUNT(Table_schema || '.' || table_name)
FROM Information_schema.Columns
GROUP BY Data_Type;
But trying to figure out how to do the opposite of this has left me scratching my head for hours. I have tried to use WHERE:
WHERE Data_Type IS NULL
But it gives same results as the query without this line. I have seen solutions when counting tables with specifically named columns (using ‘join’ and ‘not in’) but I don’t think this would work for this task. I think NOT EXISTS would probably work but not sure how to implement it. Thanks in advance!
3
Answers
You can do something like this:
Here CTE
dtypes
contains all types present in database as column types.Then for every type we find number of tables, that do not have at least one column of this type.
Demo here
Here is another approach to solving the problem:
This isn’t presented with any claims of superiority over other approaches, but just as a demonstration of additional techniques.
The query uses a common table expression (CTE) to determine the set of tables. The expression
COUNT(*) OVER (PARTITION BY NULL)
gets a count of the members at the same time, thus avoiding an additional subquery.In the main query,
MAX(tc.total_table_cnt)
wraps the total table count in an aggregate function so that it’s value can be used in a grouped query.MIN
could also have been used since the value is the same for all rows in tc.COUNT(DISTINCT (c.table_schema, c.table_name))
gets the count of tables having at least one column of the group’s data type.Row constructors,
(tc.table_schema, tc.table_name) = (c.table_schema, c.table_name)
, are used for the join criteria. There’s no significant advantage in this case over using separate equality comparisons for each column, but row constructors can be very useful when multiple columns are needed with theIN
operator or subqueries.The question really needs to be defined more closely.
varchar(25)
distinct fromvarchar
?…
In my hands, this query is around 50x faster than proposed queries based on the information schema:
The basic approach is to count existing tables for each data type, and then subtract that from the total count of tables. Simpler.
I excluded system columns and limited the query to the
public
schema. Adapt to your needs.There are various differences!
Notably, the query on the information_schema lumps all "ARRAY" types together, while my query treats
text[]
,int[]
, etc. as distinct types. (Much more sensible IMO.)Also, the information schema does not include partitioned tables (properly). You may or may not want that.
About information schema vs. system catalogs:
Related: