skip to Main Content
Column A Column B Column C Column D
A B C A
D A A B
C A D A

This is the table for instance. We are supposed to find the number of occurrences of A in the entire table(from all columns). How do we solve such a question without using manual case statements?

  1. There a way to solve this without explicitly mentioning the names of all the columns?
  2. Is there a scalable self contained solution where this can be scaled to let’s say X number of columns later on?

I’ve sovled it with a brute force solution checking for "A" in all the columns one by one. It does solve the problem however it isn’t really efficient and scalable.

The expected output is just the number of occurences of "A", in this case the ans would be 5

3

Answers


  1. The basic SELECT syntax doesn’t have any way of handling dynamic sets of columns. The simplest workaround is usually to convert the record to a JSON value; from there you can just treat the whole thing as a piece of data, and use Postgres’ suite of JSON functions to pull it apart.

    This seems to do what you want:

    select count(*)
    from t
    cross join jsonb_each(to_jsonb(t.*)) row(col,val)
    where val = to_jsonb('A'::text);
    

    If the cost of converting to JSON becomes an issue, it might be more efficient to query information_schema or pg_catalog to get the table’s column names, and dynamically build your CASE statement on the client side.

    Login or Signup to reply.
  2. If you really don’t care which column it is, and want to find how many times does particular word appear in the whole table, then you can use regular expressions in PostgreSQL to calculate it.

    SELECT sum(
      (SELECT count(*) FROM regexp_matches(t::text, '[[:<:]]A[[:>:]]', 'g'))
    ) FROM t;
    

    Please note this is slow for big tables.

    Login or Signup to reply.
  3. Your question is ambiguous: should the result be the number of columns with the specified value, or the number of times the string occurs within all columns of the table? The following query handles the first case (total number of columns in the table with the given value):

    WITH parameters(search_string, table_schema, table_name) AS (
      VALUES ('A', 'public', 'some_table'))
    SELECT (XPATH('/table/row/num_occurrences/text()',
                  QUERY_TO_XML(FORMAT($query$SELECT %s AS num_occurrences FROM (VALUES (%L)) p(search_string) CROSS JOIN %I.%I$query$,
                                      STRING_AGG(FORMAT($$COUNT(*) FILTER (WHERE %I = p.search_string)$$, c.column_name),
                                                 ' + ' ORDER BY c.column_name),
                                      p.search_string,
                                      p.table_schema,
                                      p.table_name),
                               false,
                               false,
                               '')))[1]::text::integer
      FROM PARAMETERS p
      JOIN information_schema.columns c
        ON     p.table_schema = c.table_schema
           AND p.table_name = c.table_name
      WHERE c.data_type IN ('character', 'character varying', 'text')
      GROUP BY p.search_string, p.table_schema, p.table_name;
    

    The query uses information_schema.columns to determine all of the columns with character data and only searches those columns. The generated query performs a single scan of the table.

    This query can easily be modified to address other searches; e.g., changing = to ~ in the FILTER clause will perform a regular expression search instead of string equality.

    To get the total number of times that a regular expression occurs within the columns, use REGEXP_COUNT instead of COUNT, as follows:

    WITH parameters(search_string, table_schema, TABLE_NAME) AS (
      VALUES ('A', 'public', 'some_table'))
    SELECT (XPATH('/table/row/num_occurrences/text()',
                  QUERY_TO_XML(FORMAT($query$SELECT SUM(num_occurrences) AS num_occurrences FROM (SELECT %s AS num_occurrences FROM (VALUES (%L)) p(search_string) CROSS JOIN %I.%I) t$query$,
                                      STRING_AGG(FORMAT($$COALESCE(REGEXP_COUNT(%I, p.search_string), 0)$$, c.column_name),
                                                 ' + ' ORDER BY c.column_name),
                                      p.search_string,
                                      p.table_schema,
                                      p.table_name),
                               false,
                               false,
                               '')))[1]::text::integer
      FROM PARAMETERS p
      JOIN information_schema.columns c
        ON     p.table_schema = c.table_schema
           AND p.table_name = c.table_name
      WHERE c.data_type IN ('character', 'character varying', 'text')
      GROUP BY p.search_string, p.table_schema, p.table_name;
    

    Escape any characters in the search_string that have special meaning as part of a regular expression and use appropriate flags in REGEXP_COUNT to change search behavior; e.g., ‘i’ to perform case insensitive matching.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search