Column A | Column B | Column C | Column D |
---|---|---|---|
A | B | C | A |
D | A | A | B |
C | A | D | A |
This is the table, for instance. We are supposed to find the number of occurrences of A in the entire table (across all columns). How do we solve such a question without writing manual CASE statements?
- Is there a way to solve this without explicitly naming all the columns?
- Is there a scalable, self-contained solution that could later be extended to, say, X columns?
I’ve solved it with a brute-force solution that checks for "A" in each column one by one. It does solve the problem, but it isn’t really efficient or scalable.
The expected output is just the number of occurrences of "A"; for the table above, the answer would be 6.
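For reference, the brute-force approach described above might look like the following sketch. The table name `my_table` and the column names `col_a` through `col_d` are assumptions made for illustration; the question does not give the real identifiers.

```sql
-- Hypothetical table mirroring the sample data above.
CREATE TABLE my_table (col_a text, col_b text, col_c text, col_d text);
INSERT INTO my_table VALUES
  ('A', 'B', 'C', 'A'),
  ('D', 'A', 'A', 'B'),
  ('C', 'A', 'D', 'A');

-- Brute force: one CASE expression per column, each column named by hand.
SELECT SUM(
         CASE WHEN col_a = 'A' THEN 1 ELSE 0 END
       + CASE WHEN col_b = 'A' THEN 1 ELSE 0 END
       + CASE WHEN col_c = 'A' THEN 1 ELSE 0 END
       + CASE WHEN col_d = 'A' THEN 1 ELSE 0 END
       ) AS occurrences
FROM my_table;
```

Every new column requires another hand-written CASE line, which is the scaling problem the question is about.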
3 Answers
The basic `SELECT` syntax doesn’t have any way of handling dynamic sets of columns. The simplest workaround is usually to convert the record to a JSON value; from there you can treat the whole thing as a piece of data and use Postgres’ suite of JSON functions to pull it apart.
This seems to do what you want:
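A minimal sketch of the JSON approach, assuming a hypothetical table named `my_table` and the search value 'A' (neither identifier comes from the question):

```sql
-- Turn each row into a JSON object, expand it into (key, value) pairs,
-- and count the pairs whose value equals the search string.
SELECT count(*) AS occurrences
FROM my_table AS t
CROSS JOIN LATERAL jsonb_each_text(to_jsonb(t)) AS kv(key, value)
WHERE kv.value = 'A';
```

No column is named anywhere in the query, so it should keep working if columns are added later. Non-text columns are compared by their text representation, and NULL values never match.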
If the cost of converting to JSON becomes an issue, it might be more efficient to query `information_schema` or `pg_catalog` to get the table’s column names, and dynamically build your `CASE` statement on the client side.
If you really don’t care which column it is and just want to find how many times a particular word appears in the whole table, you can use regular expressions in PostgreSQL to count it.
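One way this could look, as a sketch only: cast each row to its text form and count the regular expression matches. The table name `my_table` is an assumption, and `\m` and `\M` are PostgreSQL's start-of-word and end-of-word constraints.

```sql
-- t::text renders a row as '(A,B,C,A)'; regexp_matches with the 'g' flag
-- returns one result row per match, so count(*) is the total match count.
SELECT count(*) AS occurrences
FROM my_table AS t
CROSS JOIN LATERAL regexp_matches(t::text, '\mA\M', 'g') AS m;
```

Because the pattern is applied to the whole row as one string, it also counts "A" appearing as a separate word inside a longer value (for example 'A B'), so it fits word searches better than exact cell matches.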
Please note this is slow for big tables.
Your question is ambiguous: should the result be the number of columns with the specified value, or the number of times the string occurs within all columns of the table? The following query handles the first case (total number of columns in the table with the given value):
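A minimal sketch of such a query generator, assuming a hypothetical table `public.my_table` and the search value 'A'. It only builds the statement text; the result then has to be executed, for example with psql's `\gexec`.

```sql
-- Build a single statement that counts exact matches in every character column.
SELECT format(
         'SELECT %s AS occurrences FROM %I.%I',
         string_agg(
           -- One COUNT(*) FILTER (...) term per column, safely quoted.
           format('COUNT(*) FILTER (WHERE %I = %L)', column_name, 'A'),
           ' + '
         ),
         table_schema,
         table_name
       ) AS generated_query
FROM information_schema.columns
WHERE table_schema = 'public'        -- assumed schema
  AND table_name   = 'my_table'      -- assumed table name
  AND data_type IN ('character', 'character varying', 'text')
GROUP BY table_schema, table_name;
```

The `%I` and `%L` placeholders quote identifiers and literals respectively, so column names with spaces or mixed case are handled without manual escaping.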
The query uses `information_schema.columns` to determine all of the columns with character data and only searches those columns. The generated query performs a single scan of the table.
This query can easily be modified to address other searches; e.g., changing `=` to `~` in the `FILTER` clause will perform a regular expression search instead of string equality.
To get the total number of times that a regular expression occurs within the columns, use `REGEXP_COUNT` instead of `COUNT`.
Escape any characters in the search_string that have special meaning as part of a regular expression and use appropriate flags in `REGEXP_COUNT` to change search behavior; e.g., `'i'` to perform case-insensitive matching.
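The `REGEXP_COUNT` variant might be sketched as below. This assumes PostgreSQL 15 or later (where `regexp_count` is available), a hypothetical table `public.my_table`, and the pattern 'A'; as with any generated statement, the output still has to be executed separately.

```sql
-- Build a statement that sums regex match counts over every character column.
SELECT format(
         'SELECT %s AS occurrences FROM %I.%I',
         string_agg(
           -- COALESCE keeps the total at 0 instead of NULL for an empty table.
           format('COALESCE(SUM(REGEXP_COUNT(%I, %L)), 0)', column_name, 'A'),
           ' + '
         ),
         table_schema,
         table_name
       ) AS generated_query
FROM information_schema.columns
WHERE table_schema = 'public'        -- assumed schema
  AND table_name   = 'my_table'      -- assumed table name
  AND data_type IN ('character', 'character varying', 'text')
GROUP BY table_schema, table_name;
```

For case-insensitive matching, the inner template could pass the optional arguments as well, e.g. `REGEXP_COUNT(%I, %L, 1, 'i')`, where 1 is the start position and 'i' is the flag.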