skip to Main Content

Im trying to write an sql statement which summarizes the count of rows by values in a specific column.

I have several tables, starting with s0_
Every table has a huge amount of rows with multiple, different values in column load_nr.

Example:
s0_table1:

load_nr column2 column3…
101 someData someData…
101 someData someData…
102 someData someData…
103 someData someData…
103 someData someData…
103 someData someData…

S0_table2:

load_nr column2 column3…
101 someData someData…
102 someData someData…
102 someData someData…
102 someData someData…
102 someData someData…
103 someData someData…

Desired result:

load_nr count s0_table1 count s0_table2
101 2 1
102 1 4
103 3 1

New data with up counting load_nr is added every day.
Best case is, that the table names are fetched dynamically (s0_*).

Could anyone give me a cloue how to build this statement? I’m a newbe and not so experienced with more complicated statements.

Best Regards

*edit: Hopefully the tables are displayed correctly xD

3

Answers


  1. For that you need a FULL OUTER JOIN in case the load_nr don’t match up

    SELECT COALESCE(s1.load_nr,s2.load_nr) as load_nr, Count_S1, Count_S2
    FROM
    (SELECT COUNT(*) Count_S1,load_nr FROM S0_table1 GROUP BY load_nr) s1 FULL OUTER JOIN 
    (SELECT COUNT(*) Count_S2,load_nr FROM S0_table2 GROUP BY load_nr) s2  ON s1.load_nr = s2.load_nr
    

    ORDER BY load_nr

    Login or Signup to reply.
  2. Dynamic table names are going to be a problem. You basically have no choice but to write a SQL statement dynamically. There’s a good chance that summing a UNION-ed subquery is going to be the fastest way to go, but if you can also try this:

    SELECT 
        COALESCE(
            Table1Count.[load_nr],
            Table2Count.[load_nr],
            Table3Count.[load_nr]       
        ) [load_nr],
        COALESCE(Table1Count.[rows],0) [s0_table1],
        COALESCE(Table2Count.[rows],0) [s0_table2],
        COALESCE(Table3Count.[rows],0) [s0_table3]
    FROM            (   SELECT  
                            [load_nr],
                            COUNT(1) [rows]
                        FROM        s0_table1
                        GROUP BY    [load_nr]
                    ) Table1Count
    FULL OUTER JOIN (   SELECT  
                            [load_nr],
                            COUNT(1) [rows]
                        FROM        s0_table2
                        GROUP BY    [load_nr]
                    ) Table2Count
    ON              Table2Count.load_nr = Table1Count.load_nr
    FULL OUTER JOIN (   SELECT  
                            [load_nr],
                            COUNT(1) [rows]
                        FROM        s0_table3
                        GROUP BY    [load_nr]
                    ) Table3Count
    ON              COALESCE(
                        Table1Count.[load_nr],
                        Table2Count.[load_nr]       
                    ) = Table3Count.load_nr
    

    Of course you’d need to build the query dynamically and the nested coalesces for each subsequent join could get a bit tricky.

    Another option if you have the load numbers you need in another table you can make it cleaner and easier to construct dynamically:

    SELECT 
        loads.[load_nr],
        (SELECT COUNT(1) FROM s0_table1 WHERE load_nr = loads.[load_nr]) [s0_table1Count],
        (SELECT COUNT(1) FROM s0_table2 WHERE load_nr = loads.[load_nr]) [s0_table2Count],
        (SELECT COUNT(1) FROM s0_table3 WHERE load_nr = loads.[load_nr]) [s0_table3Count],
        (SELECT COUNT(1) FROM s0_table4 WHERE load_nr = loads.[load_nr]) [s0_table4Count]
    FROM    loads
    

    And if you need to remove the loads with no rows in any table you can just wrap it in a subquery and add a where clause:

    SELECT 
        SubQuery.*
    FROM    (   SELECT 
                    loads.[load_nr],
                    (SELECT COUNT(1) FROM s0_table1 WHERE load_nr = loads.[load_nr]) [s0_table1Count],
                    (SELECT COUNT(1) FROM s0_table2 WHERE load_nr = loads.[load_nr]) [s0_table2Count],
                    (SELECT COUNT(1) FROM s0_table3 WHERE load_nr = loads.[load_nr]) [s0_table3Count],
                    (SELECT COUNT(1) FROM s0_table4 WHERE load_nr = loads.[load_nr]) [s0_table4Count]
                FROM    loads
            ) SubQuery
    WHERE   (       SubQuery.[s0_table1Count] <> 0
                OR  SubQuery.[s0_table2Count] <> 0
                OR  SubQuery.[s0_table3Count] <> 0
                OR  SubQuery.[s0_table4Count] <> 0
            )
    

    This could be pretty slow though depending on the shape of your data.

    Login or Signup to reply.
  3. I’m not sure if any of the following will work for you. This is something that
    I haven’t ever used in a real project, but maybe it will provide you with some
    inspiration. The final solution will get 99% to having the best case that you
    wanted (more on this at the end).

    Here’s the basic idea. psql provides a command called crosstabview that is
    able to display pivot tables, which is what you want. So, for your sample data,
    we could do this (notice that instead of terminating the query with a semicolon
    ;, we’re terminating it with crosstabview):

    select load_nr, 's0_table1', count(*) from s0_table1 group by load_nr
    union
    select load_nr, 's0_table2', count(*) from s0_table2 group by load_nr
    order by load_nr asc
    
    crosstabview
    
    ┌─────────┬───────────┬───────────┐
    │ load_nr │ s0_table1 │ s0_table2 │
    ├─────────┼───────────┼───────────┤
    │     101 │         2 │         1 │
    │     102 │         1 │         4 │
    │     103 │         3 │         1 │
    └─────────┴───────────┴───────────┘
    

    Try the same query with a semicolon terminator to see what the query has returned
    before psql post-processed the result:

    select load_nr, 's0_table1', count(*) from s0_table1 group by load_nr
    union
    select load_nr, 's0_table2', count(*) from s0_table2 group by load_nr
    order by load_nr asc
    
    ┌─────────┬───────────┬───────┐
    │ load_nr │ ?column?  │ count │
    ├─────────┼───────────┼───────┤
    │     101 │ s0_table1 │     2 │
    │     101 │ s0_table2 │     1 │
    │     102 │ s0_table1 │     1 │
    │     102 │ s0_table2 │     4 │
    │     103 │ s0_table2 │     1 │
    │     103 │ s0_table1 │     3 │
    └─────────┴───────────┴───────┘
    

    Ok. Now this was a psql thing, but it PostgreSQL provides a built-in
    extension called tablefunc which exposes a function called crosstab that
    does precisely what the psql crosstabview command does. Furthermore, this
    function accepts as argument a string containing the SQL text of the query
    you need, which means we could generate it.

    For the moment we can just use the hardcoded query (I’m using $$-delimited
    strings below so that I don’t have to escape single quotes):

    create extension if not exists tablefunc;
    
    select * from crosstab(
      -- The 1st argument is the query producing the values on the two dimentions:
      -- `load_nr` and table name.
      $$
      select load_nr, 's0_table1', count(*) from s0_table1 group by load_nr
      union
      select load_nr, 's0_table2', count(*) from s0_table2 group by load_nr
      order by load_nr asc
      $$,
    
      -- The 2nd argument must return the headers names, which in our case are
      -- the table names
      $$
      select * from (values ('s0_table1'), ('s0_table2')) as headers
      $$
    ) as (
      "load_nr" int,
      "count s0_table1" int,
      "count s0_table2" int
    );
    

    The above will return this, which reproduces perfectly the crosstabview
    result:

    ┌─────────┬─────────────────┬─────────────────┐
    │ load_nr │ count s0_table1 │ count s0_table2 │
    ├─────────┼─────────────────┼─────────────────┤
    │     101 │               2 │               1 │
    │     102 │               1 │               4 │
    │     103 │               3 │               1 │
    └─────────┴─────────────────┴─────────────────┘
    

    Ok, now on to dynamically generate the two SQL queries that crosstab wants.

    First, we can obtain a list of the table names prefixed with s0_:

    select
      relname
    from
      pg_catalog.pg_class
    where relname like 's0_%'
    order by relname asc;
    
    ┌───────────┐
    │  relname  │
    ├───────────┤
    │ s0_table1 │
    │ s0_table2 │
    └───────────┘
    

    Now, we can use these values to build individual SELECT statements that we’ll
    later join via UNION. First the SELECTs:

    select
      'SELECT load_nr, ''' || relname || ''', count(*) FROM ' || relname || ' GROUP BY load_nr' as sql_text
    from
      pg_catalog.pg_class
    where relname like 's0_%'
    order by relname asc;
    
    ┌───────────────────────────────────────────────────────────────────────┐
    │                               sql_text                                │
    ├───────────────────────────────────────────────────────────────────────┤
    │ SELECT load_nr, 's0_table1', count(*) FROM s0_table1 GROUP BY load_nr │
    │ SELECT load_nr, 's0_table2', count(*) FROM s0_table2 GROUP BY load_nr │
    └───────────────────────────────────────────────────────────────────────┘
    (2 rows)
    

    Now we can join the 2 rows into a single one:

    select string_agg(sql_text, E'nUNIONn') || E'nORDER BY load_nr ASC' from (
      select
        'SELECT load_nr, ''' || relname || ''', count(*) FROM ' || relname || ' GROUP BY load_nr' as sql_text
      from
        pg_catalog.pg_class
      where relname like 's0_%'
      order by relname asc
    ) as queries;
    
    ┌───────────────────────────────────────────────────────────────────────┐
    │                               ?column?                                │
    ├───────────────────────────────────────────────────────────────────────┤
    │ SELECT load_nr, 's0_table1', count(*) FROM s0_table1 GROUP BY load_nr↵│
    │ UNION                                                                ↵│
    │ SELECT load_nr, 's0_table2', count(*) FROM s0_table2 GROUP BY load_nr↵│
    │ ORDER BY load_nr ASC                                                  │
    └───────────────────────────────────────────────────────────────────────┘
    (1 row)
    

    If we take this query and manually run it using crosstabview, we’ll get:

    SELECT load_nr, 's0_table1', count(*) FROM s0_table1 GROUP BY load_nr
    UNION
    SELECT load_nr, 's0_table2', count(*) FROM s0_table2 GROUP BY load_nr
    ORDER BY load_nr ASC crosstabview
    
    ┌─────────┬───────────┬───────────┐
    │ load_nr │ s0_table1 │ s0_table2 │
    ├─────────┼───────────┼───────────┤
    │     101 │         2 │         1 │
    │     102 │         1 │         4 │
    │     103 │         3 │         1 │
    └─────────┴───────────┴───────────┘
    (3 rows)
    

    Nice, so this was precisely what we wanted. We can now use SQL-generating query
    as the first argument to the function crosstab:

    with
      source_sql as (
        select string_agg(sql_text, E'nUNIONn') || E'nORDER BY load_nr ASC' from (
          select
            'SELECT load_nr, ''' || relname || ''', count(*) FROM ' || relname || ' GROUP BY load_nr' as sql_text
          from
            pg_catalog.pg_class
          where relname like 's0_%'
          order by relname asc
        ) as queries
      )
      select * from crosstab(
        -- The 1st argument is the query producing the values on the two dimentions:
        -- `load_nr` and table name.
        (select * from source_sql),
    
        -- The 2nd argument must return the headers names, which in our case are
        -- the table names
        $$
        select * from (values ('s0_table1'), ('s0_table2')) as headers
        $$
      ) as (
        "load_nr" int,
        "count s0_table1" int,
        "count s0_table2" int
      );
    

    Executing it will get us:

    ┌─────────┬─────────────────┬─────────────────┐
    │ load_nr │ count s0_table1 │ count s0_table2 │
    ├─────────┼─────────────────┼─────────────────┤
    │     101 │               2 │               1 │
    │     102 │               1 │               4 │
    │     103 │               3 │               1 │
    └─────────┴─────────────────┴─────────────────┘
    (3 rows)
    

    Now, the final thing is to replace the 2nd argument to crosstab with a proper
    query — a way to fetch all table names prefixed with s0_, which we already
    know how to do — and with that the solution is complete:

    with
      source_sql as (
        select string_agg(sql_text, E'nUNIONn') || E'nORDER BY load_nr ASC' from (
          select
            'SELECT load_nr, ''' || relname || ''', count(*) FROM ' || relname || ' GROUP BY load_nr' as sql_text
          from
            pg_catalog.pg_class
          where relname like 's0_%'
          order by relname asc
        ) as queries
      )
    
      -- select * from source_sql;
    
      select * from public.crosstab(
        (select * from source_sql),
        $$
          select relname from pg_catalog.pg_class
          where relname like 's0_%'
          order by relname asc
        $$
      ) as (
        load_nr int,
        "count s0_table1" int,
        "count s0_table2" int
      );
    

    Coming back to my observation that this will bring you just 99% to your ideal
    solution. The reason for that is the fact that crosstab requires a column
    list definition, this part, where you have to manually list the table headers
    (again):

      ) as (
        load_nr int,
        "count s0_table1" int,
        "count s0_table2" int
      );
    

    Unfortunately, I don’t know of a way to get rid of that last hurdle. And as I
    mentioned in the beginning, this query might perform quite badly, but maybe it
    is good enough for your use case.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search