skip to Main Content

How do I write a query that counts (totals) the number of values in a group for data spread across three tables? For each reporter and report date, I’d like a count of the number of sightings where species codes is 10 or 20 (using IN because species group has lots of codes).

REPORTER TABLE
reporter_id | reporter_num
-----------------------------
1           | 1111
2           | 2222

REPORT TABLE
report_id | reporter_id | report_date
-------------------------------------
1         | 1           | 2022-09-05
2         | 1           | 2022-09-05
3         | 1           | 2022-09-05
4         | 1           | 2022-09-16
5         | 2           | 2022-09-22
6         | 2           | 2022-09-22

SIGHTING TABLE
sighting_id | species_code
------------------------
1           | 10          
2           | 55          
3           | 20          
4           | 35         
5           | 55 
6           | 20   

This is essentially what I’m working with when the three tables are joined:

reporter_num | report_date | species_code
----------------------------------------
1111         | 2022-09-05  | 10
1111         | 2022-09-05  | 55
1111         | 2022-09-05  | 20
1111         | 2022-09-16  | 35
2222         | 2022-09-22  | 55
2222         | 2022-09-22  | 20

Query: for each reporter_num and report_date (one row per reporter_num and report_date), count the number of sightings where species_code is 10 or 20. Expected results:

reporter_num | report_date | my_count
----------------------------------------
1111         | 2022-09-05  | 2
1111         | 2022-09-16  | 0
2222         | 2022-09-22  | 1

A count in my query gives the total number of records for each reporter_num and report_date which isn’t what I want:

select 
   reporter.reporter_num, report.report_date,
   count(sighting.species_code in (10, 20)) as my_count
from report
   inner join reporter on report.reporter_id = report.reporter_id
   inner join location on report.report_id = location.report_id
   inner join method on location.location_id = method.location_id
   left join sighting on method.method_id = sighting.method_id
group by
   reporter.reporter_num, report.report_date;

Query Results – my_count is total number of records, which is incorrect:

reporter_num | report_date | my_count
----------------------------------------
1111         | 2022-09-05  | 3
1111         | 2022-09-16  | 1
2222         | 2022-09-22  | 2

Tried a subquery and the counts in both the outer query and subquery are incorrect:

select
   reporter.reporter_num, report.report_date,
   count(my_table.my_count)
from report
   inner join reporter on report.reporter_id = reporter.reporter_id
   inner join (
      select 
         reporter.reporter_id, reporter.reporter_num,
         report.report_date, sighting.species_code as my_count
      from report
         [... see joins in above query ...]
      where sighting.species_code in (10, 20)
      ) as my_table on reporter.reporter_id = my_table.reporter_id
group by
   reporter.reporter_num, report.report_date;

I feel like I’m close but missing something (or a couple somethings). Any suggestions? Many thanks.

2

Answers


  1. Chosen as BEST ANSWER

    I wasn't using COUNT or COUNT(IF) correctly. The IF has to be 1 if true and NULL if false; otherwise, it will add 1 even if the species code isn't in the species group. The correct code:

    select 
       reporter.reporter_num, report.report_date,
       count(if(sighting.species_code in (10, 20), 1, NULL)) as my_count
    from report
       [... see joins in above query ...]
    group by
       reporter.reporter_num, report.report_date;
    

    The above will also return report dates that don't have any species in the species group, my_count will be zero for that case, which is what I want.

    To exclude report dates with zero species in the species group, use count(*) in SELECT and filter for the species codes in WHERE.


  2. SELECT reporter_num, report_date,
    LENGTH(gs) + 2 - LENGTH(REGEXP_REPLACE(CONCAT(',', gs, ','), ',(10|20),', 'len')) cnt
    FROM (
      SELECT reporter.reporter_num, report.report_date, GROUP_CONCAT(sighting.species_code) gs
        FROM report
        JOIN reporter ON report.reporter_id = reporter.reporter_id
        [... joins on location and method ...]
        JOIN sighting ON link.sighting_id = sighting.sighting_id
      GROUP BY reporter.reporter_num, report.report_date
    ) tbl;
    

    Outputs:

    | reporter_num | report_date | cnt |
    |--------------|-------------|-----|
    | 1111         | 2022-09-05  | 2   |
    | 1111         | 2022-09-16  | 0   |
    | 2222         | 2022-09-22  | 1   |
    

    First creating an ad hoc, temporary table using GROUP_CONCAT and GROUP BY:

    SELECT reporter.reporter_num, report.report_date, GROUP_CONCAT(sighting.species_code) gs
      FROM report
      JOIN reporter ON report.reporter_id = reporter.reporter_id
      [... joins on location and method ...]
      JOIN sighting ON link.sighting_id = sighting.sighting_id
    GROUP BY reporter.reporter_num, report.report_date
    

    Outputs:

    | reporter_num | report_date | gs       |
    |--------------|-------------|----------|
    | 1111         | 2022-09-05  | 10,55,20 |
    | 1111         | 2022-09-16  | 35       |
    | 2222         | 2022-09-22  | 55,20    |
    

    Then counting occurrences of either 10 or 20 in the GROUP_CONCAT‘ed gs column:

    LENGTH(gs) + 2 - LENGTH(REGEXP_REPLACE(CONCAT(',', gs, ','), ',(10|20),', 'len'))
    

    In this case using REGEXP_REPLACE instead of REPLACE as is used in the credited link.


    -- create
    CREATE TABLE reporter (
      reporter_id INTEGER PRIMARY KEY,
      reporter_num INTEGER NOT NULL
    );
    CREATE TABLE report (
      report_id INTEGER PRIMARY KEY,
      reporter_id INTEGER NOT NULL,
      report_date TEXT NOT NULL
    );
    CREATE TABLE report_sighting (
      report_id INTEGER NOT NULL,
      sighting_id INTEGER NOT NULL
    );
    CREATE TABLE link (
      report_id INTEGER NOT NULL,
      sighting_id INTEGER NOT NULL
    );
    CREATE TABLE sighting (
      sighting_id INTEGER PRIMARY KEY,
      species_code INTEGER NOT NULL
    );
    
    -- insert
    INSERT INTO reporter VALUES (1, 1111), (2, 2222);
    INSERT INTO report VALUES (1, 1, '2022-09-05'), (2, 1, '2022-09-05'), (3, 1, '2022-09-05'), (4, 1, '2022-09-16'),
      (5, 2, '2022-09-22'), (6, 2, '2022-09-22');
    INSERT INTO link VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6);
    INSERT INTO sighting VALUES (1, 10), (2, 55), (3, 20), (4, 35), (5, 55), (6, 20);
    
    
    -- fetch 
    SELECT reporter_num, report_date,
    LENGTH(gs) + 2 - LENGTH(REGEXP_REPLACE(CONCAT(',', gs, ','), ',(10|20),', 'len')) cnt
    FROM (
      SELECT reporter.reporter_num, report.report_date, GROUP_CONCAT(sighting.species_code) gs
        FROM report
        JOIN reporter ON report.reporter_id = reporter.reporter_id
        JOIN link ON report.report_id = link.report_id
        JOIN sighting ON link.sighting_id = sighting.sighting_id
      GROUP BY reporter.reporter_num, report.report_date
    ) tbl;
    

    Try it here: https://onecompiler.com/mysql/3ygv4cry2

    Credit: https://www.tutorialspoint.com/finding-number-of-occurrences-of-a-specific-string-in-mysql

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search