Mysql - Count records with values that belong to a set or group

cheesypoofs
September 23, 2022
218 views
0 votes
2 Answers

How do I write a query that counts (totals) the number of values in a group for data spread across three tables? For each reporter and report date, I’d like a count of the number of sightings where species codes is 10 or 20 (using IN because species group has lots of codes).

REPORTER TABLE
reporter_id | reporter_num
-----------------------------
1           | 1111
2           | 2222

REPORT TABLE
report_id | reporter_id | report_date
-------------------------------------
1         | 1           | 2022-09-05
2         | 1           | 2022-09-05
3         | 1           | 2022-09-05
4         | 1           | 2022-09-16
5         | 2           | 2022-09-22
6         | 2           | 2022-09-22

SIGHTING TABLE
sighting_id | species_code
------------------------
1           | 10          
2           | 55          
3           | 20          
4           | 35         
5           | 55 
6           | 20

This is essentially what I’m working with when the three tables are joined:

reporter_num | report_date | species_code
----------------------------------------
1111         | 2022-09-05  | 10
1111         | 2022-09-05  | 55
1111         | 2022-09-05  | 20
1111         | 2022-09-16  | 35
2222         | 2022-09-22  | 55
2222         | 2022-09-22  | 20

Query: for each reporter_num and report_date (one row per reporter_num and report_date), count the number of sightings where species_code is 10 or 20. Expected results:

reporter_num | report_date | my_count
----------------------------------------
1111         | 2022-09-05  | 2
1111         | 2022-09-16  | 0
2222         | 2022-09-22  | 1

A count in my query gives the total number of records for each reporter_num and report_date which isn’t what I want:

select 
   reporter.reporter_num, report.report_date,
   count(sighting.species_code in (10, 20)) as my_count
from report
   inner join reporter on report.reporter_id = report.reporter_id
   inner join location on report.report_id = location.report_id
   inner join method on location.location_id = method.location_id
   left join sighting on method.method_id = sighting.method_id
group by
   reporter.reporter_num, report.report_date;

Query Results – my_count is total number of records, which is incorrect:

reporter_num | report_date | my_count
----------------------------------------
1111         | 2022-09-05  | 3
1111         | 2022-09-16  | 1
2222         | 2022-09-22  | 2

Tried a subquery and the counts in both the outer query and subquery are incorrect:

select
   reporter.reporter_num, report.report_date,
   count(my_table.my_count)
from report
   inner join reporter on report.reporter_id = reporter.reporter_id
   inner join (
      select 
         reporter.reporter_id, reporter.reporter_num,
         report.report_date, sighting.species_code as my_count
      from report
         [... see joins in above query ...]
      where sighting.species_code in (10, 20)
      ) as my_table on reporter.reporter_id = my_table.reporter_id
group by
   reporter.reporter_num, report.report_date;

I feel like I’m close but missing something (or a couple somethings). Any suggestions? Many thanks.

Tags: mysql

Answers

Chosen as BEST ANSWER
- cheesypoofs
- September 27, 2022 at 6:23 am
- 0 votes
0
I wasn't using COUNT or COUNT(IF) correctly. The IF has to be 1 if true and NULL if false; otherwise, it will add 1 even if the species code isn't in the species group. The correct code:
```
select 
   reporter.reporter_num, report.report_date,
   count(if(sighting.species_code in (10, 20), 1, NULL)) as my_count
from report
   [... see joins in above query ...]
group by
   reporter.reporter_num, report.report_date;
```
The above will also return report dates that don't have any species in the species group, my_count will be zero for that case, which is what I want.

To exclude report dates with zero species in the species group, use count(*) in SELECT and filter for the species codes in WHERE.

(Edit)

SELECT reporter_num, report_date,
LENGTH(gs) + 2 - LENGTH(REGEXP_REPLACE(CONCAT(',', gs, ','), ',(10|20),', 'len')) cnt
FROM (
  SELECT reporter.reporter_num, report.report_date, GROUP_CONCAT(sighting.species_code) gs
    FROM report
    JOIN reporter ON report.reporter_id = reporter.reporter_id
    [... joins on location and method ...]
    JOIN sighting ON link.sighting_id = sighting.sighting_id
  GROUP BY reporter.reporter_num, report.report_date
) tbl;

Outputs:

| reporter_num | report_date | cnt |
|--------------|-------------|-----|
| 1111         | 2022-09-05  | 2   |
| 1111         | 2022-09-16  | 0   |
| 2222         | 2022-09-22  | 1   |

First creating an ad hoc, temporary table using GROUP_CONCAT and GROUP BY:

SELECT reporter.reporter_num, report.report_date, GROUP_CONCAT(sighting.species_code) gs
  FROM report
  JOIN reporter ON report.reporter_id = reporter.reporter_id
  [... joins on location and method ...]
  JOIN sighting ON link.sighting_id = sighting.sighting_id
GROUP BY reporter.reporter_num, report.report_date

Outputs:

| reporter_num | report_date | gs       |
|--------------|-------------|----------|
| 1111         | 2022-09-05  | 10,55,20 |
| 1111         | 2022-09-16  | 35       |
| 2222         | 2022-09-22  | 55,20    |

Then counting occurrences of either 10 or 20 in the GROUP_CONCAT‘ed gs column:

LENGTH(gs) + 2 - LENGTH(REGEXP_REPLACE(CONCAT(',', gs, ','), ',(10|20),', 'len'))

In this case using REGEXP_REPLACE instead of REPLACE as is used in the credited link.

-- create
CREATE TABLE reporter (
  reporter_id INTEGER PRIMARY KEY,
  reporter_num INTEGER NOT NULL
);
CREATE TABLE report (
  report_id INTEGER PRIMARY KEY,
  reporter_id INTEGER NOT NULL,
  report_date TEXT NOT NULL
);
CREATE TABLE report_sighting (
  report_id INTEGER NOT NULL,
  sighting_id INTEGER NOT NULL
);
CREATE TABLE link (
  report_id INTEGER NOT NULL,
  sighting_id INTEGER NOT NULL
);
CREATE TABLE sighting (
  sighting_id INTEGER PRIMARY KEY,
  species_code INTEGER NOT NULL
);

-- insert
INSERT INTO reporter VALUES (1, 1111), (2, 2222);
INSERT INTO report VALUES (1, 1, '2022-09-05'), (2, 1, '2022-09-05'), (3, 1, '2022-09-05'), (4, 1, '2022-09-16'),
  (5, 2, '2022-09-22'), (6, 2, '2022-09-22');
INSERT INTO link VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6);
INSERT INTO sighting VALUES (1, 10), (2, 55), (3, 20), (4, 35), (5, 55), (6, 20);


-- fetch 
SELECT reporter_num, report_date,
LENGTH(gs) + 2 - LENGTH(REGEXP_REPLACE(CONCAT(',', gs, ','), ',(10|20),', 'len')) cnt
FROM (
  SELECT reporter.reporter_num, report.report_date, GROUP_CONCAT(sighting.species_code) gs
    FROM report
    JOIN reporter ON report.reporter_id = reporter.reporter_id
    JOIN link ON report.report_id = link.report_id
    JOIN sighting ON link.sighting_id = sighting.sighting_id
  GROUP BY reporter.reporter_num, report.report_date
) tbl;

Try it here: https://onecompiler.com/mysql/3ygv4cry2

Credit: https://www.tutorialspoint.com/finding-number-of-occurrences-of-a-specific-string-in-mysql