Joining another table in MySQL inflates count number

Kasreti
July 5, 2023
149 views
0 votes
2 Answers

I have three tables in a MySQL database, and after joining two of them together, the COUNT() function in my query no longer works properly, and returns inflated numbers instead.

Please excuse my messy code as it is my first time using MySQL.
This is the query I have written which works:

SELECT 
    promo.ID,
    promo.Branch,
    promo.Promoter,
    COUNT(CASE WHEN sales.brand = 'TEST' AND sales.datesold BETWEEN '2023-01-01' AND '2023-12-31' AND promo.branch = sales.branch THEN 1 END) AS Actual, 
FROM sales.promo
    LEFT JOIN sales.sales ON promo.promoter = sales.promoter
GROUP BY promo.ID 
ORDER BY promo.ID ASC;

The COUNT() function here returns the correct number.

I have three tables: promo, sales and model.

What I wanted to do is expand it so that it could also filter the model details. The idea is that using promo.promoter, you can look up the sales data in sales.sales by matching the sales with the promoter. Then, using the sales.model column, you can find information about the relevant model in the model table. However, when I do

SELECT 
    promo.ID,
    promo.Branch,
    promo.Promoter,
    COUNT(CASE WHEN sales.brand = 'TEST' AND sales.datesold BETWEEN '2023-01-01' AND '2023-12-31' AND promo.branch = sales.branch THEN 1 END) AS Actual, 
    COUNT(CASE WHEN sales.brand = 'TEST' AND sales.datesold BETWEEN '2023-01-01' AND '2023-12-31' AND promo.branch = sales.branch AND model.SizeRange = 'Small' THEN 1 END) AS SmallActual 
FROM sales.promo
    LEFT JOIN sales.sales ON promo.promoter = sales.promoter
    LEFT JOIN sales.model ON sales.model = model.model
GROUP BY promo.ID 
ORDER BY promo.ID ASC;

it ends up inflating the values in the Actual and SmallActual column by a seemingly random number. How can I fix this?

Searching online, I’ve seen people talk about ‘table aggregation’ and whatnot to fix it, but I don’t understand how that is supposed to work. Does anyone know what’s going wrong here? Thanks!

==========

Edit: Here are my tables with some sample data:

Promoter table:

ID	Branch	Promoter
1	Sample	John Smith

Sales table:

ID	Promoter	Dealer	Branch	Brand	Model	DateSold
1	John Smith	Challenger	Sample	TEST	C123	2023-02-25

Model Table:

ID	Model	Brand	SizeRange	SalesRangeCategory
1	C123	TEST	Small	Entry Level

Expected output:

ID	Branch	Promoter	Actual	SmallActual
1	Sample	John Smith	91	26

Actual output:

ID	Branch	Promoter	Actual	SmallActual
1	Sample	John Smith	106	39

Answers

Firstly, you should use right formatting style. It makes your code readable and more comfortable for work.

What about group by operator. You can’t use columns in select block without using them in grouping and without some aggregation function.

In your task you can make subquery like this.

SELECT 
    t1.ID,
    COUNT(
        CASE 
            WHEN t2.brand = 'TEST' 
             AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
             AND t1.branch = t2.branch 
            THEN 1 
        END
    ) AS Actual, 
    COUNT(
        CASE 
            WHEN t2.brand = 'TEST' 
             AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
             AND t1.branch = t2.branch 
             AND t3.SizeRange = 'Small' 
            THEN 1 
        END
    ) AS SmallActual 
FROM sales.promo t1
LEFT JOIN sales.sales t2 
ON t1.promoter = t2.promoter
LEFT JOIN sales.model t3
ON t2.model = t3.model
GROUP BY t1.ID 
ORDER BY t1.ID ASC;

Do not forget table aliases. And if you want to add Branch, Promoter just use another join. Like this.

SELECT 
    t1.Promoter,
    t1.Branch,
    t2.*
FROM sales.promo t1 
INNER JOIN (
    SELECT 
        t1.ID,
        COUNT(
            CASE 
                WHEN t2.brand = 'TEST' 
                 AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
                 AND t1.branch = t2.branch 
                THEN 1 
            END
        ) AS Actual, 
        COUNT(
            CASE 
                WHEN t2.brand = 'TEST' 
                 AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
                 AND t1.branch = t2.branch 
                 AND t3.SizeRange = 'Small' 
                THEN 1 
            END
        ) AS SmallActual 
    FROM sales.promo t1
    LEFT JOIN sales.sales t2 
    ON t1.promoter = t2.promoter
    LEFT JOIN sales.model t3
    ON t2.model = t3.model
    GROUP BY t1.ID 
) t2 
ON t1.id = t2.id
ORDER BY t1.ID ASC;

- ThorstenKettner
- July 5, 2023 at 8:29 am
- 0 votes
0
There must be something wrong with your data. The additional join should join the sale’s model. This cannot lead to a higher sales count, unless there are two rows in the models table for the same model.

You should fix your data model to ensure data integrity:

The sales table must not contain promoter, but the promoter.id. And it must not contain model, but the model.id. E.g.
```
create table sales
(
  id           int auto_increment primary key,
  promoter_id  int,
  model_id     int,
  ...,
  constraint fk_sales_promo foreign key (promoter_id) references promo(id)
  constraint fk_sales_model foreign key (model_id) references model(id)
);
```
The same should apply for other entities, like dealers, branches and brands.

And make sure your tables contain the appropriate unique constraints:
- A model in the model table must be unique.
- A promoter in the promo table must be unique.
- A sale in the sales table may also have to be unique.
This would look like this:
```
create table promo
(
  ...,
  unique(promoter)
);

create table model
(
  ...,
  unique(model)
);

create table sales
(
  ...,
  unique(promoter_id, dealer_id, branch_id, brand_id, model_id, datesold)
);
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.