skip to Main Content

I have three tables in a MySQL database, and after joining two of them together, the COUNT() function in my query no longer works properly, and returns inflated numbers instead.

Please excuse my messy code as it is my first time using MySQL.
This is the query I have written which works:

SELECT 
    promo.ID,
    promo.Branch,
    promo.Promoter,
    COUNT(CASE WHEN sales.brand = 'TEST' AND sales.datesold BETWEEN '2023-01-01' AND '2023-12-31' AND promo.branch = sales.branch THEN 1 END) AS Actual, 
FROM sales.promo
    LEFT JOIN sales.sales ON promo.promoter = sales.promoter
GROUP BY promo.ID 
ORDER BY promo.ID ASC;

The COUNT() function here returns the correct number.

I have three tables: promo, sales and model.

What I wanted to do is expand it so that it could also filter the model details. The idea is that using promo.promoter, you can look up the sales data in sales.sales by matching the sales with the promoter. Then, using the sales.model column, you can find information about the relevant model in the model table. However, when I do

SELECT 
    promo.ID,
    promo.Branch,
    promo.Promoter,
    COUNT(CASE WHEN sales.brand = 'TEST' AND sales.datesold BETWEEN '2023-01-01' AND '2023-12-31' AND promo.branch = sales.branch THEN 1 END) AS Actual, 
    COUNT(CASE WHEN sales.brand = 'TEST' AND sales.datesold BETWEEN '2023-01-01' AND '2023-12-31' AND promo.branch = sales.branch AND model.SizeRange = 'Small' THEN 1 END) AS SmallActual 
FROM sales.promo
    LEFT JOIN sales.sales ON promo.promoter = sales.promoter
    LEFT JOIN sales.model ON sales.model = model.model
GROUP BY promo.ID 
ORDER BY promo.ID ASC;

it ends up inflating the values in the Actual and SmallActual column by a seemingly random number. How can I fix this?

Searching online, I’ve seen people talk about ‘table aggregation’ and whatnot to fix it, but I don’t understand how that is supposed to work. Does anyone know what’s going wrong here? Thanks!

==========

Edit: Here are my tables with some sample data:

Promoter table:

ID Branch Promoter
1 Sample John Smith

Sales table:

ID Promoter Dealer Branch Brand Model DateSold
1 John Smith Challenger Sample TEST C123 2023-02-25

Model Table:

ID Model Brand SizeRange SalesRangeCategory
1 C123 TEST Small Entry Level

Expected output:

ID Branch Promoter Actual SmallActual
1 Sample John Smith 91 26

Actual output:

ID Branch Promoter Actual SmallActual
1 Sample John Smith 106 39

2

Answers


  1. Firstly, you should use right formatting style. It makes your code readable and more comfortable for work.

    What about group by operator. You can’t use columns in select block without using them in grouping and without some aggregation function.

    In your task you can make subquery like this.

    SELECT 
        t1.ID,
        COUNT(
            CASE 
                WHEN t2.brand = 'TEST' 
                 AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
                 AND t1.branch = t2.branch 
                THEN 1 
            END
        ) AS Actual, 
        COUNT(
            CASE 
                WHEN t2.brand = 'TEST' 
                 AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
                 AND t1.branch = t2.branch 
                 AND t3.SizeRange = 'Small' 
                THEN 1 
            END
        ) AS SmallActual 
    FROM sales.promo t1
    LEFT JOIN sales.sales t2 
    ON t1.promoter = t2.promoter
    LEFT JOIN sales.model t3
    ON t2.model = t3.model
    GROUP BY t1.ID 
    ORDER BY t1.ID ASC;
    

    Do not forget table aliases. And if you want to add Branch, Promoter just use another join. Like this.

    SELECT 
        t1.Promoter,
        t1.Branch,
        t2.*
    FROM sales.promo t1 
    INNER JOIN (
        SELECT 
            t1.ID,
            COUNT(
                CASE 
                    WHEN t2.brand = 'TEST' 
                     AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
                     AND t1.branch = t2.branch 
                    THEN 1 
                END
            ) AS Actual, 
            COUNT(
                CASE 
                    WHEN t2.brand = 'TEST' 
                     AND t2.datesold BETWEEN '2023-01-01' AND '2023-12-31' 
                     AND t1.branch = t2.branch 
                     AND t3.SizeRange = 'Small' 
                    THEN 1 
                END
            ) AS SmallActual 
        FROM sales.promo t1
        LEFT JOIN sales.sales t2 
        ON t1.promoter = t2.promoter
        LEFT JOIN sales.model t3
        ON t2.model = t3.model
        GROUP BY t1.ID 
    ) t2 
    ON t1.id = t2.id
    ORDER BY t1.ID ASC;
    
    Login or Signup to reply.
  2. There must be something wrong with your data. The additional join should join the sale’s model. This cannot lead to a higher sales count, unless there are two rows in the models table for the same model.

    You should fix your data model to ensure data integrity:

    The sales table must not contain promoter, but the promoter.id. And it must not contain model, but the model.id. E.g.

    create table sales
    (
      id           int auto_increment primary key,
      promoter_id  int,
      model_id     int,
      ...,
      constraint fk_sales_promo foreign key (promoter_id) references promo(id)
      constraint fk_sales_model foreign key (model_id) references model(id)
    );
    

    The same should apply for other entities, like dealers, branches and brands.

    And make sure your tables contain the appropriate unique constraints:

    • A model in the model table must be unique.
    • A promoter in the promo table must be unique.
    • A sale in the sales table may also have to be unique.

    This would look like this:

    create table promo
    (
      ...,
      unique(promoter)
    );
    
    create table model
    (
      ...,
      unique(model)
    );
    
    create table sales
    (
      ...,
      unique(promoter_id, dealer_id, branch_id, brand_id, model_id, datesold)
    );
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search