skip to Main Content

I have these tables:

Product_variants as PV

CREATE TABLE `product_variants` (
  `id` int(11) UNSIGNED NOT NULL,
  `product_id` int(11) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Products as P

CREATE TABLE `products` (
  `id` int(11) UNSIGNED NOT NULL,
  `brand_id` int(11) UNSIGNED DEFAULT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Order_items as OI

CREATE TABLE `order_items` (
  `id` int(11) NOT NULL,
  `order_id` int(11) UNSIGNED NOT NULL,
  `product_variant_id` int(11) UNSIGNED DEFAULT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Also orders and brands, which I assume are not needed for this example.

I would like obtain 15 rows where there will be 5 best-selling brands and three variants of them, sorted by marketability of brands and variants.

Expected

product_variant_id    brand_id   number_brands_sold   number_variant_sold
1                     1          100                  50
2                     1          100                  30
3                     1          100                  10
4                     2          90                   40                      
5                     2          90                   25
6                     2          90                   10
7                     3          80                   45
8                     3          80                   20
9                     3          80                   5
10                    4          60                   25
11                    4          60                   20
12                    4          60                   8
13                    5          40                   10
14                    5          40                   9
15                    5          40                   8

I can get variants based on variants sold, but I can’t figure out how to incorporate brand sales into the query. Can it be that simple at all?

SELECT `pv`.`id` AS `product_variant_id`, `p`.`brand_id`, COUNT(DISTINCT `oi`.`order_id`) AS `number_variant_sold` 
FROM `product_variants` as pv
LEFT JOIN `products` AS `p` ON p.id = pv.product_id
LEFT JOIN `order_items` AS `oi` ON oi.product_variant_id = pv.id
GROUP BY `pv`.`id` 
ORDER BY `number_variant_sold` DESC 
LIMIT 15

This query only returns 1 in number_brand_sold and does not return correct results without DISTINCT. Additionally, it will always sort by number_brand_sold first and then number_variant_sold. I would need it to work according to the "Expected" table.

SELECT `pv`.`id` AS `product_variant_id`, `p`.`brand_id`, COUNT(`p`.`brand_id`) AS `number_brand_sold`, COUNT(DISTINCT `oi`.`order_id`) AS `number_variant_sold` 
FROM `product_variants` as pv
LEFT JOIN `products` AS `p` ON p.id = pv.product_id
LEFT JOIN `order_items` AS `oi` ON oi.product_variant_id = pv.id
GROUP BY `pv`.`id` 
ORDER BY `number_variant_sold` DESC 
LIMIT 15

EDIT:

I tried ask Chat GPT, but I did not get satisfactory answer. It seems that in mysql 5 it cannot be solved with one query. I couldn’t find a solution even by splitting it into 2 queries.

2

Answers


  1. This will be something near this:

    WITH
    cte1 AS (  -- count the amounts per variant and per brand
      SELECT brand_id, 
             product_variant_id, 
             COUNT(product_variant_id) number_variant_sold,
             SUM(COUNT(product_variant_id)) OVER (PARTITION BY brand_id) number_brands_sold
      FROM products
      JOIN product_variants ON products.id = product_variants.product_id
      JOIN order_items ON product_variants.id = order_items.product_variant_id
      GROUP BY brand_id, product_variant_id
      ),
    cte2 AS (  -- enumerate brands and variants within brand
      SELECT brand_id, 
             product_variant_id, 
             number_variant_sold,
             number_brands_sold,
             DENSE_RANK() OVER (ORDER BY number_brands_sold DESC) rn_within_brands,
             ROW_NUMBER() OVER (PARTITION BY brand_id
                                ORDER BY number_variant_sold DESC) rn_variant_within_brand
      FROM cte1
    )
    SELECT brand_id, 
           product_variant_id, 
           number_variant_sold,
           number_brands_sold
    FROM cte2
    WHERE rn_within_brands <= 5
      AND rn_variant_within_brand <= 3
    ORDER BY number_brands_sold DESC,
             number_variant_sold DESC
    

    Possible disadvantage – if 5th and 6th brands have the same total amount then both of them they will be returned.

    One more disadvantage – if 3rd and 4th variants have the same amount then one unpredictable of them will be returned.

    But there is no additional criteria in any of the above cases.

    Login or Signup to reply.
  2. You could do with some temporary tables.

    A script like this should collate the counts, store the top 5 brands and run 2 queries.

    -- build counts of product variants
    CREATE TEMPORARY TABLE PV_Order_Counts
    select product_variant_id, count(id) PV_Count from Order_Items
    GROUP BY product_varient_id
    
    -- sum up orders by brand
    CREATE TEMPORARY TABLE Prod_Order_Counts
    SELECT Product_Id, SUM(PV_Order_Counts.PV_Count) Prod_Count from Product_variants, @PV_Order_Counts
    WHERE PV_Order_Counts.product_variant_id = Product_vatiants.id
    GROUP BY Product_id
    ORDER BY Prod_Count
    LIMIT 5
    
    -- retrieve top 5 product brands
    SELECT Product_id, Brand_Id, Prod_Count from Prod_Order_Counts
    INNER JOIN Products on Prod_Order_Counts.Product_id=Products.id
    
    -- retrieve sales of product variants in top 5
    SELECT Products.id Product_Id, Product_Variants.id Product_Variant_Id, Products.Brand_Id, PV_Count from PV_Order_Counts
    INNER JOIN Product_Variants on PV_Order_Counts.product_variant_id = Product_Variants.id
    INNER JOIN Products on Product_Variants.Product_id=Products.id
    WHERE Products.id in (
      SELECT Product_id from @Prod_Order_Counts -- This temporary table is expected to have only 5 rows.
      )
    ORDER BY Product_Id, Product_Variant_Id
    
    -- Once finished, drop the tables.
    DROP TABLE PV_Order_Counts
    DROP TABLE Prod_Order_Counts
    

    At this point, it might be simpler to stop and to use the two query outputs separately. You could however, rewrite the last query:

    SELECT Products.id Product_Id, Product_Variants.id Product_Variant_Id, Products.Brand_Id, Prod_Count, PV_Count from PV_Order_Counts
    INNER JOIN Product_Variants on PV_Order_Counts.product_variant_id = Product_Variants.id
    INNER JOIN Products on Product_Variants.Product_id=Products.id
    INNER JOIN @Prod_Order_Counts ON Products.id = Prod_Order_Counts.Product_id -- This temporary table is expected to have only 5 rows.
    Order BY Prod_Count DESC, PV_Count DESC
    

    This still does not impose your 3 variant limit.

    For that, you could create a third temporary table:

    CREATE TEMPORARY TABLE Top_Prod_Output (
       'Product_Id' int(11) UNSIGNED,
       'Product_Variant_Id' int(11) UNSIGNED,
       'Brand_Id' int(11) UNSIGNED,
       'Prod_Count'  int(11) UNSIGNED,
       'PV_Count'  int(11) UNSIGNED)
    

    Then iterate through results:

    DECLARE @Product_Id, @Prod_Count INT;
    DECLARE curP CURSOR FOR SELECT Product_Id from Prod_Order_Counts;
    OPEN curP;
    
    LOOP
      FETCH curP INTO @Product_Id, @Prod_Count
      
      INSERT INTO Top_Product_Output
      SELECT @Product_Id, Product_Variants.id, Products.Brand_Id, @Prod_Count, PV_Count from PV_Order_Counts
      INNER JOIN Product_Variants on PV_Order_Counts.product_variant_id = Product_Variants.id
      INNER JOIN Products on Product_Variants.Product_id=Products.id
      WHERE Products.id = @Product_Id
      LIMIT 3
      
    END LOOP;
    
    -- Extract the results
    SELECT * from Top_Product_Output
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search