skip to Main Content

I’m working on MySQL workbench and I would like to find the top-selling products by types of buyers. First I categorized customers based on their buying activity. If a person made a purchase only on 1/1, it’s a first-time buyer. If a person made a purchase on 1/1 and 1/2, it’s a second-time buyer. If a person made a purchase on 1/1, 1/2, and 1/3, it’s a third-time buyer. And so on.

The following is sample data, similar to my original data.

user_id PurchaseDate Product Total order_id
1 2023-01-01 16:46 yogurt 3.72 1000
1 2023-01-01 18:23 milk 4.38 null
2 2023-01-01 12:34 egg 4.57 1001
1 2023-01-01 11:54 butter 3.69 null
3 2023-01-01 10:41 cheese 3.68 1003
2 2023-01-01 13:45 milk 4.38 null
3 2023-01-03 18:56 yogurt 3.72 1005
3 2023-01-02 19:12 cream 4.98 1004
2 2023-01-02 12:47 egg 4.57 1002

If a customer makes more than one purchase on the same date, the order_id is null.
So, I used that to categorize the buyers.

SELECT      user_id, product, sum(Total) as Total,
            CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
                 END AS TypeofBuyer
FROM    myData
GROUP BY user_id
ORDER BY Total DESC;

And I get the following result

user_id Product Total TypeofBuyer
2 egg 13.52 SecondTimeBuyer
3 cheese 12.38 ThirdTimeBuyer
1 yogurt 11.79 FirstTimeBuyer

But the problem is since I aggregated the total and grouped it by user_id, I cannot find the top-selling product for each user_id.

For instance, for user_id 1, the top-selling product is milk and the customer is a first-time buyer.

How can I extract the above information?

I’d appreciate your help!

Edit: Eventually, I’m aiming to visualize the top 10 selling products for 1st-time buyers, 2nd-time buyers, and so forth (my original data is much bigger than the sample)

4

Answers


  1. You can use a CTE.

    With FavouriteProduct as
    (
    select user_id, 
    Product As FavouriteProduct, 
    count(Product) as BuyCount, 
    sum(total) as TotalSpentFavourite 
    From dbo.myData
    group by user_id, Product
    ),
    
    UserTotals as
    (
    select user_id, 
    sum(total) as TotalSpent,
    CASE 
        WHEN Count(DISTINCT(order_id)) = 1 THEN 'FirstTimeBuyer'
        WHEN Count(DISTINCT(order_id)) = 2 THEN 'SecondTimeBuyer' 
        WHEN Count(DISTINCT(order_id)) = 3 THEN 'ThirdTimeBuyer'
    END AS TypeofBuyer
    From dbo.myData
    group by user_id
    )
    
    select DISTINCT  D.user_id, 
    TypeofBuyer,
    TotalSpent,
    FP.FavouriteProduct,
    FP.TotalSpentFavourite 
    From dbo.myData D
    outer apply
    (
    select top 1 * from FavouriteProduct FP where fp.user_id = D.user_id order by BuyCount desc
    ) FP
    inner join UserTotals on UserTotals.user_id = D.user_id
    order by user_id
    

    Output:

    user_id TypeofBuyer FavouriteProduct TotalSpent TotalSpentFavourite
    1 FirstTimeBuyer milk 11.79 4.38
    2 SecondTimeBuyer egg 13.52 9.14
    3 ThirdTimeBuyer cream 12.38 4.98
    Login or Signup to reply.
  2. Try this:

    /*first cte*/
    WITH cte AS (
      SELECT user_id, Product, Total,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY Total DESC) Rnum,
           SUM(Total) OVER (PARTITION BY user_id) AS GTotal,
           order_id
    FROM myData),
    /*second cte*/
      cte2 AS (
    SELECT user_id,
           CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
                     WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
                     WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
                     END AS TypeofBuyer
     FROM cte
      GROUP BY user_id)
    /*join the ctes*/
    SELECT cte.user_id, cte.Product, cte.Total, cte.GTotal, cte2.TypeofBuyer
       FROM cte 
       JOIN cte2 ON cte.user_id=cte2.user_id
    WHERE Rnum=1
    ORDER BY GTotal DESC;
    

    Using ROW_NUMBER() to get top sales and assign with row numbering and SUM() OVER () to get total sales by user_id in the first cte. Then re-using your method of getting TypeofBuyer in the second cte. Lastly, a query to join both of the ctes and filter only rows with row number=1 to get your desired result.

    Demo fiddle

    Login or Signup to reply.
  3. Since you didn’t specify which mysql version you are using, it’s possible you’re using a sub-8.0 one, in which window functions such as CTE are not supported. Therefore, I’m taking a version-neutral approach.
    First of all, let’s get the top-selling product for each user_id.

    select t1.user_id,t2.product
    from
        (select user_id,max(sm) mx
        from
            (select user_id,product,sum(total) sm
            from mydata
            group by user_id,product) t
        group by user_id) t1
    join
        (select user_id,product,sum(total) sm
        from mydata
        group by user_id,product) t2
    on t1.user_id=t2.user_id and mx=sm
    ;
    
    -- result set:
    | user_id | product |
    +---------+---------+
    |       1 | milk    |
    |       2 | egg     |
    |       3 | cream   |
    +---------+---------+
    

    The rest is to join the previous query to your query.

    select ta.user_id, tb.product, ta.Total, ta.TypeofBuyer
    from
        (SELECT      user_id, sum(Total) as Total,
                CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
                     WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
                     WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
                     END AS TypeofBuyer
        FROM    myData
        GROUP BY user_id) ta
    join
        (select t1.user_id,t2.product
        from
            (select user_id,max(sm) mx
            from
                (select user_id,product,sum(total) sm
                from mydata
                group by user_id,product) t
            group by user_id) t1
        join
            (select user_id,product,sum(total) sm
            from mydata
            group by user_id,product) t2
        on t1.user_id=t2.user_id and mx=sm
        ) tb
    on ta.user_id=tb.user_id
    ;
    -- result set:
    | user_id | product | Total | TypeofBuyer     |
    +---------+---------+-------+-----------------+
    |       1 | milk    | 11.79 | FirstTimeBuyer  |
    |       2 | egg     | 13.52 | SecondTimeBuyer |
    |       3 | cream   | 12.38 | ThirdTimeBuyer  |
    +---------+---------+-------+-----------------+
    
    Login or Signup to reply.
  4. Your current CASE operation is doing more work than it needs to:

    CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
         WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
         WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
         END AS TypeofBuyer
    

    Internally, this will require a separate temporary table for each COUNT(DISTINCT), even though it is essentially comparing the same value. Rewriting it like:

    CASE COUNT(DISTINCT(order_id))
        WHEN 1 THEN "FirstTimeBuyer" 
        WHEN 2 THEN "SecondTimeBuyer" 
        WHEN 3 THEN "ThirdTimeBuyer"
    END AS TypeofBuyer
    

    requires only one temporary table to evaluate the three cases. It is not a significant overhead, but why do more work than is necessary?

    The "removal" of the order_ids, for subsequent orders on the same day, breaks your relational model and I do not understand the intent. If it is just for the sake of calculating your TypeofBuyer, then perhaps COUNT(DISTINCT DATE(PurchaseDate)) is a better option.

    Yet another way to retrieve the top-selling product is with a correlated subquery. As with many things in the SQL world, there are many ways to achieve the same end goal, and you should test with a real world representative dataset to see what works best with your data:

    SELECT
        user_id,
        (
            SELECT Product
            FROM myData
            WHERE user_id = md.user_id
            GROUP BY Product
            ORDER BY SUM(Total) DESC
            LIMIT 1
        ) AS Product,
        SUM(Total) AS Total,
        CASE COUNT(DISTINCT(DATE(PurchaseDate)))
            WHEN 1 THEN 'FirstTimeBuyer' 
            WHEN 2 THEN 'SecondTimeBuyer' 
            WHEN 3 THEN 'ThirdTimeBuyer'
            ELSE 'SuperBuyer'
        END AS TypeofBuyer
    FROM myData md
    GROUP BY user_id
    ORDER BY Total DESC;
    

    Output:

    user_id Product Total TypeofBuyer
    2 egg 13.52 SecondTimeBuyer
    3 cream 12.38 ThirdTimeBuyer
    1 milk 11.79 FirstTimeBuyer

    Here’s a db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search