Top selling product per user_id and categorize by each user's buying activity (first-time, second-time, etc.) - Mysql

beth_9
March 16, 2023
205 views
2 votes
4 Answers

I’m working on MySQL workbench and I would like to find the top-selling products by types of buyers. First I categorized customers based on their buying activity. If a person made a purchase only on 1/1, it’s a first-time buyer. If a person made a purchase on 1/1 and 1/2, it’s a second-time buyer. If a person made a purchase on 1/1, 1/2, and 1/3, it’s a third-time buyer. And so on.

The following is sample data, similar to my original data.

user_id	PurchaseDate	Product	Total	order_id
1	2023-01-01 16:46	yogurt	3.72	1000
1	2023-01-01 18:23	milk	4.38	null
2	2023-01-01 12:34	egg	4.57	1001
1	2023-01-01 11:54	butter	3.69	null
3	2023-01-01 10:41	cheese	3.68	1003
2	2023-01-01 13:45	milk	4.38	null
3	2023-01-03 18:56	yogurt	3.72	1005
3	2023-01-02 19:12	cream	4.98	1004
2	2023-01-02 12:47	egg	4.57	1002

If a customer makes more than one purchase on the same date, the order_id is null.
So, I used that to categorize the buyers.

SELECT      user_id, product, sum(Total) as Total,
            CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
                 END AS TypeofBuyer
FROM    myData
GROUP BY user_id
ORDER BY Total DESC;

And I get the following result

user_id	Product	Total	TypeofBuyer
2	egg	13.52	SecondTimeBuyer
3	cheese	12.38	ThirdTimeBuyer
1	yogurt	11.79	FirstTimeBuyer

But the problem is since I aggregated the total and grouped it by user_id, I cannot find the top-selling product for each user_id.

For instance, for user_id 1, the top-selling product is milk and the customer is a first-time buyer.

How can I extract the above information?

I’d appreciate your help!

Edit: Eventually, I’m aiming to visualize the top 10 selling products for 1st-time buyers, 2nd-time buyers, and so forth (my original data is much bigger than the sample)

Answers

You can use a CTE.

With FavouriteProduct as
(
select user_id, 
Product As FavouriteProduct, 
count(Product) as BuyCount, 
sum(total) as TotalSpentFavourite 
From dbo.myData
group by user_id, Product
),

UserTotals as
(
select user_id, 
sum(total) as TotalSpent,
CASE 
    WHEN Count(DISTINCT(order_id)) = 1 THEN 'FirstTimeBuyer'
    WHEN Count(DISTINCT(order_id)) = 2 THEN 'SecondTimeBuyer' 
    WHEN Count(DISTINCT(order_id)) = 3 THEN 'ThirdTimeBuyer'
END AS TypeofBuyer
From dbo.myData
group by user_id
)

select DISTINCT  D.user_id, 
TypeofBuyer,
TotalSpent,
FP.FavouriteProduct,
FP.TotalSpentFavourite 
From dbo.myData D
outer apply
(
select top 1 * from FavouriteProduct FP where fp.user_id = D.user_id order by BuyCount desc
) FP
inner join UserTotals on UserTotals.user_id = D.user_id
order by user_id

Output:

user_id	TypeofBuyer	FavouriteProduct	TotalSpent	TotalSpentFavourite
1	FirstTimeBuyer	milk	11.79	4.38
2	SecondTimeBuyer	egg	13.52	9.14
3	ThirdTimeBuyer	cream	12.38	4.98

Try this:

/*first cte*/
WITH cte AS (
  SELECT user_id, Product, Total,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY Total DESC) Rnum,
       SUM(Total) OVER (PARTITION BY user_id) AS GTotal,
       order_id
FROM myData),
/*second cte*/
  cte2 AS (
SELECT user_id,
       CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
                 END AS TypeofBuyer
 FROM cte
  GROUP BY user_id)
/*join the ctes*/
SELECT cte.user_id, cte.Product, cte.Total, cte.GTotal, cte2.TypeofBuyer
   FROM cte 
   JOIN cte2 ON cte.user_id=cte2.user_id
WHERE Rnum=1
ORDER BY GTotal DESC;

Using ROW_NUMBER() to get top sales and assign with row numbering and SUM() OVER () to get total sales by user_id in the first cte. Then re-using your method of getting TypeofBuyer in the second cte. Lastly, a query to join both of the ctes and filter only rows with row number=1 to get your desired result.

Demo fiddle

Since you didn’t specify which mysql version you are using, it’s possible you’re using a sub-8.0 one, in which window functions such as CTE are not supported. Therefore, I’m taking a version-neutral approach.
First of all, let’s get the top-selling product for each user_id.

select t1.user_id,t2.product
from
    (select user_id,max(sm) mx
    from
        (select user_id,product,sum(total) sm
        from mydata
        group by user_id,product) t
    group by user_id) t1
join
    (select user_id,product,sum(total) sm
    from mydata
    group by user_id,product) t2
on t1.user_id=t2.user_id and mx=sm
;

-- result set:
| user_id | product |
+---------+---------+
|       1 | milk    |
|       2 | egg     |
|       3 | cream   |
+---------+---------+

The rest is to join the previous query to your query.

select ta.user_id, tb.product, ta.Total, ta.TypeofBuyer
from
    (SELECT      user_id, sum(Total) as Total,
            CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
                 WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
                 END AS TypeofBuyer
    FROM    myData
    GROUP BY user_id) ta
join
    (select t1.user_id,t2.product
    from
        (select user_id,max(sm) mx
        from
            (select user_id,product,sum(total) sm
            from mydata
            group by user_id,product) t
        group by user_id) t1
    join
        (select user_id,product,sum(total) sm
        from mydata
        group by user_id,product) t2
    on t1.user_id=t2.user_id and mx=sm
    ) tb
on ta.user_id=tb.user_id
;
-- result set:
| user_id | product | Total | TypeofBuyer     |
+---------+---------+-------+-----------------+
|       1 | milk    | 11.79 | FirstTimeBuyer  |
|       2 | egg     | 13.52 | SecondTimeBuyer |
|       3 | cream   | 12.38 | ThirdTimeBuyer  |
+---------+---------+-------+-----------------+

- user1191247
- March 17, 2023 at 10:47 am
- 0 votes
0
Your current CASE operation is doing more work than it needs to:
```
CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer" 
     WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer" 
     WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
     END AS TypeofBuyer
```
Internally, this will require a separate temporary table for each COUNT(DISTINCT), even though it is essentially comparing the same value. Rewriting it like:
```
CASE COUNT(DISTINCT(order_id))
    WHEN 1 THEN "FirstTimeBuyer" 
    WHEN 2 THEN "SecondTimeBuyer" 
    WHEN 3 THEN "ThirdTimeBuyer"
END AS TypeofBuyer
```
requires only one temporary table to evaluate the three cases. It is not a significant overhead, but why do more work than is necessary?

The "removal" of the order_ids, for subsequent orders on the same day, breaks your relational model and I do not understand the intent. If it is just for the sake of calculating your TypeofBuyer, then perhaps COUNT(DISTINCT DATE(PurchaseDate)) is a better option.

Yet another way to retrieve the top-selling product is with a correlated subquery. As with many things in the SQL world, there are many ways to achieve the same end goal, and you should test with a real world representative dataset to see what works best with your data:
```
SELECT
    user_id,
    (
        SELECT Product
        FROM myData
        WHERE user_id = md.user_id
        GROUP BY Product
        ORDER BY SUM(Total) DESC
        LIMIT 1
    ) AS Product,
    SUM(Total) AS Total,
    CASE COUNT(DISTINCT(DATE(PurchaseDate)))
        WHEN 1 THEN 'FirstTimeBuyer' 
        WHEN 2 THEN 'SecondTimeBuyer' 
        WHEN 3 THEN 'ThirdTimeBuyer'
        ELSE 'SuperBuyer'
    END AS TypeofBuyer
FROM myData md
GROUP BY user_id
ORDER BY Total DESC;
```
Output:

user_id Product Total TypeofBuyer

2 egg 13.52 SecondTimeBuyer

3 cream 12.38 ThirdTimeBuyer

1 milk 11.79 FirstTimeBuyer

Here’s a db<>fiddle
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Top selling product per user_id and categorize by each user's buying activity (first-time, second-time, etc.) – Mysql

Answers