I have a website running on Magento(1.9) and have a requirement to find out the average time a customer take to buy the same product again and the percentage of customers who have bought the same product.
Orders table(Sales_flat_order) – Contains the order details
Columns – order_id,Created_date,Customer_email
Items table(Sales_Flat_Order_item) – Items in a particular order,If a user buy 6 Items then the items table will have 6 entries with the same order_id.
Columns – items_id,Order_id(FK),item_sku(different for diff. Products)
Below is the sample data which i have joined orders and items tables for a specific customer filtered by email id.
For example the order_id “92224” there are 4 rows means there are 4 items purchased in that order and the item_sku refers the item name / its unique ID.
Now i want to find out the Average time taken for to buy the same product again.So in the order “98651” the user have purchased the product items_sku(“2996”) which was there in the first order.So the average time will be the date difference between those 2 orders.
Below are the difficulties i’m facing,
1) Since each order can contains multiple products how i can compare the row of items with the previous order items.I have tried with LIKE & ILIKE and it doesn’t seems to work and tried to use STR_AGG(item_sku,’,’) with LIKE but have not find any result.
2) Not only for a particular customer ,this need to be done for each and every customer.
Order_id Order_date Item_sku
92224 "2016-11-11 17:18:15" "3001"
92224 "2016-11-11 17:18:15" "2933"
92224 "2016-11-11 17:18:15" "2992-config"
92224 "2016-11-11 17:18:15" "2996"
94926 "2016-11-25 10:01:30" "3008"
98651 "2016-12-14 18:11:16" "2992-config"
98651 "2016-12-14 18:11:16" "2996"
99645 "2016-12-20 13:00:34" "3001"
99645 "2016-12-20 13:00:34" "2933"
114578 "2017-02-28 15:32:54" "2931"
114578 "2017-02-28 15:32:54" "0294"
114578 "2017-02-28 15:32:54" "0297"
114578 "2017-02-28 15:32:54" "0296"
114578 "2017-02-28 15:32:54" "2992-config"
131754 "2017-05-15 17:23:35" "3296"
131968 "2017-05-16 18:30:52" "3300"
133186 "2017-05-22 14:03:07" "3004"
133186 "2017-05-22 14:03:07" "3069-bio"
133186 "2017-05-22 14:03:07" "3421-bio-config"
137484 "2017-06-13 14:07:28" "3605"
137484 "2017-06-13 14:07:28" "3604"
141709 "2017-07-03 16:20:30" "3603"
141709 "2017-07-03 16:20:30" "3606"
141709 "2017-07-03 16:20:30" "2936"
141709 "2017-07-03 16:20:30" "3422-bio"
141977 "2017-07-04 16:31:37" "2936"
145194 "2017-07-17 15:22:41" "3603"
145194 "2017-07-17 15:22:41" "3604"
151651 "2017-08-11 11:43:38" "3809"
161334 "2017-09-14 15:53:30" "3670"
161334 "2017-09-14 15:53:30" "3604"
161334 "2017-09-14 15:53:30" "3603"
Any guidance will be of great help.I’m using PostgreSQL as my database.
2
Answers
I don’t know anything about magento, but this is how I would solve this in postgres, given your input data:
This is the average time, in seconds, between purchasing the same item. In the inner query, we use the lag window function, partitioned by item_sku to compare the current row to the previous entry for this item. The outer query just groups by the item_sku.
If you are adding multiple users, you would simply change the
partition by
clause topartition by user_id, item_sku
and then group by both user_id and item_sku.@Jeremy actually presents a good routine with the exception that the results is in elapsed seconds. However, that may not be the most useful format. Rather than using epoch you can use date subtraction and intervals directly.
The outer most query merely formats the the result to days and hours. Without it the results may appear (and do) to the fractional seconds.