in the example data set above, I want to assign a Primary_Store for each buyer_id based on the following criteria
- if one store has more visit counts than any other store for that buyer, assign as primary
- if no, then
—- > if stores with most visits are equal, assign Primary_Store according to hierarchy of store_type: 1. Retail, 2.Online, 3.Event
————> if stores with most visits are equal, and store_type is the same, then assign Primary_Store as store with earliest visit date
By this logic the results should be:
Buyer ID : Primary_Store
1001: Whole Foods
1002: Farmer’s Market
1003: Costco
I have been trying to use a case when statement but i am having difficulty applying the different types of criteria at once.
2
Answers
So the simple way is to use a RANKING window function of which ROW_NUMBER is one, and it does not allocate duplicate rank values.
thus:
for this partial copy of your data, we can use three priorities in the ranking ORDER BY
thus it can be moved to a QUALIFY like and we can take only the first value:
gives:
So to reflect Rajat’s answer of using FIRST_VALUE to retrieve the "primary store" as a value, which opens the idea, that you might also just want to know "is this the one". which can be done with:
the basic answer, of how can you rank the data, is the same between the methods, it then comes down to how you wish to use that rank.
You seem to be looking for way to assign a primary store to existing rows without collapsing them. We can modify Simeon’s solution a little bit. I swapped
decode
out forarray_position
for personal preference but you can go with either of the methods