I’m trying to build a flexible attribution model for site traffic. I’ve started out by making a last-click model, where a conversion is attributed to a channel of traffic based on the last visit a visitor made prior to converting. So:
Visitor Visit Channel ConversionId
visitor a visit 1 email 123
visitor a visit 2 email 123
visitor a visit 3 direct 123
visitor b visit 1 seo 123
visitor b visit 2 direct 123
visitor b visit 3 email 123
visitor c visit 1 seo 123
visitor c visit 2 direct 123
visitor c visit 3 direct 123
In the above example, I would count each visitor’s final visit and attribute the conversion to that channel. So, visitor a and visitor c’s conversions would be attributed to direct, while visitor b’s would be attributed to email.
Now I want to be able to exclude direct, and thus be able to attribute the conversion to the last non-direct channel. In this scenario, visitor a and visitor b’s conversions would be attributed to email (visitor a’s 3rd visit would be excluded), while visitor c’s conversion would be attributed to se (as c’s second and third visits would be excluded).
The way I have this set up so far is (and some of this looks a little goofy because I stripped some joins and identifying info out for simplicity):
WITH test (visitor,
visit, --a number unique for each visitor row but not necessarily unique across all visits
channel,
conversionid,
rn) AS
(
SELECT visitorid AS visitor,
visitid AS visit,
channel AS channel,
conversionid AS conversionid
rn = ROW_NUMBER() OVER (PARTITION BY conversionid ORDER BY visit DESC)
FROM db
GROUP BY visitorid,
visitid,
channel,
conversionid)
SELECT visitor,
MAX(visit) AS maxvisit,
channel,
conversionid
FROM test
WHERE rn = 1
GROUP BY visitor,
channel
ORDER BY visitor;
This gives me last-click attribution. (There was an easier way for me to do this but that way didn’t look as flexible–I want to be able to easily change the query to exclude channels.) My question is, what do I do so I can exclude direct and attribute the conversion to the last non-direct channel? I’d also like to be able to exclude additional channels if needed so I can build different attribution models.
Thanks a million, guys.
3
Answers
I think there is an easier way:
This will return the most recent record for each
conversionid
that is notDirect
, if there is one.It is a little unclear what you want so here are some statistics examples that hopefully will make this a little easier for you to understand. This section is geared toward Conditional Aggregation which allows you to Mix and Match such as take the
MAX
VisitId
regardless ofconversion
that is associated and get the last channel that isn’t direct.If you just want to rule out Channel all together then using ROW_NUMBER and WHERE <> ‘direct’ should do the trick for you.
In SQL Server the below code is worth trying