skip to Main Content

I’m trying to build a flexible attribution model for site traffic. I’ve started out by making a last-click model, where a conversion is attributed to a channel of traffic based on the last visit a visitor made prior to converting. So:

Visitor     Visit      Channel  ConversionId
visitor a   visit 1    email    123
visitor a   visit 2    email    123
visitor a   visit 3    direct   123
visitor b   visit 1    seo      123
visitor b   visit 2    direct   123
visitor b   visit 3    email    123
visitor c   visit 1    seo      123
visitor c   visit 2    direct   123
visitor c   visit 3    direct   123

In the above example, I would count each visitor’s final visit and attribute the conversion to that channel. So, visitor a and visitor c’s conversions would be attributed to direct, while visitor b’s would be attributed to email.

Now I want to be able to exclude direct, and thus be able to attribute the conversion to the last non-direct channel. In this scenario, visitor a and visitor b’s conversions would be attributed to email (visitor a’s 3rd visit would be excluded), while visitor c’s conversion would be attributed to se (as c’s second and third visits would be excluded).

The way I have this set up so far is (and some of this looks a little goofy because I stripped some joins and identifying info out for simplicity):

WITH test (visitor,
           visit, --a number unique for each visitor row but not necessarily unique across all visits
           channel,
           conversionid,
           rn) AS
(
SELECT visitorid AS visitor,
       visitid AS visit,
       channel AS channel,
       conversionid AS conversionid
       rn = ROW_NUMBER() OVER (PARTITION BY conversionid ORDER BY visit DESC)
FROM db
GROUP BY visitorid,
         visitid,
         channel,
         conversionid)

SELECT visitor,
       MAX(visit) AS maxvisit,
     channel,
     conversionid
FROM test
WHERE rn = 1
GROUP BY visitor,
         channel
ORDER BY visitor;

This gives me last-click attribution. (There was an easier way for me to do this but that way didn’t look as flexible–I want to be able to easily change the query to exclude channels.) My question is, what do I do so I can exclude direct and attribute the conversion to the last non-direct channel? I’d also like to be able to exclude additional channels if needed so I can build different attribution models.

Thanks a million, guys.

3

Answers


  1. I think there is an easier way:

    SELECT db.*
    FROM (SELECT db.*,
                 ROW_NUMBER() OVER (PARTITION BY conversionid
                                    ORDER BY (CASE WHEN channel <> 'Direct' THEN 1 ELSE 2 END),
                                             visit DESC
                                   ) as seqnum
          FROM db
         ) db
    WHERE seqnum = 1;
    

    This will return the most recent record for each conversionid that is not Direct, if there is one.

    Login or Signup to reply.
  2. It is a little unclear what you want so here are some statistics examples that hopefully will make this a little easier for you to understand. This section is geared toward Conditional Aggregation which allows you to Mix and Match such as take the MAX VisitId regardless of conversion that is associated and get the last channel that isn’t direct.

    DECLARE @Table AS TABLE (visitor CHAR(1), VisitId INT, ConversionId INT, Channel VARCHAR(15))
    INSERT INTO @Table VALUES  ('a',1, 11,'email'),('a',2, 12,'email'),('a',3, 13,'direct')
    ,('b',4, 14,'seo'),('b',5, 15,'direct'),('b',6, 16,'email'),('c',7, 17,'seo')
    ,('c',8, 18,'direct'),('c',9, 19,'direct')
    
        ;WITH cte AS (
            SELECT
               visitor
               ,VisitId
               ,Channel
               ,ConversionId
               ,DirectConversion = CASE WHEN Channel = 'direct' THEN ConversionId END
               ,ChannelRowNumber = ROW_NUMBER() OVER (PARTITION BY visitor
                  ORDER BY
                     CASE WHEN Channel = 'direct' THEN 1 ELSE 0 END
                     ,ConversionId DESC)
    
            FROM
               @Table
        )
    
        SELECT
            visitor
            ,MaxVisitId = MAX(VisitId)
            ,MaxVisitIdOfNonDirect = MAX(CASE WHEN DirectConversion IS NULL THEN VisitId END)
            ,NumOfVisits = COUNT(DISTINCT VisitId)
            ,Channel = MAX(CASE WHEN ChannelRowNumber = 1 THEN Channel END)
            ,NumOfConversions = COUNT(DISTINCT ConversionId)
            ,NumOfDirectConversions = COUNT(DISTINCT DirectConversion)
            ,NumOfNonDirectConversions = COUNT(DISTINCT ConversionId) - COUNT(DISTINCT DirectConversion)
        FROM
            cte
        GROUP BY
            visitor
    

    If you just want to rule out Channel all together then using ROW_NUMBER and WHERE <> ‘direct’ should do the trick for you.

    Login or Signup to reply.
  3. In SQL Server the below code is worth trying

    select visitor,channel,conversionid,
    dense_rank() over(partition by visitor order by visit desc) as rn
    into #visitor
    from visitor
    where channel <> 'Direct'
    order by visitor,visit 
    
    select v1.visitor,count(v2.visit) as visits ,v1.channel,v1.conversionid 
    from #visitor v1 join visitor v2
    on v1.visitor=v2.visitor and  v1.rn=1
    group by v1.visitor,v1.channel,v1.conversionid 
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search