skip to Main Content

I have a table that contains students publications like this

id student
1 john
2 anthony
3 steven
4 lucille
5 anthony
6 steven
7 john
8 lucille
9 john
10 anthony
11 steven
12 lucille
13 john

so the idea is about to have a query that fetchs all ordered occurences of a determinated student names

context :

answer to the question : how many times John is publishing just after Anthony (who is publishing just after Steven …) and get id of each occurence

example :

If I look for all occurences of [john, anthony] I’ll get (note that the ids must be successive for each occurence)

id student
1 john
2 anthony
9 john
10 anthony

Or :

id — comment
1 (id of first occurence of john, anthony)
9 (id of second occurence of john, anthony)

If I look for [anthony, steven, lucille] i’ll get

id student
2 anthony
3 steven
4 lucille
10 anthony
11 steven
12 lucille

Or :

id — comment
2 (id of first occurence of anthony, steven, lucille)
10 (id of second occurence of anthony, steven, lucille)

Any ideas or leads to help me move forward?

2

Answers


  1. Start with this and if it explodes we’ll do some performance improvements, with the price of making the code a little bit more complicated.

    with
     prm(students) as (select 'anthony,steven,lucille')
    ,prm_ext(students_regex) as (select regexp_replace(students, '^|(,)','1d+;', 'g') from prm)
    select  split_part(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select students_regex from prm_ext), 'g')), ';', 1)::int  as id
    from    t
    
    id
    2
    10
    with
     prm(students) as (select 'anthony,steven,lucille')
    ,prm_ext(students_regex) as (select regexp_replace(students, '^|(,)','1d+;', 'g') from prm)
    select  cols[1]::int   as id
           ,cols[2]::text  as student
           
    from   (select  string_to_array(string_to_table(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select students_regex from prm_ext), 'g')), ','), ';') as cols
            from    t
           ) t
    
    id student
    2 anthony
    3 steven
    4 lucille
    10 anthony
    11 steven
    12 lucille

    Fiddle

    Login or Signup to reply.
  2. That should do the trick, performance wise.

    The main idea is to split the data by the first student that is in our search list, but not in all places –
    Since the same student can appear multiple times in our search list, we need to make sure that we’re not breaking the pattern in the middle.
    We’re doing that by verifying that each occurrence of the first student is far enough from its previous occurrence, that is, the distance between the two occurrences is bigger than the search list length (the number of non-unique students’ names within the search list)

    with
     prm(students) as (select 'anthony,steven,lucille,anthony')
    ,prm_ext(search_pattern, first_student, tokens_num) as 
     (
        select  regexp_replace(students, '^|(,)','1d+;', 'g') as search_pattern
               ,split_part(students, ',', 1)                    as first_student
               ,array_length(string_to_array(students, ','), 1) as tokens_num
               
        from    prm
    )
    ,prev_student as
    (
        select  id
               ,student
               ,lag(id) over (partition by student order by id) as student_prev_id
        
        from    t
    )
    ,seq as
    (
        select  id
               ,student
               ,sum(case when student = p.first_student and coalesce(id - student_prev_id >= p.tokens_num, true) then 1 end) over (order by id) as seq_id
               ,id - max(case when student = p.first_student then id end) over (order by id) as distance_from_first_student
        
        from    prev_student cross join prm_ext as p
        
        order by id
    )
    select    split_part(unnest(regexp_matches(string_agg(id || ';' || student, ',' order by id), (select search_pattern from prm_ext), 'g')), ';', 1)::int  as id
         
    from      seq cross join prm_ext p
    
    where     seq_id is not null
         and  distance_from_first_student < p.tokens_num
         
    group by  seq_id
    

    This is the result for an extended data sample:

    id
    2
    16
    22

    Fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search