skip to Main Content

The users to submit an interval of starting and ending pages that he/she read in a specific
book. Note that the user can submit multiple intervals for the same book.
I need the query announce the most recommended five books in the system, which are picked based on how many unique pages have been read for
all the users that submitted their intervals in the first operation (sorted by books with the most read pages to books with the least read pages).

The book_user table is the pivot table which I need the query on so how to get the result below for the inserted records which are:

Reading intervals:

User 1 read from page 10 to page 30 in Book 1
User 2 read from page 2 to page 25 in Book 1
User 1 read from page 40 to page 50 in Book 2
User 3 read from page 1 to page 10 in Book 2
The most read books results:
Book 1 -> 28 pages
Book 2 -> 20 pages

I try this query:

select 'book_id',books.name as book_name,SUM(end_page - start_page) AS num_of_read_pages FROM book_user JOIN books ON books.id=book_user.book_id GROUP BY book_id ORDER BY num_of_read_pages DESC;

But it’s not get the unique page for overlap intervals

And when I ask chatgpt it gives me this query which is recursive cte but it not work it just looping

WITH RECURSIVE cte AS (
    SELECT book_id, MIN(start_page) AS start_page, MAX(end_page) AS end_page
    FROM book_user
    GROUP BY book_id, start_page
    UNION ALL
    SELECT cte.book_id, cte.start_page, cte.end_page
    FROM cte
    JOIN book_user ON cte.book_id = book_user.book_id AND cte.start_page <= book_user.start_page AND cte.end_page >= book_user.end_page
)
SELECT book_id, SUM(end_page - start_page + 1) AS total_pages
FROM cte
GROUP BY book_id
ORDER BY total_pages DESC;

2

Answers


  1. See example.

    1. Group ranges(segments) like (3-10),(3-10),(3-21) -> (3-21) with same start_date. Row_number() with start_date
    2. Recursive query
      2.1. Anchor. Take ranges, where start_page not belongs any other range.
      From set (3-10),(5-12),(4-7) take (3-10)
      2.2. Join ranges
    3. Again group by start_date
    4. Count pages
    with recursive  t as( 
         -- join ranges with same start_page 
         -- and row_number() for sequence join
      select book_id,start_page,max(end_page)end_page
        ,row_number()over(partition by book_id order by start_page) rn
      from book_user
      group by book_id,start_page
    )
    ,r as(  -- recursive join 
         -- anchor - ranges with a "free" start_page
      select 0 lvl,bu.book_id,bu.start_page,bu.end_page,bu.rn
      from t bu
      where not exists(select 1 from t bu2 
             where bu2.book_id=bu.book_id and bu2.rn<bu.rn
               and bu.start_page between bu2.start_page and bu2.end_page)
      union all
      select lvl+1,r.book_id,r.start_page,t.end_page,t.rn
      from r inner join t on t.book_id=r.book_id and t.rn>r.rn
           and r.end_page between t.start_page and r.end_page
    )
    select book_id,sum(end_page-start_page+1) total_pages
    from ( -- again, we group segments with the same start_page and different end_page
          select book_id,start_page,max(end_page) end_page 
          from r 
          group by book_id,start_page
      ) gr
    group by book_id
    

    Details here demo

    Output

    book_id total_pages path
    1 38 2-2:25,26,2:25,26:31,33-33:40
    2 25 1-1:10,1:10:11,1:10:11:14,40-40:50
    3 31 1-1:10,20-20:40

    Test data

    create table books(id int,book_name varchar(20));
    insert into books values(1,'Book 1'),(2,'Book 2'),(3,'Book 3');
    create table users(id int,user_name varchar(20));
    insert into users values(10,'User 10'),(20,'User 20'),(30,'User 40'),(30,'User 40');
    create table book_user(user_id int,book_id int,start_page int,end_page int);
    insert into book_user values
     (10,1, 10,30)
    ,(20,1,  2,25)
    ,(30,1,  2,26)
    ,(30,1, 10,31)
    ,(40,1, 33,40)
    ,(10,2, 40,50)
    ,(30,2,  1,10)
    ,(40,2, 10,11)
    ,(20,2, 11,14)
    ,(10,3,  1,10)
    ,(20,3, 20,40)
    ;
    
    Login or Signup to reply.
  2. See query for task type gaps and islands

    with cte as(
     select *
        ,sum(new_range)over(partition by book_id order by start_page) rng_num
     from(
       select *
         ,case when start_page>max(end_page)over(partition by book_id order by start_page
                    rows between unbounded preceding and 1 preceding) 
             then 1 
          else 0 
          end new_range
       from book_user
      )a
    )
    select book_id,sum(end_page-start_page+1) total_pages
    from(
      select book_id,min(start_page)start_page,max(end_page)end_page, rng_num
      from cte
      group by book_id,rng_num
    )b
    group by book_id
    

    Demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search