skip to Main Content

I use PostgreSQL 14 to work on a student_books table which manages books borrowed by students. Each student can have zero or more books borrowed at any point in time. The table look like this (the order of rows doesn’t matter).

student_id book_id
1 113
2 37
5 94

Furthermore, I have an (append-only) library_ledger table with recent transactions at the library. It keeps track of whether a student borrowed or returned a book, and when. The order of rows matters, it’s sorted in ascending order on the first date column:

date action student_id book_id
2022-11-20 09:14:09 borrow 2 3
2022-11-21 17:43:22 return 1 113
2022-11-22 14:03:04 borrow 5 204
2022-11-22 14:03:08 return 5 94
2022-11-22 14:03:15 return 5 204

Given the student_books and library_ledger tables, I’d like to compute the new set of books borrowed by each student. In the above case, I’d like to get the result set

student_id book_id
2 3
2 37

It’s not difficult to write e.g. a Java program which processes each row in the library_ledger and updates the student_books table by issuing INSERT/DELETE queries. However, I wonder if this can be done in SQL directly.

Maybe if the initial table is grouped by student_id and aggregating the book IDs using array_agg, one could use that as the starting value for an aggregate function which processes the actions in library_ledger by transforming the array using either array_append or array_remove (depending on the value in the action column). At the end, the result could be un-nested.

Is there maybe a simpler way to achieve this, possibly even without using a custom aggregate function?

2

Answers


  1. I’m not sure why you’re considering an aggregate function here. I think the basic idea is something like:

    Remove the records where the most recent library_ledger action for that student/book is a "return"…

    DELETE
    FROM student_books
    WHERE (student_id, book_id) IN
    (
     SELECT student_id, book_id
     FROM
     (
      SELECT student_id, book_id, action, row_number() OVER (PARTITION BY student_id, book_id ORDER BY date DESC) rn
      FROM library_ledger
     ) sq
     WHERE rn = 1 AND action='return' -- Most recent action is "return"
    )
    

    … then add in any student/book pairs that don’t already exist in student_books and have a most recent action of "borrow"

    INSERT INTO student_books
    SELECT student_id, book_id
    FROM 
    (
     SELECT student_id, book_id, action, row_number() OVER (PARTITION BY student_id, book_id ORDER BY date DESC) rn
     FROM library_ledger
    ) sq
    WHERE rn = 1 AND action='borrow' -- Most recent action is "borrow"
    AND (student_id, book_id) NOT IN -- Doesn't already exist in table
     (
      SELECT student_id, book_id
      FROM student_books
     )
    

    I don’t really see any reason to consider intermediate actions, since only the final one (for each student/book pair) should matter.

    You can see my attempt in this fiddle.

    Login or Signup to reply.
  2. From the perspective of journal and ledger, there could be a need to understand between borrows and returns for auditing or accounting purpose. For example,

    select student_id, 
           book_id,
           action,
           date,
           count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
           count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
      from library_ledger;
    

    Gives us an idea of:

    student_id|book_id|action|date                   |borrow_count|return_count|
    ----------+-------+------+-----------------------+------------+------------+
             1|    113|return|2022-11-21 17:43:22.000|           0|           1|
             2|      3|borrow|2022-11-20 09:14:09.000|           1|           0|
             5|     94|return|2022-11-22 14:03:08.000|           0|           1|
             5|    204|borrow|2022-11-22 14:03:04.000|           1|           0|
             5|    204|return|2022-11-22 14:03:15.000|           1|           1|
    

    Based on that, the SQL action is determined:

    with cte as (
    select student_id, 
           book_id,
           action,
           date,
           count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
           count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
      from library_ledger)
    select student_id,
           book_id,
           max(borrow_count) as borrow_count,
           max(return_count) as return_count,
           case 
              when max(borrow_count) > max(return_count) then 'insert'
              when max(borrow_count) < max(return_count) then 'delete'
              else 'noop'
           end as sql_action
      from cte
     group by 1,2;
    

    We get:

    student_id|book_id|borrow_count|return_count|sql_action|
    ----------+-------+------------+------------+----------+
             1|    113|           0|           1|delete    |
             2|      3|           1|           0|insert    |
             5|     94|           0|           1|delete    |
             5|    204|           1|           1|noop      |
    

    For practices like accounting, there could be a review or an approval before apply those actions. After that, update student_books (kind of like account balance).

    -- 1. delete
    with cte as (
    select student_id, 
           book_id,
           action,
           date,
           count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
           count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
      from library_ledger),
    cte_delete as (
    select student_id,
           book_id
      from cte
     group by 1,2
     having max(borrow_count) < max(return_count))
    delete from student_books sb
     using cte_delete d
     where sb.student_id = d.student_id
       and sb.book_id = d.book_id;
    
    select * from student_books;  
      
    -- 2. insert
    with cte as (
    select student_id, 
           book_id,
           action,
           date,
           count(action) filter (where action='borrow') over (partition by student_id, book_id order by date) as borrow_count,
           count(action) filter (where action='return') over (partition by student_id, book_id order by date) as return_count
      from library_ledger),
    cte_insert as (
    select student_id,
           book_id
      from cte
     group by 1,2
     having max(borrow_count) > max(return_count))
     insert into student_books
     select student_id, book_id
       from cte_insert;
    
    student_id|book_id|
    ----------+-------+
             2|     37|
             2|      3|
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search