skip to Main Content

I am trying to get the category of specific students health inspection for each month. However, sometimes there are multiple entries for a month. How do I get the latest entry for the month using SQL?

Below is the SQL I am using:

SELECT  mt_tagged_student.student_id, 
        mt_inspection.category,
        mt_inspection.created_date as entry_date
FROM public.mt_tagged_student
        LEFT JOIN public.mt_student  On mt_tagged_student.student_id = mt_student.id
        LEFT JOIN public.mt_inspection  On mt_inspection.student_id = mt_student.id
where mt_inspection.created_date between '2023-10-1 00:00:01' AND '2023-10-31 23:59:59'
    and mt_inspection.category = 1
ORDER BY mt_tagged_student.student_id ASC, entry_date Desc

Here’s some of the result of the SQL,

enter image description here

The highlighted one show that sometimes there are two entries in a month. How do I get the latest entry of that month only?

3

Answers


  1. You can use NOT EXISTS as follows:

    SELECT  mt_tagged_student.student_id, 
            mi.category,
            mi.created_date as entry_date
    FROM public.mt_tagged_student
            LEFT JOIN public.mt_student  On mt_tagged_student.student_id = mt_student.id
            LEFT JOIN public.mt_inspection mi On mi.student_id = mt_student.id
    where mi.created_date between '2023-10-1 00:00:01' AND '2023-10-31 23:59:59'
        and mi.category = 1
        and not exists (select 1 from public.mt_inspection mi 
                       where mi.student_id = mii.student_id
                       and date_trunc('month', mi.created_date) = date_trunc('month', mii.created_date)
                       and mii.created_date > mi.created_date
                       and mii.category = 1)
    ORDER BY mt_tagged_student.student_id ASC, entry_date Desc  
    

    OR you can use ROW_NUMBER as follows:

    select t.student_id, t.category, t.entry_date from
    (SELECT      mt_tagged_student.student_id, 
                mt_inspection.category,
                mt_inspection.created_date as entry_date,
                row_number() over (partition by mt_tagged_student.student_id, date_trunc('month', mt_inspection.created_date) order by mt_inspection.created_date desc) as rn
        FROM public.mt_tagged_student
                LEFT JOIN public.mt_student  On mt_tagged_student.student_id = mt_student.id
                LEFT JOIN public.mt_inspection  On mt_inspection.student_id = mt_student.id
        where mt_inspection.created_date between '2023-10-1 00:00:01' AND '2023-10-31 23:59:59'
            and mt_inspection.category = 1) t
        ORDER BY t.student_id ASC, t.entry_date Desc;
    
    Login or Signup to reply.
  2. A textbook example for DISTINCT ON:

    SELECT DISTINCT ON (student_id)
           student_id
         , i.category
         , i.created_date AS entry_date
    FROM   public.mt_tagged_student t
    JOIN   public.mt_student    s USING (student_id)
    JOIN   public.mt_inspection i USING (student_id)
    WHERE  i.created_date >= '2023-10-1'
    AND    i.created_date <  '2023-11-1'
    AND    i.category = 1
    ORDER  BY student_id, i.created_date DESC NULLS LAST;
    

    Should also perform best for just a couple entries max per month like you indicate. See:

    Also, don’t use BETWEEN with timestamps.

    I converted both instances of LEFT [OUTER] JOIN to plain [INNER] JOIN because the former make no sense in combination with WHERE conditions on the right table. See:

    I threw in DESC NULLS LAST at first, because the LEFT JOIN can introduce null values. That’s removed now, but we still don’t know whether created_date itself is defined NOT NULL. See:

    USING instead of ON is a totally optional syntax shortcut. But I warmly recommend using table aliases as demonstrated to cut down on the noise.

    Login or Signup to reply.
  3. WITH ranked_inspection AS (
        SELECT
            mt_tagged_student.student_id,
            mt_inspection.category,
            mt_inspection.created_date AS entry_date,
            ROW_NUMBER() OVER (PARTITION BY mt_tagged_student.student_id
                               ORDER BY mt_inspection.created_date DESC) AS rnk
        FROM
            public.mt_tagged_student
            LEFT JOIN public.mt_student ON mt_tagged_student.student_id = mt_student.id
            LEFT JOIN public.mt_inspection ON mt_inspection.student_id = mt_student.id
        WHERE
            mt_inspection.created_date >= '2023-10-1'
            AND mt_inspection.created_date <  '2023-11-1'
            AND mt_inspection.category = 1
    )
    SELECT
        student_id,
        category,
        entry_date
    FROM
        ranked_inspection
    WHERE
        rnk = 1
    ORDER BY
        student_id ASC, entry_date DESC;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search