skip to Main Content

I have students associated to schools, and want to find all schools that have five or fewer (including zero) students that have has_mohawk = false.

Here’s an Activerecord query:

School.joins(:students)
      .group(:id)
      .having("count(students.id) < 5")
      .where(students: {has_mohawk: true})

This works for schools with 1 – 4 such students with mohawks, but omits schools where there are no such students!

I figured out a working solution and will post it. But I am interested in a more elegant solution.

Using Rails 5. I’m also curious whether Rails 6’s missing could handle this?

2

Answers


  1. Chosen as BEST ANSWER

    In addition to the first query, write another to find schools where no students have mohawks (works in Rails 5).

    School.left_outer_joins(:students)
          .group(:id)
          .having("max(has_mohawk::Integer) = 0")
    

    You might think from this popular answer that you could instead just write:

    School.left_outer_joins(:students)
          .group(:id)
          .where.not(student: {has_mohawk: true})
    

    But that will include (at least in Rails 5) schools where there is any student with a has_mohawk value of false, even if some students have a has_mohawk value of true.

    Explanation of max(has_mohawk::Integer) = 0

    It converts the boolean to an integer (1 for true, 0 for false). Schools with any true values will have a max of 1, and can thus be filtered out.

    Similiar: SQL: Select records where ALL joined records satisfy some condition


  2. find all schools that have five or fewer (including zero) students that have has_mohawk = false.

    Here is an optimized SQL solution. SQL is what it comes down to in any case. (ORMs like Active Record are limited in their capabilities.)

    SELECT sc.*
    FROM   schools sc
    LEFT   JOIN (
       SELECT school_id
       FROM   students
       WHERE  has_mohawk = false
       GROUP  BY 1
       HAVING count(*) >= 5
       ) st ON st.school_id = sc.id
    WHERE  st.school_id IS NULL;  -- "not disqualified"
    

    While involving all rows, aggregate before joining. That’s faster.
    This query takes the reverse approach by excluding schools with 5 or more qualifying students. The rest is your result – incl. schools with 0 qualifying students. See:

    Any B-tree index on students (school_id) can support this query, but this partial multicolumn index would be perfect:

    CREATE INDEX ON students (school_id) WHERE has_mohawk = false;
    

    If there can be many students per school, this is faster:

    SELECT sc.*
    FROM   schools sc
    JOIN   LATERAL (
       SELECT count(*) < 5 AS qualifies
       FROM  (
          SELECT  -- select list can be empty (cheapest)
          FROM   students st
          WHERE  st.school_id = sc.id
          AND    st.has_mohawk = false
          LIMIT  5  -- !
          ) st1
       ) st2 ON st2.qualifies;
    

    The point is not to count all qualifying students, but stop looking once we found 5. Since the join to the LATERAL subquery with an aggregate function always returns a row (as opposed to the join in the first query), schools without qualifying students are kept in the loop, and we don’t need the reverse approach.

    About LATERAL:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search