skip to Main Content

I have a table that create the relationship between users and groups that looks like:

UserGroups

group_id user_id
1 1
2 1
1 2
1 3

and the table users looks like:

id name
1 John
2 Mary
3 Bob
4 Carol

And I’m successfully selecting my results like this:

SELECT
...,
CASE
    WHEN users.id in (SELECT user_id from usergoups where group_id = 1) and not in (SELECT user_id from usergoups where group_id = 2) then 1
    WHEN users.id in (SELECT user_id from usergoups where group_id = 2) and not in (SELECT user_id from usergoups where group_id = 1) then 2
    WHEN users.id in (SELECT user_id from usergoups where group_id = 1) and in (SELECT user_id from usergoups where group_id = 2) then 1
    else class.group
end as class_group
...
LEFT JOIN users on users.id = 1

for the following sample output:

class_id class_group
1 1
2 1
3 1
4 1

I was wondering if it is possible to simplify the select and make it more efficient reducing the number of selections, as i could just have something like:

...
LEFT JOIN usergroups on usergroups.user_id = users.id
...

The only line that i could think of was:

...
WHEN 2 in usergroups then...

but it just trows a sintax error as i kinda expected

2

Answers


  1. We could avoid correlated subqueries in expressions in the SELECT list, if we instead use an inline view (mysql calls it a derived table) to check for the existence of rows in user_groups. Following this line in the outer query

        LEFT JOIN users ON users.id = 1
    

    I’d add an outer join to an inline view query something like this:

        LEFT JOIN ( SELECT ug.user_id
                         , MAX(CASE ug.group_id WHEN 1 THEN ug.group_id ELSE NULL END) AS g1
                         , MAX(CASE ug.group_id WHEN 2 THEN ug.group_id ELSE NULL END) AS g2
                      FROM user_groups ug
                     WHERE ug.group_id IN (1,2)
                     GROUP
                        BY ug.user_id
                  ) g
               ON g.user_id = users.id
    

    The view query will return one (or zero) rows for each user_id in the user_groups table, so it flattens to

             user_id    g1    g2
             -------  ----  ----
                   1     1     2
                   2  NULL     2
                   3     1  NULL
    

    For improved performance, if the outer query is restricting results to just one user, e.g.

        ON users.id = 1
    

    We can include a matching restriction in the inline view query,

                     WHERE ug.group_id IN (1,2)
                       AND ug.user_id = 1         /* <-- add this to match the condition in the outer query */
    

    In the SELECT list of the outer query, we can do the conditional checks,

              CASE
              WHEN g.g1 AND g.g2 IS NULL THEN 1
              WHEN g.g2 AND g.g1 IS NULL THEN 2
              WHEN g.g1                  THEN 1
              ELSE class.group
              END AS foo
    

    Note that the first WHEN is redundant. If the first WHEN condition is TRUE, then we know the second WHEN condition cannot be TRUE, so we could check that first, and then the third WHEN condition covers the first. This would return an equivalent result,

              CASE
              WHEN g.g2 AND g.g1 IS NULL THEN 2
              WHEN g.g1                  THEN 1
              ELSE class.group
              END
    

    If we weren’t needing to return class.group from the CASE expression, I’d be more tempted to move the logic down into the inline view. (We could move the checks for group_id 2 and group_id 1, and return 2 or 1 or NULL to the outer query, but we’d still need an expression to check the return from g to see if we return that, or return class.group

    Login or Signup to reply.
  2. Another way to avoid correlated subqueries in the SELECT list, the syntax is a bit cleaner than the approach in my other answer,

    BUT we would only do this…

    IF (and only IF) we have some guarantee that rows in user_groups are UNIQUE i.e. if theres a PRIMARY KEY or UNIQUE KEY constraint on the (group_id,user_id) tuple (or it could be the other way around (user_id,group_id))

    THEN we could use two outer joins to the user_groups table

    So following this line:

        LEFT JOIN users ON users.id = 1
    

    We could add

        LEFT JOIN user_groups g1 ON g1.group_id = 1 AND g1.user_id = users.id
        LEFT JOIN user_groups g2 ON g2.group_id = 2 AND g2.user_id = users.id
    

    Then in our expression in the SELECT list could be

        CASE
        WHEN g2.group_id AND g1.group_id IS NULL THEN 2
        WHEN g1.group_id THEN 1
        ELSE class.group
        END AS class_group
    

    NOTE: in MySQL, when a numeric is evaluated as a boolean, a value of 0 (zero) is FALSE, any non-zero value is TRUE, and NULL is NULL.

    So the above is equivalent shorthand for

        CASE
        WHEN g2.group_id <> 0 AND g1.group_id IS NULL THEN 2
        WHEN g1.group_id <> 0 THEN 1
        ELSE class.group
        END AS class_group
    

    N.B. without guaranteed uniqueness, the JOINs have potential to match multiple rows from user_groups (e.g. consider if there are two rows (1,1),(1,1) ), that would double the number of rows returned, which we probably do not want)

    The approach in my other answer is not subject to the same uniqueness requirement; any duplicate rows for a (user_id,group_id) will be collapsed into a single row for the user_id by the GROUP BY in the inline view.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search