
I want to export some data from the DB.

Basically what I want to say is this:

1- Select mbr_name from the members table

2- Choose the ones that exist at the course_registration table (based on mbr_id)

3- Join the course_registration ids with course_comments table

Then I need to apply these WHERE conditions as well:

1- Make sure that crr_status at course_registration table is set to completed

2- Make sure that crr_ts at course_registration table is between "2021-03-07 00:00:00" AND "2022-03-17 00:00:00"

3- Make sure that crm_confirmation from course_comments table is set to accept

So I tried my best and wrote this:

SELECT members.mbr_name
FROM members
INNER JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id 
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
AND dot.crm_confirmation = "accept";

But this somehow returns wrong data.

The actual number of members that meet all these conditions is 12K, but this query gives me 120K results, which is obviously wrong!

So what’s going wrong here? How can I solve this issue?


UPDATE:

Here are the keys of each table:

members (mbr_id (PK), mbr_name) 
course_registration (crr_id (PK), crr_mbr_id (FK), crr_cor_id (FK), crr_status)
course_comments (crm_id (PK), crm_reference_id (FK), crm_confirmation)

6 Answers


  1. You have a so-called cardinality problem. When multiple rows in one table match a single row in the other table, a JOIN returns one result row for every matching pair. Your JOIN as written generates a row for every member × registration × comment combination. That’s what JOIN does.
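
    A minimal sketch of that row multiplication, using Python's built-in sqlite3 module rather than the asker's actual database. The sample rows and column types are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database; column types are assumptions, names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (mbr_id INTEGER PRIMARY KEY, mbr_name TEXT);
CREATE TABLE course_registration (
    crr_id INTEGER PRIMARY KEY, crr_mbr_id INTEGER, crr_cor_id INTEGER,
    crr_status TEXT, crr_ts TEXT);
CREATE TABLE course_comments (
    crm_id INTEGER PRIMARY KEY, crm_reference_id INTEGER, crm_confirmation TEXT);

-- Hypothetical data: one member, two completed courses,
-- three accepted comments on each course.
INSERT INTO members VALUES (1, 'Ann');
INSERT INTO course_registration VALUES
    (1, 1, 10, 'completed', '2021-06-01 00:00:00'),
    (2, 1, 20, 'completed', '2021-07-01 00:00:00');
INSERT INTO course_comments VALUES
    (1, 10, 'accept'), (2, 10, 'accept'), (3, 10, 'accept'),
    (4, 20, 'accept'), (5, 20, 'accept'), (6, 20, 'accept');
""")

# The query from the question, with single-quoted string literals.
joined_rows = conn.execute("""
    SELECT members.mbr_name
      FROM members
      JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id
      JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
     WHERE udt.crr_status = 'completed'
       AND udt.crr_ts >= '2021-03-07 00:00:00'
       AND udt.crr_ts < '2022-03-17 00:00:00'
       AND dot.crm_confirmation = 'accept'
""").fetchall()

# 2 registrations x 3 comments each = 6 rows for a single member.
print(len(joined_rows))
```

    Note that the comments are matched on the course id only, so in this sketch every accepted comment on a course pairs with every registration for that course, whoever wrote it.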

    It looks like you want exactly one row in your resultset for each member who …

    • has completed one or more courses meeting your criterion.
    • has submitted one or more comments.

    So let’s start with a subquery. It gives the mbr_id values for members who have submitted one or more comments on one or more courses that meet your criteria.

            SELECT udt.crr_mbr_id
              FROM course_registration udt
              JOIN course_comments dot ON  udt.crr_cor_id = dot.crm_reference_id
             WHERE udt.crr_status = "completed"
               AND udt.crr_ts >= "2021-03-07 00:00:00"
               AND udt.crr_ts < "2022-03-17 00:00:00"
               AND dot.crm_confirmation = "accept"
             GROUP BY udt.crr_mbr_id
    

    You use the results of that subquery to find your members. The final query is

    SELECT members.mbr_name
      FROM members
     WHERE members.mbr_id IN (
            SELECT udt.crr_mbr_id
              FROM course_registration udt
              JOIN course_comments dot ON  udt.crr_cor_id = dot.crm_reference_id
             WHERE udt.crr_status = "completed"
               AND udt.crr_ts >= "2021-03-07 00:00:00"
               AND udt.crr_ts < "2022-03-17 00:00:00"
               AND dot.crm_confirmation = "accept"
             GROUP BY udt.crr_mbr_id )
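
    As a rough check of the IN (subquery) approach, here is a sketch using Python's built-in sqlite3 module. The data is hypothetical: Ann has several matching registration and comment rows, Bob has none that qualify. The column types are assumptions as well.

```python
import sqlite3

# In-memory database; column types are assumptions, names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (mbr_id INTEGER PRIMARY KEY, mbr_name TEXT);
CREATE TABLE course_registration (
    crr_id INTEGER PRIMARY KEY, crr_mbr_id INTEGER, crr_cor_id INTEGER,
    crr_status TEXT, crr_ts TEXT);
CREATE TABLE course_comments (
    crm_id INTEGER PRIMARY KEY, crm_reference_id INTEGER, crm_confirmation TEXT);

-- Hypothetical data: Ann completed two courses, Bob's registration is pending.
INSERT INTO members VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course_registration VALUES
    (1, 1, 10, 'completed', '2021-06-01 00:00:00'),
    (2, 1, 20, 'completed', '2021-07-01 00:00:00'),
    (3, 2, 10, 'pending',   '2021-06-01 00:00:00');
INSERT INTO course_comments VALUES
    (1, 10, 'accept'), (2, 10, 'accept'), (3, 20, 'accept');
""")

# The subquery may match many rows per member, but IN only asks
# "is this mbr_id in the set?", so each member is returned at most once.
names = [row[0] for row in conn.execute("""
    SELECT members.mbr_name
      FROM members
     WHERE members.mbr_id IN (
            SELECT udt.crr_mbr_id
              FROM course_registration udt
              JOIN course_comments dot ON udt.crr_cor_id = dot.crm_reference_id
             WHERE udt.crr_status = 'completed'
               AND udt.crr_ts >= '2021-03-07 00:00:00'
               AND udt.crr_ts < '2022-03-17 00:00:00'
               AND dot.crm_confirmation = 'accept'
             GROUP BY udt.crr_mbr_id )
""")]

print(names)  # ['Ann']
```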
    
  2. Try this, and if it doesn’t work, try using ‘between’ for the date field (crr_ts).

    select mbr.mbr_name from
    (
    select * from course_registration AS udt
    INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
    where dot.crm_confirmation = "accept" AND udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
    )x
    INNER JOIN  members mbr on mbr.mbr_id = x.crr_mbr_id
    
  3. My first guess, without knowing the context, is that:

    • a member can register to one or more courses,
    • each course can have one or more comments.

    If this is the case, you are getting way more tuples due to redundancy. In that case you just need to stick a DISTINCT right after your first SELECT.

    Furthermore, since a JOIN tends to be among the most resource-expensive operations in SQL, I would first filter the data and then leave any join as the last operation to improve efficiency. Something like this:

    SELECT 
        members.mbr_name 
    FROM
        (
        SELECT DISTINCT
            crm_reference_id
        FROM 
            course_comments
        WHERE 
            crm_confirmation = 'accept'
        ) accepted_comments
    INNER JOIN 
        (
        SELECT DISTINCT
            crr_mbr_id,
            crr_cor_id
        FROM 
            course_registration
        WHERE
            crr_status = 'completed'
        AND
            crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
        ) completed_courses 
    ON 
        accepted_comments.crm_reference_id = completed_courses.crr_cor_id
    INNER JOIN 
        members 
    ON 
        members.mbr_id = completed_courses.crr_mbr_id
    
  4. I would start at the registration FIRST instead of the members. By getting a DISTINCT list of members who registered for a course, you have a smaller subset. Joining that to just the accepted comments gives you a final list.

    Once you have that list, join back to members to get the name. I included the member ID as well as the name because you may have two or more members named "John" or "Karen" in the registration. The ID at least confirms the unique students.

    select
            m.mbr_name,
            m.mbr_id
        from
            ( select distinct
                    cr.crr_mbr_id
                from
                    course_registration cr
                        JOIN course_comments cc 
                            ON cr.crr_cor_id = cc.crm_reference_id
                            AND cc.crm_confirmation = 'accept'
                WHERE 
                        cr.crr_status = 'completed'
                    AND cr.crr_ts >= '2021-03-07' 
                    AND cr.crr_ts < '2022-03-17' ) PQ
            JOIN members m
                ON PQ.crr_mbr_id = m.mbr_id 
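
    To illustrate why the ID matters, here is a small sketch using Python's built-in sqlite3 module. The two members named "John" and all other rows are made up, and the column types are assumptions. DISTINCT over the name alone merges the two people into one row, while DISTINCT over (mbr_id, mbr_name) keeps them apart.

```python
import sqlite3

# In-memory database; column types are assumptions, names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (mbr_id INTEGER PRIMARY KEY, mbr_name TEXT);
CREATE TABLE course_registration (
    crr_id INTEGER PRIMARY KEY, crr_mbr_id INTEGER, crr_cor_id INTEGER,
    crr_status TEXT, crr_ts TEXT);
CREATE TABLE course_comments (
    crm_id INTEGER PRIMARY KEY, crm_reference_id INTEGER, crm_confirmation TEXT);

-- Hypothetical data: two different members who share the same name.
INSERT INTO members VALUES (1, 'John'), (2, 'John');
INSERT INTO course_registration VALUES
    (1, 1, 10, 'completed', '2021-06-01 00:00:00'),
    (2, 2, 10, 'completed', '2021-06-02 00:00:00');
INSERT INTO course_comments VALUES (1, 10, 'accept'), (2, 10, 'accept');
""")

# Shared FROM/WHERE part of both queries below.
body = """
      FROM members m
      JOIN course_registration cr ON cr.crr_mbr_id = m.mbr_id
      JOIN course_comments cc ON cr.crr_cor_id = cc.crm_reference_id
                             AND cc.crm_confirmation = 'accept'
     WHERE cr.crr_status = 'completed'
       AND cr.crr_ts >= '2021-03-07'
       AND cr.crr_ts < '2022-03-17'
"""

distinct_names = conn.execute("SELECT DISTINCT m.mbr_name" + body).fetchall()
distinct_members = conn.execute("SELECT DISTINCT m.mbr_id, m.mbr_name" + body).fetchall()

# Name-only DISTINCT undercounts: 1 row versus 2 actual members.
print(len(distinct_names), len(distinct_members))
```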
    
  5. As you only want to select the member name, you can try the query below and see whether it gives the required result.

    select m.mbr_name
      from members m
      where Exists ( select 1 from course_registration cr 
                                   join course_comments cm on cr.crr_cor_id = cm.crm_reference_id
                      where cr.crr_mbr_id = m.mbr_id
                        And cr.crr_status = "completed" AND cr.crr_ts >= "2021-03-07 00:00:00" AND cr.crr_ts < "2022-03-17 00:00:00"
                        AND cm.crm_confirmation = "accept"
                     );
    
    
    
  6. Try this:

    SELECT *
    FROM members M
    INNER JOIN course_registration CR
    ON CR.crr_mbr_id = M.mbr_id
    AND CR.crr_status = 'completed'
    AND CR.crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
    WHERE EXISTS(
        SELECT * FROM course_comments CC
        WHERE CC.crm_confirmation = 'accept'
        AND CC.crm_reference_id = CR.crr_cor_id
    )
    ORDER BY M.mbr_id;
    