I want to export some data from the DB.
Basically what I want to say is this:
1- Select mbr_name from the members table
2- Choose the ones that exist at the course_registration table (based on mbr_id)
3- Join the course_registration ids with the course_comments table
Then I need to apply these WHERE conditions as well:
1- Make sure that crr_status at the course_registration table is set to completed
2- Make sure that crr_ts at the course_registration table is between "2021-03-07 00:00:00" AND "2022-03-17 00:00:00"
3- Make sure that crm_confirmation from the course_comments table is set to accept
So I tried my best and wrote this:
SELECT members.mbr_name
FROM members
INNER JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed"
  AND udt.crr_ts >= "2021-03-07 00:00:00"
  AND udt.crr_ts < "2022-03-17 00:00:00"
  AND dot.crm_confirmation = "accept";
But this gives wrong data somehow.
The actual number of members that meet all these conditions is 12K, but this query returns 120K results, which is obviously wrong!
So what’s going wrong here? How can I solve this issue?
UPDATE:
Here are the keys of each table:
members (mbr_id (PK), mbr_name)
course_registration (crr_id (PK), crr_mbr_id (FK), crr_cor_id (FK), crr_status)
course_comments (crm_id (PK), crm_reference_id (FK), crm_confirmation)
6 Answers
You have a so-called cardinality problem. When multiple rows in one table match a single row in the other table, a JOIN returns one row for every matching combination, so the result set grows. Your JOIN as written will generate many rows: members x courses x comments. That’s what JOIN does.
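One way to check whether row multiplication explains the 120K figure is to compare the number of joined rows with the number of distinct members in the same result. The sketch below reuses the query from the question unchanged, including its join condition `udt.crr_cor_id = dot.crm_reference_id`, which is taken as given. The output column names `joined_rows` and `distinct_members` are made up for illustration.

```sql
-- Diagnostic sketch: same joins and filters as the question's query.
-- joined_rows counts every (member, registration, comment) combination;
-- distinct_members counts each qualifying member once.
SELECT COUNT(*)                       AS joined_rows,
       COUNT(DISTINCT members.mbr_id) AS distinct_members
FROM members
INNER JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = 'completed'
  AND udt.crr_ts >= '2021-03-07 00:00:00'
  AND udt.crr_ts <  '2022-03-17 00:00:00'
  AND dot.crm_confirmation = 'accept';
```

If `joined_rows` is much larger than `distinct_members`, the extra rows come from members matching more than one registration or comment, not from the WHERE filters.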
It looks like you want exactly one row in your resultset for each member who …
So let’s start with a subquery. It gives the mbr_id values for members who have submitted one or more comments on one or more courses that meet your criteria.
You use the results of that subquery to find your members. The final query is
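A minimal sketch of this subquery approach, assuming the table and column names from the question and keeping its join condition between course_registration and course_comments as is, might look like this. It is an illustration of the idea, not necessarily the exact query this answer had in mind.

```sql
-- The inner query yields the mbr_id of every member with at least one
-- completed registration in the date range that matches an accepted comment.
-- The outer query then returns each such member exactly once, because
-- IN only tests membership and never multiplies rows.
SELECT m.mbr_id, m.mbr_name
FROM members AS m
WHERE m.mbr_id IN (
    SELECT udt.crr_mbr_id
    FROM course_registration AS udt
    INNER JOIN course_comments AS dot
        ON udt.crr_cor_id = dot.crm_reference_id
    WHERE udt.crr_status = 'completed'
      AND udt.crr_ts >= '2021-03-07 00:00:00'
      AND udt.crr_ts <  '2022-03-17 00:00:00'
      AND dot.crm_confirmation = 'accept'
);
```

String literals are written with single quotes here, which is the standard SQL form. MySQL also accepts the double quotes used in the question unless the ANSI_QUOTES mode is enabled.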
Try using this, and if it does not work, try using BETWEEN for the date field (crr_ts).
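For reference, a BETWEEN version of the date filter could be sketched as follows, assuming crr_ts is a DATETIME or TIMESTAMP column. Note that BETWEEN is inclusive at both ends, so unlike the `<` comparison in the question it also matches a crr_ts of exactly 2022-03-17 00:00:00.

```sql
-- Count completed registrations in the date range using BETWEEN.
-- BETWEEN a AND b is equivalent to: crr_ts >= a AND crr_ts <= b.
SELECT COUNT(*) AS completed_in_range
FROM course_registration
WHERE crr_status = 'completed'
  AND crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00';
```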
My first guess, without knowing the context, is that:
If this is the case, you are getting way more tuples due to redundancy. In that case you just need to put a DISTINCT right after your first SELECT.
Furthermore, since a JOIN is typically among the most resource-expensive operations in SQL, I would first filter the data and leave any join as the last operation to improve efficiency. Something like this:
I would start at the registration FIRST instead of the members. By getting a DISTINCT list of members signing up for a course, you have a smaller subset. From that, joining to the comments for just those accepted gives you a final list.
Once you have those two, join back to members to get the name. I included the member ID as well as the name because you may have two or more members named "John" or "Karen" in the registration. With the ID, you can still tell the individual students apart.
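The registration-first idea can be sketched as a derived table that is filtered and de-duplicated before the join back to members. This is a minimal sketch under the question's schema. The alias `qualified` is made up for illustration, and the join condition to course_comments is copied from the question without checking that it is the intended relationship.

```sql
-- Step 1 (derived table): filter registrations and accepted comments,
-- then keep each member id once with DISTINCT.
-- Step 2: join that small list back to members to pick up the name.
SELECT m.mbr_id, m.mbr_name
FROM (
    SELECT DISTINCT udt.crr_mbr_id
    FROM course_registration AS udt
    INNER JOIN course_comments AS dot
        ON udt.crr_cor_id = dot.crm_reference_id
    WHERE udt.crr_status = 'completed'
      AND udt.crr_ts >= '2021-03-07 00:00:00'
      AND udt.crr_ts <  '2022-03-17 00:00:00'
      AND dot.crm_confirmation = 'accept'
) AS qualified
INNER JOIN members AS m ON m.mbr_id = qualified.crr_mbr_id;
```

Because mbr_id is the primary key of members, each row of `qualified` matches at most one member, so the final join cannot reintroduce duplicates.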
As you only want to select the member name, you can try the query below and see whether it gives the required result.
Try this: