I am trying to retrieve data from public.users
tables and this the query I am using but I am not able to retrieve, what i actually require:
WITH duplicates AS (
SELECT email, COUNT(*) AS duplicate_count
FROM public.users
GROUP BY email
HAVING COUNT(*) > 1)
SELECT u.user_id, u.user_name, d.duplicate_count
FROM public.users u
JOIN duplicates d ON u.email = d.email
WHERE u.registration_date >= CURRENT_DATE - INTERVAL '30 days';
I am trying to use the concept of Common Table Expression (CTE) named "duplicates" to find all email addresses that have more than one occurrence in the "public.users" table.
I might be making some mistake, because despite having duplicate emails I am not able to fetch them using this particular query.
I will be grateful if you could help!
3
Answers
The key issues in your query:
The CTE itself needs the filter on registration_date, not the outer query, to only consider recent emails for counting duplicates.
When joining the CTE back to the users table, you again need the registration_date filter in the join condition to only get recent duplicates.
Here is an example for you to understand better:
Let’s say there is an email that appears 3 times total in the table. But only 2 of them are within the last 30 days.
If you don’t filter in the CTE, it would count 3 occurrences and include that email in the duplicate list.
But in the outer query, only 2 of those rows would match the date filter. So you’d end up with incorrect data.
By filtering in both parts, you ensure the logic is restricted to the right date range in all steps.
the issue is with the
GROUP BY
clause in the CTE.you are grouping by the
email
column, but you are not including theuser_id
oruser_name
in theSELECT
statement of the CTE.this could cause unexpected results when you later join the CTE with the
public.users
table.If you just want to get emails with entries during the last 30 days and a duplicate from any time:
Also assuming
registration_date
is an actualdate
.If the table is big, with lots of older data, and duplicates often occur in close proximity, this more elaborate query might perform better:
This time also including details for the latest entry.