skip to Main Content

I have table with over 35 million email records.

I need to get all of the records between a date range and also exist at least 3 times in the table.

1 email may have more than one property but will never have the same property twice.

The table looks like this:

EMAIL PROPERTY LAST_SENT_DATE
email1 prop1 2024-06-25
email2 prop1 2024-06-16
email3 prop2 2024-06-02
email4 prop3 2024-06-25
email1 prop2 2024-06-26
email1 prop3 2024-06-10

So I want to write a query that will return the emails in June but exist more than 3 times.

SELECT * FROM `myTable` 
WHERE 
(
  (`last_sent_date` BETWEEN '2024-06-01' AND '2024-06-30') 
   AND 
  (//...SOME PROPERTY COUNT HERE)
) 

The results should look like this:

EMAIL PROPERTY LAST_SENT_DATE
email1 prop1 2024-06-25
email1 prop2 2024-06-26
email1 prop3 2024-06-10

I’m guessing it needs to do some kind of COUNT on PROPERTY, but I’m not sure where or how to add it.

2

Answers


  1. WITH
    cte AS (
      SELECT email, property, last_sent_date,
             COUNT(*) OVER (PARTITION BY email) cnt
      FROM test
      WHERE last_sent_date BETWEEN '2024-06-01' AND '2024-06-30'
    )
    SELECT email, property, last_sent_date
    FROM cte
    WHERE cnt >= 3;
    
    email property last_sent_date
    email1 prop1 2024-06-25
    email1 prop2 2024-06-26
    email1 prop3 2024-06-10

    fiddle

    Login or Signup to reply.
  2. Well, it’s pretty much straightforward, you can use an EXIST clause. this is the more efficient way because it allows the database to stop searching as soon it finds a matching row:

    Here is the code:

    SELECT mt.*
    FROM `myTable` mt
    WHERE mt.`last_sent_date` BETWEEN '2024-06-01' AND '2024-06-30'
    AND EXISTS (
      SELECT 1
      FROM `myTable` mt2
      WHERE mt2.`email` = mt.`email`
      GROUP BY mt2.`email`
      HAVING COUNT(DISTINCT mt2.`property`) > 3
    )
    

    This query uses an EXIST clause to check if there is at least one row in the subquery is matches the email column.

    Just advice: consider creating an index on the email and last_sent_date columns to improve the performance.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search