skip to Main Content

I have two tables involved in this situation:

tools_events

COLUMN_NAME DATA_TYPE
event_id int
event_type varchar
event_value varchar
event_client int
event_timestamp datetime

tools_clients

COLUMN_NAME DATA_TYPE
client_id int
client_name varchar
client_developer int

This is using MySQL 5.6.

The events table obviously stores events that are occurring throughout a day, and clients contains a list of clients.

There is a particular event I am looking for the lack of. For this particular event the event_type would be taskUpdated and event_code would be needs approval (dev).

Currently I have a statement put together that will pull all clients that have gotten an event logged after the specified date which seems to be working well, albeit extremely slow:

SELECT 
    t1.event_id, t1.event_user, t1.event_client, t1.event_timestamp
FROM 
    tools_events AS t1 
WHERE 
    t1.event_id = (SELECT t2.event_id 
                   FROM tools_events AS t2 
                   WHERE t2.event_type = 'taskUpdated' 
                     AND t2.event_value = 'needs approval (dev)' 
                     AND t2.event_client = t1.event_client 
                     AND t2.event_timestamp >= '2022-04-04 00:00:00' 
                   ORDER BY t2.event_timestamp DESC LIMIT 1)

What I’m looking for is a way to client_id, client_name, and client_developer from tools_clients for the clients who have not had this event logged past the specified date. Short of pulling this list into PHP and cross referencing a full list of the clients to see which ones aren’t there, I am unsure of how to tackle this in a SQL query.

On the other end, it seems that just getting a list of events that match the initial criteria seems to run pretty much instantly so perhaps it would be more efficient to just pull those and do all of the logic in PHP? My understanding as I am learning more in depth SQL is that oftentimes SQL is more efficient if you can optimize the query correctly which I don’t think I am.

Here is some sample data for the tools_clients table:

client_id client_name client_developer
1 foo 12
2 fooBar 15
3 Bar 21
4 raBoof 37

And here is some sample data for the tools_events table:

event_id event_type event_value event_client event_timestamp
23 taskUpdated needs approval (dev) 1 2022-04-02 13:08:22
25 taskUpdated needs approval (dev) 2 2022-04-02 14:22:07
28 taskUpdated needs approval (dev) 3 2022-04-02 15:09:13
29 taskUpdated needs approval (dev) 4 2022-04-02 15:36:17
32 taskUpdated needs approval (dev) 3 2022-04-05 16:42:35
38 taskUpdated needs approval (dev) 4 2022-04-05 19:01:25

Given the constraints:

WHERE event_type = 'taskUpdated' 
  AND event_value = 'needs approval (dev)' 
  AND event_timestamp >= '2022-04-04 00:00:00' 

I would be looking for the following result because these are the only two that did not have a matching event occurring after the specified date:

client_id client_name client_developer
1 foo 12
2 fooBar 15

Any assistance in this even if it’s a point in the right direction is greatly appreciated.


EDIT:

With the simplified version of my original query, provided by Paul Maxwell in his answer, as a subquery and a bit more Googling I think I was able to get something working. My understanding is that subqueries are less efficient than JOINs so I’m not sure if this is a solid solution or not but it seems to be running well:

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_status > 0
  AND NOT EXISTS (SELECT event_client
                  FROM tools_events AS events
                  WHERE event_type = 'taskUpdated'
                    AND event_value = 'needs approval (dev)'
                    AND event_timestamp >= '2022-04-04 00:00:00'
                    AND clients.client_id = events.event_client
                  GROUP BY event_client)

Side note: I know client_status is not mentioned in the original question. At the moment I wasn’t worried about that part as I knew I could easily add in that condition once I got something that was otherwise working.

2

Answers


  1. A simple way to negate a set of where clause predicates is to use NOT(...)

    SELECT
          t1.event_id
        , t1.event_user
        , t1.event_client
        , t1.event_timestamp
    FROM tools_events AS t1
    WHERE NOT (
            t1.event_type = 'taskUpdated'
            AND t1.event_value = 'needs approval (dev)'
            )
        AND t1.event_timestamp >= '2022-04-04 00:00:00'
    ORDER BY t1.event_timestamp DESC LIMIT 1
    

    and your existing query could be simplified too:

    SELECT
          t1.event_id
        , t1.event_user
        , t1.event_client
        , t1.event_timestamp
    FROM tools_events AS t1
    WHERE t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        AND t1.event_timestamp >= '2022-04-04 00:00:00'
    ORDER BY t1.event_timestamp DESC LIMIT 1
    

    You do not specify which database or version, but if your db/version support window functions such as row_number() then you might be able to get both records in one query like this:

    SELECT
          event_id
        , event_user
        , event_client
        , event_timestamp
    FROM (
        SELECT 
              event_id
            , event_user
            , event_client
            , event_timestamp
            , row_number() OVER (
                PARTITION BY CASE WHEN event_type = 'taskUpdated'
                        AND event_value = 'needs approval (dev)' THEN 1 ELSE 0 END
                ORDER BY event_timestamp DESC
                ) AS rn
        FROM tools_events
        WHERE event_timestamp >= '2022-04-04 00:00:00'
        ) AS d
    WHERE rn = 1
    

    this divdes the data into 2 "partitions" and for each of those the row with the latest event timestamp will get a row number of 1. So both wanted rows are returned in one query.

    Login or Signup to reply.
  2. The sample data provided now does help (a lot) but it does not contain some columns that are referenced nor does it contain data that will present a result for the query you are seeking. So, I have added some columns and data and amended some data as well to suit.

    INSERT INTO tools_clients(client_id,client_name,client_developer)
    VALUES (1,'foo',12),
           (2,'fooBar',15),
           (3,'Bar',21),
           (4,'raBoof',37),
           (5,'2answer-or-not',67); --<< added
    
    INSERT INTO tools_events(event_id,event_type,event_value,event_client,event_timestamp)
    VALUES
     (23,'taskUpdated','needs approval (dev)',1,'2022-04-04 13:08:22'), -- date change
     (25,'taskUpdated','needs approval (dev)',2,'2022-04-04 14:22:07'), -- date change
     (28,'taskUpdated','needs approval (dev)',3,'2022-04-04 15:09:13'), -- date change
     (29,'taskUpdated','needs approval (dev)',4,'2022-04-04 15:36:17'), -- date change
     (32,'taskUpdated','needs approval (dev)',3,'2022-04-05 16:42:35'),
     (38,'taskUpdated','needs approval (dev)',4,'2022-04-05 19:01:25');
    
    INSERT INTO tools_events(event_id,event_type,event_value,event_client,event_timestamp) 
    VALUES (138,'not-taskUpdated','not-needs approval (dev)',4,'2022-04-05 19:01:25'); -- added
    

    So we can run a query (based on original subquery in question):

    SELECT
          t1.event_id
        , t1.event_user
        , t1.event_client
        , t1.event_timestamp
    FROM tools_events AS t1
    WHERE  (
            t1.event_type = 'taskUpdated'
            AND t1.event_value = 'needs approval (dev)'
            )
        AND t1.event_timestamp >= '2022-04-04 00:00:00'
    
    +----------+------------+--------------+-----------------+
    | event_id | event_user | event_client | event_timestamp |
    +----------+------------+--------------+-----------------+
    |       23 |            |            1 | 2022-04-04      |
    |       25 |            |            2 | 2022-04-04      |
    |       28 |            |            3 | 2022-04-04      |
    |       29 |            |            4 | 2022-04-04      |
    |       32 |            |            3 | 2022-04-05      |
    |       38 |            |            4 | 2022-04-05      |
    +----------+------------+--------------+-----------------+
    

    and get the negative of that query by using NOT (...):

    SELECT
          t1.event_id
        , t1.event_user
        , t1.event_client
        , t1.event_timestamp
    FROM tools_events AS t1
    WHERE  NOT (
            t1.event_type = 'taskUpdated'
            AND t1.event_value = 'needs approval (dev)'
            )
        AND t1.event_timestamp >= '2022-04-04 00:00:00'
    
    +----------+------------+--------------+-----------------+
    | event_id | event_user | event_client | event_timestamp |
    +----------+------------+--------------+-----------------+
    |      138 |            |            4 | 2022-04-05      |
    +----------+------------+--------------+-----------------+
    

    Note how event_client 4 exists in both those results, but there is no reference to event_client 5, this is important later

    If you are looking for clients that have had an event that is t1.event_type = 'taskUpdated' AND t1.event_value = 'needs approval (dev)' PLUS an event that is NOT(t1.event_type = 'taskUpdated' AND t1.event_value = 'needs approval (dev)' then this will produce that result:

    SELECT client_id, client_name, client_developer
    FROM `tools_clients` AS clients
    WHERE client_id IN (
        SELECT
              t1.event_client
        FROM tools_events AS t1
        WHERE  NOT (
                t1.event_type = 'taskUpdated'
                AND t1.event_value = 'needs approval (dev)'
                )
        AND t1.event_timestamp >= '2022-04-04 00:00:00')
    
    
    +-----------+-------------+------------------+
    | client_id | client_name | client_developer |
    +-----------+-------------+------------------+
    |         4 | raBoof      |               37 |
    +-----------+-------------+------------------+
    

    However if you only want clients that have absolutely no events that meet these criteria t1.event_type = 'taskUpdated' AND t1.event_value = 'needs approval (dev)' then there are 2 options. Using a left join, here we join the rows to DO match the criteria, but then look for those records where the join produces NULL (this is therefore the "negative" of the criteria):

    SELECT client_id, client_name, client_developer
    FROM `tools_clients` AS clients
    left join tools_events AS events ON clients.client_id = events.event_client
            AND events.event_type = 'taskUpdated'
            AND events.event_value = 'needs approval (dev)'
            AND events.event_timestamp >= '2022-04-04 00:00:00'
    WHERE client_status > 0
      AND events.event_client IS NULL
    

    or, a NOT EXISTS (also known as a left semi-join);

    SELECT client_id, client_name, client_developer
    FROM `tools_clients` AS clients
    WHERE client_status > 0
      AND NOT EXISTS
        (SELECT NULL
          FROM tools_events AS events
          WHERE event_type = 'taskUpdated'
            AND event_value = 'needs approval (dev)'
            AND event_timestamp >= '2022-04-04 00:00:00'
            AND clients.client_id = events.event_client
          )
    

    both of those queries result is:

    +-----------+----------------+------------------+
    | client_id |  client_name   | client_developer |
    +-----------+----------------+------------------+
    |         5 | 2answer-or-not |               67 |
    +-----------+----------------+------------------+
    

    Note when using EXISTS or NOT EXISTS the select clause of the subquery does not have to "return" anything so you can use select null or select 1 or select *. This is because EXISTS is just testing if the criteria of the from/where clause exist or not. Also, because that select clause does not really "return" anything, it is NOT necessary to use GROUP BY (or select distinct) in the NOT EXISTS subquery. It is also less efficient to include that GROUP BY in this form of subquery as it just wasted effort.

    nb: This is very different to using IN() where the data IS "returned" by the select clause.

    So hopefully you can see that there are various ways to negate your conditions – but how you negate them can produce different overall results. To see all of these queries running use: db<>fiddle here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search