skip to Main Content

I am at the beginning of learning SQL, and this problem gives me a headache. I have a table where I want to work with these columns:

updated_date upload_date
2023-04-03 09:10:50 2023-04-03 09:08:24
2023-04-03 09:14:55 2023-04-03 09:08:34
2023-04-03 09:23:51 2023-04-03 09:08:36
2023-04-03 09:25:53 2023-04-03 09:14:35
2023-04-03 09:28:43 2023-04-03 09:17:29
2023-04-03 10:29:06 2023-04-03 09:39:16
2023-04-03 10:53:15 2023-04-03 09:44:16
2023-04-03 10:24:40 2023-04-03 09:48:39

What I would need is the COUNT of records within 1 hour timeframe (let’s say from 2023-04-03 09:00:00 to 2023-04-03 10:00:00) from the "upload_date" column, but only those where the processing time (updated_date – upload_date) takes more than 1 hour, plus return the average processing time (in hh:mm:ss format) in that group. This should be one row, the next row should contain the COUNTs between 10:00-11:00 and so on.

I tried this query:

SELECT COUNT(*) AS number_of_feeds,
       AVG(TIMEDIFF(updated_date, upload_date)) AS average_processing_time
FROM   auk_feed_queue  
WHERE  TIMEDIFF(updated_date, upload_date) > '00:59:59'

But it returns all records from the table in one row, without the correct time format:

number_of_feeds average_processing_time
1264 90274.3861

I would need multiple rows, each row contains 1 hour timeframe. How do I achieve that?

2

Answers


  1. You can find all hourly intervals in subquery, and then join the original table back onto it:

    select t.d1 hour_interval, 
         sec_to_time(avg(time_to_sec(timediff(u.updated_date, u.upload_date)))) avg_duration, 
         count(*) cnt 
    from (select distinct date_format(u.upload_date, '%Y-%m-%d %H:00:00.000') d1 from uploads u) t 
    join uploads u on t.d1 <= u.upload_date and u.upload_date <= t.d1 + interval 1 day
    where time_to_sec(timediff(u.updated_date, u.upload_date)) / 3600 > 1
    group by t.d1
    

    See fiddle

    Login or Signup to reply.
  2. Your sample data only has one result that has a duration greater than 1 hour. I added a second one.

    CREATE TABLE data (updated_date DATETIME, upload_date DATETIME);
    INSERT INTO data VALUES
    ('2023-04-03 09:10:50', '2023-04-03 09:08:24'),
    ('2023-04-03 09:14:55', '2023-04-03 09:08:34'),
    ('2023-04-03 09:23:51', '2023-04-03 09:08:36'),
    ('2023-04-03 09:25:53', '2023-04-03 09:14:35'),
    ('2023-04-03 09:28:43', '2023-04-03 09:17:29'),
    ('2023-04-03 10:29:06', '2023-04-03 09:39:16'),
    ('2023-04-03 10:53:15', '2023-04-03 09:44:16'),
    ('2023-04-03 11:53:15', '2023-04-03 10:27:16'), /*Added this row to create second result*/
    ('2023-04-03 10:24:40', '2023-04-03 09:48:39');
    

    The query first gets the beginning of the hour of the upload date so we can group the results. Then count, get average, and set your 1 hour condition in the where.

    SELECT DATE_FORMAT(upload_date, '%Y-%m-%d %H:00:00') upload_date_hr
         , COUNT(*) countOfDates
         , CAST(AVG(TIMEDIFF(updated_date, upload_date)) AS TIME) avgDuration
      FROM data d
     WHERE TIMEDIFF(updated_date, upload_date) >= '01:00:00'
     GROUP BY upload_date_hr
    

    This returns:

    upload_date_hr countOfDates avgDuration
    2023-04-03 09:00:00 1 01:08:59
    2023-04-03 10:00:00 1 01:25:59

    FIDDLE

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search