I have database like this (in real there are over 30 different sKey):
+----+------+------+---------------------+
|ID | sKey | sVal | timestamp |
+----+------+------+---------------------+
| 1 | temp | 19 | 2023-07-14 20:32:06 |
| 2 | humi | 60 | 2023-07-14 20:33:06 |
| 3 | temp | 20 | 2023-07-14 20:34:06 |
| 4 | humi | 65 | 2023-07-14 20:35:06 |
| 5 | pres | 1023 | 2023-07-14 20:36:06 |
| 6 | temp | 22 | 2023-07-14 20:37:06 |
| 7 | temp | 21 | 2023-07-14 20:38:06 |
| 8 | pres | 1028 | 2023-07-14 20:39:06 |
| 9 | temp | 20 | 2023-07-14 20:40:06 |
|10 | temp | 19 | 2023-07-14 20:43:06 | <-time glitch
|11 | pres | 1022 | 2023-07-14 20:44:06 |
|12 | temp | 19 | 2023-07-14 20:45:06 |
|13 | humi | 66 | 2023-07-14 20:46:06 |
|14 | humi | 63 | 2023-07-14 20:47:06 |
|15 | temp | 19 | 2023-07-14 20:48:06 |
|16 | pres | 1029 | 2023-07-14 20:49:06 |
|20 | temp | 19 | 2023-07-14 20:50:06 | <- ID not consecutive (deleted records)
|21 | pres | 1022 | 2023-07-14 20:61:06 |
|22 | temp | 19 | 2023-07-14 20:62:06 |
|23 | pres | 1029 | 2023-07-14 20:63:06 |
+----+------+------+---------------------+
Now I wish to get averages for each sKey (with value in sVal).
For now I have working solution but it’s to slow. Actually I have 3 separate queries like this:
SELECT AVG(`sVal`), `timestamp` FROM `Test` WHERE sKey='temp' AND timestamp between '2023-07-14 20:34:06' and '2023-07-14 20:51:06' GROUP BY FLOOR(TO_SECONDS(`timestamp`)/180)
SELECT AVG(`sVal`), `timestamp` FROM `Test` WHERE sKey='humi' AND timestamp between '2023-07-14 20:34:06' and '2023-07-14 20:51:06' GROUP BY FLOOR(TO_SECONDS(`timestamp`)/180)
SELECT AVG(`sVal`), `timestamp` FROM `Test` WHERE sKey='pres' AND timestamp between '2023-07-14 20:34:06' and '2023-07-14 20:51:06' GROUP BY FLOOR(TO_SECONDS(`timestamp`)/180)
As database is allready over 2000000 records now, the single query take around 3 seconds. I assume if I can somehow join query to just one as only WHERE sKey=… the result should be faster. So how to improve that.
Or maybe I have wrong approach at all. The wanted result is to get averaged sVal per each sKey. The averaging interval cant be by record numbers (ID) as some record may be deleted. Even if ID is in row there may be some record missed. So I think only interval in timestamp itself can be used. But I’m novice in SQL and I’m possible miss something.
Average time (3 minutes in example) may be any value.
2
Answers
You can use conditional avg.
thsi query will benefit from an Index on timestamo, also you should test if a combined index on skey and timestamp will also increase the speed
A complete table scan is needed because of the
GROUP BY FLOOR(TO_SECONDS(
timestamp) / 180)
.The change provided by nbk might also bring improvement because with that change your table needs to be scanned only once.
Another improvement can be made by adding a generated column
You can even add an index on that column:
Because of this extra info, some more diskspace is needed, but you can also change the WHERE-clause from:
to:
or (the less readable):