I have the following time-series data table:
Id | Object_Id | Datetime | Sensor1 | Sensor2 |
---|---|---|---|---|
1 | 175 | '2022-03-24 22:01:00' | 0 | 0 |
2 | 175 | '2022-03-24 22:02:00' | 0 | 0 |
3 | 175 | '2022-03-24 22:03:00' | 5.5699381666667 | 81.342836833333 |
4 | 175 | '2022-03-24 22:04:00' | 5.5668366666667 | 81.281143 |
… | … | … | … | … |
48 | 175 | '2022-03-24 22:48:00' | NULL | NULL |
… | … | … | … | … |
In the above table I have columns sensor1 through sensor1000, and the table holds millions of time-series records.
Scenario 1:
I want to retrieve the start-time (min datetime) and end-time (max datetime) records for multiple sensors, e.g. sensor1 and sensor2, where their value IS NOT NULL.
I constructed the query below for sensor1:
(SELECT datetime, id, sensor1
   FROM timeseries
   WHERE sensor1 IS NOT NULL
   ORDER BY datetime ASC
   LIMIT 1)
UNION ALL
(SELECT datetime, id, sensor1
   FROM timeseries
   WHERE sensor1 IS NOT NULL
   ORDER BY datetime DESC
   LIMIT 1)
and the corresponding query for sensor2:
(SELECT datetime, id, sensor2
   FROM timeseries
   WHERE sensor2 IS NOT NULL
   ORDER BY datetime ASC
   LIMIT 1)
UNION ALL
(SELECT datetime, id, sensor2
   FROM timeseries
   WHERE sensor2 IS NOT NULL
   ORDER BY datetime DESC
   LIMIT 1)
This query works as expected. Please note that I wrote an individual query for each of sensor1 and sensor2 because I want to handle failures per sensor.
The problem is that in the real-life scenario I have millions of records, around 50 to 70 GB of data.
When I run the above query (for just sensor1 and sensor2), it takes a significant amount of time to return a result, and it only gets worse when a user requests multiple sensors; we can observe this in the real implementation.
In this demo it runs smoothly because there are only 150 records.
I need help improving/optimizing the performance of this query.
FYI: for each requested sensor, a query is prepared in the Node/Express app; the app then connects to the MySQL server and executes the queries in parallel:
const queryResults = await Promise.all(
  queries.map(
    (query) =>
      new Promise((resolve) =>
        db.query(query, [], (err, result) => {
          if (err) {
            return resolve([]); // if an individual query fails, resolve with an empty array
          }
          return resolve(result);
        })
      )
  )
);
2 Answers
This would run twice as fast (but have different output):
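A sketch of what that faster query could look like (assuming a single MIN/MAX aggregate per sensor was meant; it returns one row with two columns rather than two full rows, hence the different output):

SELECT MIN(datetime) AS starttime,
       MAX(datetime) AS endtime
  FROM timeseries
  WHERE sensor1 IS NOT NULL;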
I suspect that your "sensor1" and "sensor2" could be any of sensor1…sensor1000? If that is the case, there is no way to use "sensorN" in an index.
So the best you can do with indexing (either for the UNION or my suggestion) is INDEX(datetime). Slightly better would be to have datetime as the first column of the PRIMARY KEY, but I do not like that; I would prefer dropping `id`, making datetime the PRIMARY KEY, and ensuring that there are no duplicate datetime values. Note that DATETIME will have 60 minutes of duplicates once a year (when daylight-saving time ends and wall-clock times repeat); consider using TIMESTAMP instead.
Only very rarely are the 16 significant digits of DOUBLE needed; I suggest changing to FLOAT to cut the table size almost in half (DOUBLE: 8 bytes; FLOAT: 4 bytes, about 7 significant digits).
If there are lots of NULLs and/or the number of sensors keeps growing, I strongly recommend switching to fewer columns and more rows: don't store NULLs, and do make the first 3 columns the PRIMARY KEY.
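A minimal sketch of that narrower layout (the table and column names here are illustrative, not from the original answer; it also folds in the TIMESTAMP and FLOAT suggestions above):

-- One row per (object, sensor, reading time); absent readings are simply not stored.
CREATE TABLE sensor_reading (
  object_id INT UNSIGNED NOT NULL,
  sensor_id SMALLINT UNSIGNED NOT NULL,  -- 1..1000 instead of 1000 columns
  datetime  TIMESTAMP NOT NULL,          -- TIMESTAMP avoids the yearly DST duplicates
  value     FLOAT NOT NULL,              -- FLOAT halves the storage vs DOUBLE
  PRIMARY KEY (object_id, sensor_id, datetime)
);

With this layout, the start/end lookup for one sensor becomes a pair of cheap index dives:

SELECT MIN(datetime) AS starttime,
       MAX(datetime) AS endtime
  FROM sensor_reading
  WHERE object_id = 175 AND sensor_id = 1;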
In addition to what Rick suggests, you might try adding per-sensor indexes to accelerate your original queries. They may help; they definitely produce index-based execution plans on your test data set rather than table scans: https://dbfiddle.uk/XqE-lIvK
There are more elaborate solutions involving indexes on computed columns and that sort of thing, but try this first.
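The index definitions themselves are in the linked fiddle rather than reproduced here; one plausible shape, assuming a composite index per sensor column (index names hypothetical, only two of the thousand sensors shown):

-- Hypothetical DDL; the exact definitions in the fiddle may differ.
ALTER TABLE timeseries
  ADD INDEX idx_sensor1 (sensor1, datetime),
  ADD INDEX idx_sensor2 (sensor2, datetime);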