This question appears to be slightly different from some of the other questions out there dealing with duplicates on a LEFT JOIN. Here’s my situation:
I have a list of datapoints associated with a meter. The datapoints are contiguous but over time the meter name changes, and occasionally overlaps, so for example I have a meter table with something like this:
EDIT: for some reason my tables are working fine in edit mode, but not when live. I’ll try formatting as fixed space:
|MeterID| DataID|MeterName|ValidFrom | ValidTo |
|-------|-------|---------|----------|----------|
| 1| 1|Meter A |2010-09-21|2015-09-17|
| 2| 1|Meter B |2015-09-15|2020-02-04|
| 3| 1|Meter C |2016-05-02|2020-09-01|
My Data table simply has something like:
|DataID|Value|Timestamp |
|------|-----|-------------------|
| 1| 0.9 |2010-09-21 00:00:00|
| 1| ... |... |
| 1| 3.4 |2020-09-01 00:00:00|
What I’m looking for a is a result like this:
|Timestamp |DataID|MeterName |Value|
|-------------------|------|---------------|-----|
|… | 1|Meter A | …|
|2015-09-14 23:00:00| 1|Meter A | 7.9|
|2015-09-15 00:00:00| 1|Meter A,Meter B| 3.0|
|… | 1|Meter A,Meter B| 6.3|
|2015-09-16 23:00:00| 1|Meter A,Meter B| 0.4|
|2015-09-17 00:00:00| 1|Meter B | 7.5|
|… | 1|Meter B | …|
|2016-05-01 23:00:00| 1|Meter B | 0.6|
|2016-05-02 00:00:00| 1|Meter B,Meter C| 2.0|
|… | 1|Meter B,Meter C| …|
|2020-02-03 23:00:00| 1|Meter B,Meter C| 3.6|
|2020-02-04 00:00:00| 1|Meter C | 9.7|
|… | 1|Meter C | …|
My existing query (below) works fine, but obviously results in duplicate rows where the meter timestamps overlap. In reality some of these overlaps can last years. If it helps, there will likely never be more than 2, maybe 3 overlaps at any given point. Also, we’re dealing with big tables here (over 1000 meters active at a time, and 20 years worth of hourly data) – so the meter table is currently 2200 rows and the data table is almost 190M rows – just for your consideration when thinking about query efficiency because right now my query below is very fast based on how the indexes are set up.
I thought about perhaps doing a subquery that transposes the meter names by timestamp value, similar to a PIVOT function in MSSQL, but I’m not sure the best way to achieve that in MySQL (v8.0.33)
SELECT
d.Timestamp,
d.DataID,
m.MeterName,
d.Value
FROM
data d
LEFT JOIN
meter m
ON
m.DataID = d.DataID
AND d.Timestamp >= m.ValidFrom
AND d.Timestamp <= m.ValidTo
WHERE
d.DataID=1
2
Answers
This might be a good spot to use a lateral join (this requires MySQL 8.0.14 at least):
In earlier versions, we can get the same behavior with a correlated subquery:
I would expect this technique to be more efficient than
left join
+group by
, since it avoids outer aggregation – especially if you have an index onmeter (DataID, ValidFrom,ValidTom)
.Can you try with
GROUP_CONCAT()
function:Let me know what you get..
You can read more about
GROUP_CONCAT()
function here:https://www.geeksforgeeks.org/mysql-group_concat-function/