skip to Main Content

This question appears to be slightly different from some of the other questions out there dealing with duplicates on a LEFT JOIN. Here’s my situation:

I have a list of datapoints associated with a meter. The datapoints are contiguous but over time the meter name changes, and occasionally overlaps, so for example I have a meter table with something like this:

EDIT: for some reason my tables are working fine in edit mode, but not when live. I’ll try formatting as fixed space:

|MeterID| DataID|MeterName|ValidFrom |  ValidTo |
|-------|-------|---------|----------|----------|
|      1|      1|Meter A  |2010-09-21|2015-09-17|
|      2|      1|Meter B  |2015-09-15|2020-02-04|
|      3|      1|Meter C  |2016-05-02|2020-09-01|

My Data table simply has something like:

|DataID|Value|Timestamp          |
|------|-----|-------------------|
|     1| 0.9 |2010-09-21 00:00:00|
|     1| ... |...                |
|     1| 3.4 |2020-09-01 00:00:00|

What I’m looking for a is a result like this:

|Timestamp          |DataID|MeterName      |Value|
|-------------------|------|---------------|-----|
|…                  |     1|Meter A        |    …|
|2015-09-14 23:00:00|     1|Meter A        |  7.9|
|2015-09-15 00:00:00|     1|Meter A,Meter B|  3.0|
|…                  |     1|Meter A,Meter B|  6.3|
|2015-09-16 23:00:00|     1|Meter A,Meter B|  0.4|
|2015-09-17 00:00:00|     1|Meter B        |  7.5|
|…                  |     1|Meter B        |    …|
|2016-05-01 23:00:00|     1|Meter B        |  0.6|
|2016-05-02 00:00:00|     1|Meter B,Meter C|  2.0|
|…                  |     1|Meter B,Meter C|    …|
|2020-02-03 23:00:00|     1|Meter B,Meter C|  3.6|
|2020-02-04 00:00:00|     1|Meter C        |  9.7|
|…                  |     1|Meter C        |    …|

My existing query (below) works fine, but obviously results in duplicate rows where the meter timestamps overlap. In reality some of these overlaps can last years. If it helps, there will likely never be more than 2, maybe 3 overlaps at any given point. Also, we’re dealing with big tables here (over 1000 meters active at a time, and 20 years worth of hourly data) – so the meter table is currently 2200 rows and the data table is almost 190M rows – just for your consideration when thinking about query efficiency because right now my query below is very fast based on how the indexes are set up.

I thought about perhaps doing a subquery that transposes the meter names by timestamp value, similar to a PIVOT function in MSSQL, but I’m not sure the best way to achieve that in MySQL (v8.0.33)

SELECT
    d.Timestamp,
    d.DataID,
    m.MeterName,
    d.Value
FROM
    data d
LEFT JOIN
    meter m
ON
    m.DataID = d.DataID
    AND d.Timestamp >= m.ValidFrom
    AND d.Timestamp <= m.ValidTo
WHERE
    d.DataID=1

2

Answers


  1. This might be a good spot to use a lateral join (this requires MySQL 8.0.14 at least):

    SELECT d.Timestamp, d.DataID, m.MeterNames, d.Value
    FROM data d
    CROSS JOIN LATERAL (
        SELECT GROUP_CONCAT(m.MeterName ORDER BY m.ValidFrom) meterNames
        FROM meter m
        WHERE m.DataID = d.DataID
          AND d.Timestamp >= m.ValidFrom
          AND d.Timestamp <= m.ValidTo
    ) m
    WHERE d.DataID = 1
    

    In earlier versions, we can get the same behavior with a correlated subquery:

    SELECT d.Timestamp, d.DataID, 
        (
            SELECT GROUP_CONCAT(m.MeterName ORDER BY m.ValidFrom)
            FROM meter m
            WHERE m.DataID = d.DataID
              AND d.Timestamp >= m.ValidFrom
              AND d.Timestamp <= m.ValidTom
        ) MeterNames, 
        d.Value
    FROM data d
    WHERE d.DataID = 1
    

    I would expect this technique to be more efficient than left join + group by, since it avoids outer aggregation – especially if you have an index on meter (DataID, ValidFrom,ValidTom).

    Login or Signup to reply.
  2. Can you try with GROUP_CONCAT() function:

    SELECT
        d.Timestamp,
        d.DataID,
        GROUP_CONCAT(m.MeterName ORDER BY m.ValidFrom ASC SEPARATOR ',') AS MeterNames,
        d.Value
    FROM
        data d
    LEFT JOIN
        meter m
    ON
        m.DataID = d.DataID
        AND d.Timestamp >= m.ValidFrom
        AND d.Timestamp <= m.ValidTo
    WHERE
        d.DataID=1
    GROUP BY
        d.Timestamp, d.DataID, d.Value
    

    Let me know what you get..

    You can read more about GROUP_CONCAT() function here:

    https://www.geeksforgeeks.org/mysql-group_concat-function/

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search