
There are 4 tables: trunk, branch, leaf and data. The trunk table has IDTrunk and TrunkNumber columns. The branch table has IDBranch, IDTrunk and BranchNumber columns. The leaf table has IDLeaf, IDBranch and LeafNumber columns. The data table has IDData, TrunkNumber, BranchNumber, LeafNumber and LastEditTime columns. For each table, the IDx column is the primary key and the xNumber columns are numbers that are shown to the user. Please note that the data table holds the user-shown numbers, not the TBL-ID (trunk, branch, leaf ID) primary keys. As the names imply, the trunk, branch and leaf tables have a parent-child relationship: an element can have multiple children but only a single parent. The child knows its parent through the parent’s IDx column, but the parent doesn’t know its children directly. Data is only associated with a TBL-ID combination, which effectively means that only a leaf can have data. Data can also be associated with no leaf, in which case its xNumber columns contain null. Any given TBL-ID combination can only appear once.
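For reference, the schema described above could be sketched roughly as follows. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the column types, the UNIQUE constraints and the placeholder date are assumptions derived from the description, not the actual DDL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trunk (
    IDTrunk     INTEGER PRIMARY KEY,
    TrunkNumber INTEGER NOT NULL UNIQUE
);
CREATE TABLE branch (
    IDBranch     INTEGER PRIMARY KEY,
    IDTrunk      INTEGER NOT NULL REFERENCES trunk(IDTrunk),
    BranchNumber INTEGER NOT NULL,
    UNIQUE (IDTrunk, BranchNumber)   -- assumed: one number per parent
);
CREATE TABLE leaf (
    IDLeaf     INTEGER PRIMARY KEY,
    IDBranch   INTEGER NOT NULL REFERENCES branch(IDBranch),
    LeafNumber INTEGER NOT NULL,
    UNIQUE (IDBranch, LeafNumber)    -- assumed: one number per parent
);
CREATE TABLE data (
    IDData       INTEGER PRIMARY KEY,
    TrunkNumber  INTEGER,            -- user-shown numbers, nullable
    BranchNumber INTEGER,
    LeafNumber   INTEGER,
    LastEditTime TEXT NOT NULL
);
""")

# One trunk -> branch -> leaf chain, plus a data row that belongs to no leaf.
conn.execute("INSERT INTO trunk  VALUES (1, 10)")
conn.execute("INSERT INTO branch VALUES (1, 1, 1)")
conn.execute("INSERT INTO leaf   VALUES (1, 1, 5)")
conn.execute("INSERT INTO data   VALUES (6, NULL, NULL, NULL, '2024-01-01 10:00')")

# A second leaf with the same LeafNumber under the same branch is rejected.
try:
    conn.execute("INSERT INTO leaf VALUES (2, 1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The UNIQUE constraints are one possible way to express "any given combination can only appear once"; the real schema may enforce this differently or only in application code.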

It was noticed that when the Data table grows past 200k entries, retrieving the latest Data entry for any given TBL-ID takes too much time. It was decided to add a LatestDataID column to the leaf table to speed this up. This new column would contain the IDData of the latest data entry for the associated leaf. If there is no data associated with a leaf, it can hold -1, null or some other value that makes it obvious that the leaf has no data. It is currently set to -1, but that can be changed if it causes problems. Adding the new column and updating it when new data is pushed is simple enough. The issue is how to populate the LatestDataID column for the existing rows when the database structure is updated. Ideally, this would be done with a single query. The query will be executed only once, when the database structure is updated to contain the new column.
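To make the "updating it when new data is pushed" part concrete, one possible approach is an insert trigger on the data table. The following is a minimal sketch under several assumptions: it runs on SQLite through Python's sqlite3 module (trigger syntax differs on other databases), the trigger name `data_set_latest` is hypothetical, -1 marks a leaf without data, and the dates are placeholders since the example only lists times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trunk  (IDTrunk  INTEGER PRIMARY KEY, TrunkNumber INTEGER);
CREATE TABLE branch (IDBranch INTEGER PRIMARY KEY, IDTrunk INTEGER, BranchNumber INTEGER);
CREATE TABLE leaf   (IDLeaf   INTEGER PRIMARY KEY, IDBranch INTEGER, LeafNumber INTEGER,
                     LatestDataID INTEGER NOT NULL DEFAULT -1);
CREATE TABLE data   (IDData   INTEGER PRIMARY KEY, TrunkNumber INTEGER, BranchNumber INTEGER,
                     LeafNumber INTEGER, LastEditTime TEXT);

-- Hypothetical trigger; rows with NULL numbers belong to no leaf and are skipped.
CREATE TRIGGER data_set_latest AFTER INSERT ON data
WHEN NEW.LeafNumber IS NOT NULL
BEGIN
    UPDATE leaf
       SET LatestDataID = NEW.IDData
     WHERE LeafNumber = NEW.LeafNumber
       AND IDBranch IN (SELECT b.IDBranch
                          FROM branch b
                          JOIN trunk t ON t.IDTrunk = b.IDTrunk
                         WHERE b.BranchNumber = NEW.BranchNumber
                           AND t.TrunkNumber = NEW.TrunkNumber)
       -- Only overwrite when the new row is at least as recent as the stored one.
       AND (leaf.LatestDataID = -1
            OR (SELECT LastEditTime FROM data WHERE IDData = leaf.LatestDataID)
               <= NEW.LastEditTime);
END;

INSERT INTO trunk  VALUES (1, 10), (2, 5);
INSERT INTO branch VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO leaf (IDLeaf, IDBranch, LeafNumber) VALUES
    (1, 1, 5), (2, 1, 6), (3, 1, 7), (4, 1, 10), (5, 2, 5), (6, 2, 6);
""")

# The data rows from the example, inserted in IDData order.
rows = [
    (1, 10, 1, 5, "2024-01-01 09:50"),
    (2, 10, 1, 5, "2024-01-01 08:50"),
    (3, 10, 1, 6, "2024-01-01 07:00"),
    (4, 10, 1, 6, "2024-01-01 07:30"),
    (5, 10, 1, 10, "2024-01-01 12:00"),
    (6, None, None, None, "2024-01-01 10:00"),
    (7, 10, 2, 5, "2024-01-01 10:00"),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)

latest = [r[0] for r in conn.execute("SELECT LatestDataID FROM leaf ORDER BY IDLeaf")]
print(latest)
```

The timestamp comparison matters because rows are not necessarily inserted in LastEditTime order (IDData 2 is older than IDData 1 in the example). This only keeps the column current for new rows; it does not backfill existing ones, which is the actual question.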

As an example:

Trunk Table

IDTrunk  TrunkNumber
1        10
2        5

Branch Table

IDBranch  IDTrunk  BranchNumber
1         1        1
2         1        2

Leaf Table

IDLeaf  IDBranch  LeafNumber  LatestDataID (new column to update)
1       1         5           1
2       1         6           4
3       1         7           -1
4       1         10          5
5       2         5           7
6       2         6           -1

Data Table

IDData  TrunkNumber  BranchNumber  LeafNumber  LastEditTime
1       10           1             5           9h50
2       10           1             5           8h50
3       10           1             6           7h00
4       10           1             6           7h30
5       10           1             10          12h00
6       null         null          null        10h00
7       10           2             5           10h00

Using this query:

SELECT trunk.TrunkNumber, branch.BranchNumber, leaf.LeafNumber
    FROM leaf
    INNER JOIN branch ON leaf.IDBranch = branch.IDBranch
    INNER JOIN trunk ON branch.IDTrunk = trunk.IDTrunk;

The result is:

TrunkNumber BranchNumber LeafNumber
10 1 5
10 1 6
10 1 7
10 1 10
10 2 5
10 2 6

What should be done to update Leaf.LatestDataID properly?

–EDIT–

LastEditTime is a proper timestamp. It was written like that in the example for the sake of brevity.

2 Answers


  1. I am not sure I understand your question, but I think an insert trigger on data, or a query in the code that inserts data, should work. Based on your text, this assumes that there is only one trunk/branch/leaf combination for a given data record.

    -- Leaves without data end up with NULL rather than -1.
    -- Assumes no two rows for the same combination share a LastEditTime.
    UPDATE leaf l1
       SET LatestDataID =
           (SELECT d1.IDData
              FROM data d1
              JOIN branch b ON b.BranchNumber = d1.BranchNumber
              JOIN trunk t  ON t.TrunkNumber = d1.TrunkNumber
                           AND t.IDTrunk = b.IDTrunk
             WHERE b.IDBranch = l1.IDBranch
               AND d1.LeafNumber = l1.LeafNumber
               AND d1.LastEditTime =
                   (SELECT MAX(d2.LastEditTime)
                      FROM data d2
                     WHERE d2.TrunkNumber = d1.TrunkNumber
                       AND d2.BranchNumber = d1.BranchNumber
                       AND d2.LeafNumber = d1.LeafNumber));
    

    Holy correlated subquery, Batman!! I THINK this will work, depending on your requirements and my lack of typos. But I think you would be well served to add IdLeaf, IdBranch and IdTrunk to Data and create an index on them.

    I also assume that LastEditTime is an actual timestamp.

  2. This assumes MySQL 8+. If you are still using 5.7 or earlier, it is way past time to upgrade.

    This will provide the desired update based on what I understand of your question:

    UPDATE leaf
    LEFT JOIN (
        SELECT l.IDLeaf, d.IDData,
            ROW_NUMBER() OVER (PARTITION BY l.IDLeaf ORDER BY d.LastEditTime DESC) AS rn
        FROM leaf l
        JOIN branch b
            ON l.IDBranch = b.IDBranch
        JOIN trunk t
            ON b.IDTrunk = t.IDTrunk
        JOIN data d
            ON l.LeafNumber = d.LeafNumber
            AND b.BranchNumber = d.BranchNumber
            AND t.TrunkNumber = d.TrunkNumber
    ) latest
        ON leaf.IDLeaf = latest.IDLeaf AND latest.rn = 1
    SET leaf.LatestDataID = COALESCE(latest.IDData, -1);
    

    Here’s a db<>fiddle.

    We do not know how your application is interacting with the db, but it seems likely that you would be much better off fixing your data model, as opposed to introducing redundant data while the dataset is so small.

    Replacing TrunkNumber, BranchNumber, LeafNumber in Data with just IDLeaf would seem like a much better option.
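As an illustration of the data-model suggestion, the sketch below resolves each data row to an IDLeaf once, after which the latest-row lookup no longer needs the three-way join. It is a minimal sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, the index name `data_leaf_time` is hypothetical, the dates are placeholders, and the old number columns are left in place rather than dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trunk  (IDTrunk  INTEGER PRIMARY KEY, TrunkNumber INTEGER);
CREATE TABLE branch (IDBranch INTEGER PRIMARY KEY, IDTrunk INTEGER, BranchNumber INTEGER);
CREATE TABLE leaf   (IDLeaf   INTEGER PRIMARY KEY, IDBranch INTEGER, LeafNumber INTEGER,
                     LatestDataID INTEGER NOT NULL DEFAULT -1);
CREATE TABLE data   (IDData   INTEGER PRIMARY KEY, TrunkNumber INTEGER, BranchNumber INTEGER,
                     LeafNumber INTEGER, LastEditTime TEXT);

-- Sample rows from the question.
INSERT INTO trunk  VALUES (1, 10), (2, 5);
INSERT INTO branch VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO leaf (IDLeaf, IDBranch, LeafNumber) VALUES
    (1, 1, 5), (2, 1, 6), (3, 1, 7), (4, 1, 10), (5, 2, 5), (6, 2, 6);
INSERT INTO data VALUES
    (1, 10, 1, 5,  '2024-01-01 09:50'),
    (2, 10, 1, 5,  '2024-01-01 08:50'),
    (3, 10, 1, 6,  '2024-01-01 07:00'),
    (4, 10, 1, 6,  '2024-01-01 07:30'),
    (5, 10, 1, 10, '2024-01-01 12:00'),
    (6, NULL, NULL, NULL, '2024-01-01 10:00'),
    (7, 10, 2, 5,  '2024-01-01 10:00');

-- Step 1: add IDLeaf to data and resolve it once from the user-shown numbers.
ALTER TABLE data ADD COLUMN IDLeaf INTEGER;
UPDATE data
   SET IDLeaf = (SELECT l.IDLeaf
                   FROM leaf l
                   JOIN branch b ON b.IDBranch = l.IDBranch
                   JOIN trunk t  ON t.IDTrunk = b.IDTrunk
                  WHERE l.LeafNumber = data.LeafNumber
                    AND b.BranchNumber = data.BranchNumber
                    AND t.TrunkNumber = data.TrunkNumber);

-- Hypothetical index so "latest row per leaf" can be answered from the index.
CREATE INDEX data_leaf_time ON data (IDLeaf, LastEditTime);

-- Step 2: with IDLeaf in place, the one-off backfill is a simple correlated subquery.
UPDATE leaf
   SET LatestDataID = COALESCE(
           (SELECT d.IDData
              FROM data d
             WHERE d.IDLeaf = leaf.IDLeaf
             ORDER BY d.LastEditTime DESC, d.IDData DESC
             LIMIT 1),
           -1);
""")

data_leaf = [r[0] for r in conn.execute("SELECT IDLeaf FROM data ORDER BY IDData")]
latest = [r[0] for r in conn.execute("SELECT LatestDataID FROM leaf ORDER BY IDLeaf")]
print(data_leaf)
print(latest)
```

With an index on (IDLeaf, LastEditTime), the per-leaf lookup may already be fast enough that LatestDataID is not needed at all, though that would have to be measured on the real dataset.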
