skip to Main Content

I need to get the initial state as well as the latest state from a MySQL database. This is over two tables:

customer

id name surname dob email telephone
10 Steve Bobbly 01-01-1970 [email protected] 0123456789
15 James Bond 01-01-1950 [email protected] 0101010999

audit_log

id entity_id property old_value new_value
1 10 name John Steve
2 10 email [email protected] [email protected]
3 10 telephone 0123456789

What I expect is output like this:

id name surname dob email telephone
10 Steve Bobbly 01-01-1970 [email protected] 0123456789
10_1 John Bobbly 01-01-1970 [email protected]
15 James Bond 01-01-1950 [email protected] 0101010999

I initially had a PHP script that runs through all the customer rows, and then matches them to the audit_log rows and generate output from there, but the speed is EXTREMELY slow and resource intensive.

Would something like this be possible directly in MySQL, and how would I do it?

EDIT

I’ve added additional rows to the customer and the output tables. The output table needs to contain all rows in customer, as well as a copy of the initial row, built from audit_log.

4

Answers


  1. Considering there’s at most one update per each entityproperty pairs per each entity_id value, you can use this SQL Select statement

    SELECT a.entity_id, a.name, c.surname, c.dob, a.email, a.telephone
      FROM customer AS c
     CROSS JOIN ( SELECT entity_id, 
                         MAX(CASE WHEN property = 'name' THEN new_value END) AS name,
                         MAX(CASE WHEN property = 'email' THEN new_value END) AS email,
                         MAX(CASE WHEN property = 'telephone' THEN new_value END) AS telephone
                    FROM audit_log 
                   GROUP BY entity_id
                   UNION ALL
                  SELECT CONCAT(entity_id,'_1'), 
                         MAX(CASE WHEN property = 'name' THEN old_value END) AS name,
                         MAX(CASE WHEN property = 'email' THEN old_value END) AS email,
                         MAX(CASE WHEN property = 'telephone' THEN old_value END) AS telephone
                    FROM audit_log 
                   GROUP BY entity_id ) AS a
      WHERE a.entity_id = c.id
    

    where changing properties which take place in audit_log table with old vs. new values are unpivoted, and the others(non-changing ones) taken from customer table

    Demo

    Login or Signup to reply.
  2. Nice question! 🙂 Here is one way if you don’t mind some repetition:

    SELECT *
    FROM customer
    UNION ALL
    SELECT
      CONCAT(id, '_1') AS id,
      COALESCE(
        (SELECT old_value
         FROM audit_log
         WHERE id = (
           SELECT MIN(id)
           FROM audit_log WHERE entity_id = c.id AND property = 'name')
        ),
        c.name)
      AS name,
      COALESCE(
        (SELECT old_value
         FROM audit_log
         WHERE id = (
           SELECT MIN(id)
           FROM audit_log WHERE entity_id = c.id AND property = 'surname')
        ),
        c.surname)
      AS surname,
      COALESCE(
        (SELECT old_value
         FROM audit_log
         WHERE id = (
           SELECT MIN(id)
           FROM audit_log WHERE entity_id = c.id AND property = 'dob')
        ),
        c.dob)
      AS dob,
      COALESCE(
        (SELECT old_value
         FROM audit_log
         WHERE id = (
           SELECT MIN(id)
           FROM audit_log WHERE entity_id = c.id AND property = 'email')
        ),
        c.email)
      AS email,
      COALESCE(
        (SELECT old_value
         FROM audit_log
         WHERE id = (
           SELECT MIN(id)
           FROM audit_log WHERE entity_id = c.id AND property = 'telephone')
        ),
        c.telephone)
      AS telephone
    FROM customer c
    WHERE EXISTS (SELECT * FROM audit_log WHERE entity_id = c.id)
    

    Demo (shamelessly borrowing from the one set up by Barbaros Özhan)

    Note: The above will include the _1 entries whenever there are audit entries. But just the presence of audit entries doesn’t guarantee anything has changed – e.g. if the surname was changed from "Smith" to "Jones" and then back to "Smith". If this is important to you I may be able to modify it, at the expense of more complexity.

    Login or Signup to reply.
  3. Try the following:

    SET @rn=0;
    SET @cust=null;
    SET @pr=null;
    
    SELECT id, name, surname, dob, email, telephone
    FROM customer 
      UNION ALL
    (
      SELECT CONCAT(T.id, '_', D.rownum) id,
            COALESCE(MAX(CASE D.property WHEN 'name' THEN D.old_value END), MAX(T.name)) name,
            COALESCE(MAX(CASE D.property WHEN 'surname' THEN D.old_value END), MAX(T.surname)) surname,
            COALESCE(MAX(CASE D.property WHEN 'dob' THEN D.old_value END), MAX(T.dob)) dob,
            COALESCE(MAX(CASE D.property WHEN 'email' THEN D.old_value END), MAX(T.email)) email,
            COALESCE(MAX(CASE D.property WHEN 'telephone' THEN D.old_value END), MAX(T.telephone)) telephone
      FROM customer T
      JOIN 
      (
        SELECT id, entity_id, property, old_value, new_value, 
        IF(@cust <> entity_id OR @pr <> property, @rn:=1, @rn:=@rn+1) rownum,
        @cust:=entity_id, @pr:=property
        FROM audit_log 
        ORDER BY entity_id, property, id
      ) D
      ON T.ID = D.entity_id
      WHERE D.rownum=1
      GROUP BY T.id, D.rownum
    )
    ORDER BY id
    

    See a demo.

    This query simulates ROW_NUMBER() OVER (PARTITION BY entity_id, property ORDER BY id) for the audit_log table to get the initial value for each customer/ property (where rownum = 1) .

    The COALESCE is used to get the value of a property from the customers table, if this property is not changed, i.e. the name is changed but the email is not, then for email get the lastest value (in this case the latest=initial, which is T.email in this query).

    Login or Signup to reply.
  4. Sometimes a tough programming task is best handled by stepping back and rethinking the framework.

    I would re-think the schema design. Instead of this complex query, I would have 3 tables, making the query ‘trivial’:

    • Original — The values when the person is first put into the database.
    • Audit — The blow-by-blow — a historical record of all the changes. (Optionally like you have now. Or possibly a copy of the Current row when the change occurred.)
    • Current – The latest values.

    Then the query is essentially a UNION of Original and Audit.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search