skip to Main Content

As the subject states, I have a table with multiple rows per user and I need to get a count of all rows for each user with the data from the row with the oldest date AND then insert one new row, with the count value, into another table.

I am importing an attendance list CSV file into a temporary table…which is working fine. But now I need to process all of the individual records into a single summary record to be added to the final table.

tempTable:
id   email      tDate        cValue  col4    col5   col6
==========================================================
1    [email protected]    2021-01-01    1      foo     bar    foobar
2    [email protected]    2021-01-02    1      bar     foo    barfoo      
3    [email protected]    2021-02-01    1      foo     bar    foobar
4    [email protected]    2021-01-15    1      bah     hab    bahhab
5    [email protected]    2021-02-15    1      hab     bah    habbah
5    [email protected]    2021-03-01    1      bar     foo    barfoo
6    [email protected]    2021-04-01    1      foo     bar    foobar
7    [email protected]    2021-03-01    1      hab     bah    habbah

newTable (with newest date)
id   email      tDate        cValue  col4    col5   col6
==========================================================
1    [email protected]    2021-04-01    3      foo     bar    foobar
2    [email protected]    2021-03-01    2      bar     foo    barfoo
3    [email protected]    2021-01-15    1      bah     hab    bahhab
4    [email protected]    2021-03-01    2      hab     bah    habbah

I think the below works (I have tested the select part, but not the full insert yet), but I don’t know how to process the GROUP BY email based on tDate being the oldest or newest. I haven’t decided yet where the oldest or newest data should be the final record – but I still need to know how to grab by the date.

INSERT INTO newTable (email,tDate,cValue,col4,col5,col6) 
SELECT 
   email,
   tDate,
   COUNT(*) as  tValue,
   col4,
   col5,
   col6 
FROM tempTable 
GROUP BY email ;

When I do an ORDER BY tDate DESC – it is simply ordering the output, not actually ordering the records before the GROUP BY.

2

Answers


  1. Chosen as BEST ANSWER

    Kudos to both @nnichols and @Luuk for clueing me into the use of MIN/MAX in the query. I should have thought of it myself but...well, didn't. lol.

    Just using the MIN or MAX gave me what I was looking for....but with a caveat. See caveat noted below. In my case though, I don't have to worry about the caveat so simply doing the MIN/MAX in my current query is all I needed.

    INSERT INTO newTable (email,tDate,cValue,col4,col5,col6) 
    SELECT 
       email,
       MAX(tDate),
       COUNT(*) as  tValue,
       col4,
       col5,
       col6 
    FROM tempTable 
    GROUP BY email ;
    

    Outputted:

    newTable (with newest date)
    id   email      tDate        cValue  col4    col5   col6
    ==========================================================
    1    [email protected]    2021-04-01    3      foo     bar    foobar
    2    [email protected]    2021-03-01    2      bar     foo    barfoo
    3    [email protected]    2021-01-15    1      bah     hab    bahhab
    4    [email protected]    2021-03-01    2      hab     bah    habbah
    

    Or with MIN:

    newTable (with newest date)
    id   email      tDate        cValue  col4    col5   col6
    ==========================================================
    1    [email protected]    2021-01-01    3      foo     bar    foobar
    2    [email protected]    2021-01-02    2      bar     foo    barfoo
    3    [email protected]    2021-01-15    1      bah     hab    bahhab
    4    [email protected]    2021-02-15    2      hab     bah    habbah
    

    CAVEAT: the MAX(tDate) being selected does not mean the rest of the columns being selected are also from the same row. I saw instances where if col4 for the same email address had different values, that the MIN/MAX(tDate) returned the same col4 value even though the returned tDate was different each time.

    If I truly needed all the data from the same tDate row (MIN or MAX), then I would need to do query more like @nnichols answered with.


  2. As Luuk has already explained, this is a non-deterministic query and you should read about MySQL Handling of GROUP BY. ONLY_FULL_GROUP_BY is a good thing and should be enabled.

    The idea is to get the MAX tDate and COUNT per email and then use that to join back to the original table values –

    SELECT t1.email, t1.tDate, t2.cnt AS cValue, t1.col4, t1.col5, t1.col6
    FROM tempTable t1
    JOIN (
        SELECT email, MAX(tDate) AS maxDate, COUNT(*) as cnt
        FROM tempTable
        GROUP BY email
    ) t2 ON t1.email = t2.email AND t1.tDate = t2.maxDate;
    

    Alternatively, if using MySQL 8 you can use window functions, specifically ROW_NUMBER() to assign row numbers per partition ordered by tDate descending and then select where row_number = 1

    SELECT email, tDate, cValue, col4, col5, col6
    FROM (
        SELECT
            email,
            tDate,
            COUNT(*) OVER (PARTITION BY email) AS cValue,
            col4,
            col5,
            col6 ,
            ROW_NUMBER() OVER (PARTITION BY email ORDER BY tDate DESC) AS rn
        FROM tempTable
    ) t
    WHERE t.rn = 1
    ORDER BY email ASC;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search