As the subject states, I have a table with multiple rows per user and I need to get a count of all rows for each user with the data from the row with the oldest date AND then insert one new row, with the count value, into another table.
I am importing an attendance list CSV file into a temporary table…which is working fine. But now I need to process all of the individual records into a single summary record to be added to the final table.
tempTable:
id email tDate cValue col4 col5 col6
==========================================================
1 [email protected] 2021-01-01 1 foo bar foobar
2 [email protected] 2021-01-02 1 bar foo barfoo
3 [email protected] 2021-02-01 1 foo bar foobar
4 [email protected] 2021-01-15 1 bah hab bahhab
5 [email protected] 2021-02-15 1 hab bah habbah
5 [email protected] 2021-03-01 1 bar foo barfoo
6 [email protected] 2021-04-01 1 foo bar foobar
7 [email protected] 2021-03-01 1 hab bah habbah
newTable (with newest date)
id email tDate cValue col4 col5 col6
==========================================================
1 [email protected] 2021-04-01 3 foo bar foobar
2 [email protected] 2021-03-01 2 bar foo barfoo
3 [email protected] 2021-01-15 1 bah hab bahhab
4 [email protected] 2021-03-01 2 hab bah habbah
I think the below works (I have tested the select
part, but not the full insert
yet), but I don’t know how to process the GROUP BY email
based on tDate being the oldest or newest. I haven’t decided yet where the oldest or newest data should be the final record – but I still need to know how to grab by the date.
INSERT INTO newTable (email,tDate,cValue,col4,col5,col6)
SELECT
email,
tDate,
COUNT(*) as tValue,
col4,
col5,
col6
FROM tempTable
GROUP BY email ;
When I do an ORDER BY tDate DESC
– it is simply ordering the output, not actually ordering the records before the GROUP BY.
2
Answers
Kudos to both @nnichols and @Luuk for clueing me into the use of MIN/MAX in the query. I should have thought of it myself but...well, didn't. lol.
Just using the
MIN
orMAX
gave me what I was looking for....but with a caveat. See caveat noted below. In my case though, I don't have to worry about the caveat so simply doing theMIN/MAX
in my current query is all I needed.Outputted:
Or with
MIN
:CAVEAT: the
MAX(tDate)
being selected does not mean the rest of the columns being selected are also from the same row. I saw instances where if col4 for the same email address had different values, that theMIN/MAX(tDate)
returned the same col4 value even though the returnedtDate
was different each time.If I truly needed all the data from the same tDate row (MIN or MAX), then I would need to do query more like @nnichols answered with.
As Luuk has already explained, this is a non-deterministic query and you should read about MySQL Handling of GROUP BY. ONLY_FULL_GROUP_BY is a good thing and should be enabled.
The idea is to get the MAX tDate and COUNT per email and then use that to join back to the original table values –
Alternatively, if using MySQL 8 you can use window functions, specifically ROW_NUMBER() to assign row numbers per partition ordered by tDate descending and then select where row_number = 1