I have a MySQL table temp_employee as follows:
CREATE TABLE `temp_employee` (
`employee_id` varchar(22) COLLATE utf8mb4_unicode_ci NOT NULL,
`application_date` datetime NOT NULL,
PRIMARY KEY (`employee_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And data in the table as follows:
employee_id | application_date |
---|---|
emp.10062808068 | 2023-03-30 07:09:31 |
emp.10036984442 | 2023-03-30 09:48:30 |
emp.10041499192 | 2023-03-30 09:58:55 |
emp.10012314367 | 2023-03-30 03:33:28 |
Insert query for temp_employee table:
INSERT INTO `temp_employee` (`employee_id`, `application_date`)
VALUES
('emp.10012314367', '2023-03-30 03:33:28'),
('emp.10036984442', '2023-03-30 09:48:30'),
('emp.10041499192', '2023-03-30 09:58:55'),
('emp.10062808068', '2023-03-30 07:09:31');
There is another table employee_logs as follows:
CREATE TABLE `employee_logs` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`employee_id` varchar(22) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` tinyint(1) NOT NULL,
`status` tinyint(1) NOT NULL,
`updated_time` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The data in the employee_logs table as follows:
id | employee_id | type | status | updated_time |
---|---|---|---|---|
1 | emp.10000000001 | 1 | 2 | 2021-04-30 10:19:31 |
2 | emp.10000000002 | 2 | 2 | 2022-06-30 17:29:31 |
3 | emp.10062808068 | 1 | 2 | 2023-03-30 11:19:20 |
4 | emp.10000000003 | 2 | 3 | 2023-03-30 03:03:30 |
5 | emp.10000000004 | 1 | 2 | 2023-03-30 04:04:40 |
6 | emp.10012314367 | 2 | 1 | 2023-03-30 13:13:31 |
7 | emp.10062808068 | 2 | 3 | 2023-03-20 17:24:10 |
8 | emp.10036984442 | 1 | 2 | 2023-03-30 09:09:09 |
9 | emp.10012314367 | 2 | 1 | 2023-03-20 10:10:10 |
10 | emp.10012314367 | 2 | 2 | 2023-03-21 12:12:12 |
Insert query for employee_logs table:
INSERT INTO `employee_logs` (`id`, `employee_id`, `type`, `status`, `updated_time`)
VALUES
(1, 'emp.10000000001', 1, 2, '2021-04-30 10:19:31'),
(2, 'emp.10000000002', 2, 2, '2022-06-30 17:29:31'),
(3, 'emp.10062808068', 1, 2, '2023-03-30 11:19:20'),
(4, 'emp.10000000003', 2, 3, '2023-03-30 03:03:30'),
(5, 'emp.10000000004', 1, 2, '2023-03-30 04:04:40'),
(6, 'emp.10012314367', 2, 1, '2023-03-30 13:13:31'),
(7, 'emp.10062808068', 2, 3, '2023-03-20 17:24:10'),
(8, 'emp.10036984442', 1, 2, '2023-03-30 09:09:09'),
(9, 'emp.10012314367', 2, 1, '2023-03-20 10:10:10'),
(10, 'emp.10012314367', 2, 2, '2023-03-21 12:12:12');
I want Mysql query to get type, status & updated_time for each employee_id in temp_employee table which select single row from employee_logs table with condition status != 3 AND updated_time <= application_date.
Expected Result (No. of rows same as the count of temp_employee table)
employee_id | type | status | updated_time |
---|---|---|---|
emp.10062808068 | NULL | NULL | NULL |
emp.10036984442 | 1 | 2 | 2023-03-30 09:09:09 |
emp.10041499192 | NULL | NULL | NULL |
emp.10012314367 | 2 | 2 | 2023-03-21 12:12:12 |
employee_logs table is huge with more than 40 million rows and can have multiple rows with the same employee_id.
Explanation on the employee_logs table which rows should picked by the condition (status <> 3 AND updated_time <= application_date), as follows:
1 emp.10000000001 1 2 2021-04-30 10:19:31 -- should not as emp_id not exists in temp table
2 emp.10000000002 2 2 2022-06-30 17:29:31 -- should not as emp_id not exists in temp table
3 emp.10062808068 1 2 2023-03-30 11:19:20 -- should not return as updated_time > application_date 2023-03-30 07:09:31
4 emp.10000000003 2 3 2023-03-30 03:03:30 -- should not as emp_id not exists in temp table
5 emp.10000000004 1 2 2023-03-30 04:04:40 -- should not as emp_id not exists in temp table
6 emp.10012314367 2 1 2023-03-30 13:13:31 -- should not return as updated_time > application_date 2023-03-30 03:33:28
7 emp.10062808068 2 3 2023-03-20 17:24:10 -- should not return as status = 3
8 emp.10036984442 1 2 2023-03-30 09:09:09 -- should return as updated_time <= application_date && status != 3
9 emp.10012314367 2 1 2023-03-20 10:10:10 -- should not return as updated_time < application_date 2023-03-30 03:33:28 but not latest
10 emp.10012314367 2 2 2023-03-21 12:12:12 -- should return as updated_time < application_date 2023-03-30 03:33:28
Any help in the query to achieve the desired output will be highly appreciated. TIA
3
Answers
The following query worked for me, the application_date column value in the temp_employee table was incorrect, after the fix the query return expected result.
SQL Fiddle link solution
I assume this is a temporary required query, to check if employees have agreed to some changes published (or sth. the like)
So, performance shouldn’t be all to critical.
As a first step, I would use this query to determine the
most recent
status values per employee. Join theemp_log
table on itself, with conditionL.emp_id = R.emp_id and L.update_date < R.update_date
– and pick the results, where the right set isnull
(means, there is no further record with a more recent update-state for this employee):http://sqlfiddle.com/#!9/7a95da/2
Now, you got the most recent updates – just missing the employees that have no updates -> right join them in.
http://sqlfiddle.com/#!9/7a95da/10
On the final result, you can apply your where-conditions as required. (
WHERE temp.status <> 3 and temp.update_time <= e.application_date)
– I left this out, cause it doesn’t make sence to me, that there are update-times BEFORE the employee even applied…You can achieve the desired output by modifying your query to use a subquery with a conditional aggregation to select the appropriate rows from the employee_logs table. Here’s the query to get the type, status, and updated_time for each employee_id in the temp_employee table based on your conditions:
http://sqlfiddle.com/#!9/e73ea8/6