Let’s say I have a slow queries log like this:
# Time: 230706 17:12:48
# User@Host: sample[sample] @ localhost []
# Thread_id: 626784 Schema: sample QC_hit: No
# Query_time: 2.976557 Lock_time: 0.000178 Rows_sent: 0 Rows_examined: 3344231
# Rows_affected: 0 Bytes_sent: 195
SET timestamp=1688677968;
SELECT * from a;
# Time: 230706 17:15:51
# User@Host: root[root] @ localhost []
# Thread_id: 627770 Schema: sample QC_hit: No
# Query_time: 2.581676 Lock_time: 0.000270 Rows_sent: 0 Rows_examined: 2432228
# Rows_affected: 0 Bytes_sent: 195
SET timestamp=1688678151;
select * from cs;
# Time: 230706 17:13:37
# User@Host: sample[sample] @ localhost []
# Thread_id: 627027 Schema: oiemorug_wp598 QC_hit: No
# Query_time: 3.901325 Lock_time: 0.000145 Rows_sent: 0 Rows_examined: 3851050
# Rows_affected: 0 Bytes_sent: 195
SET timestamp=1688678017;
SELECT * from b
# Time: 230706 17:15:51
# User@Host: root[root] @ localhost []
# Thread_id: 627770 Schema: sample QC_hit: No
# Query_time: 2.581676 Lock_time: 0.000270 Rows_sent: 0 Rows_examined: 2432228
# Rows_affected: 0 Bytes_sent: 195
SET timestamp=1688678151;
select * from cs
How can I match the full records where the query was done by the root user?
In this case it would be the second and the last records. (User@Host: root[root])
I’ve tried several variations of these regex without too much success.
This one # Time.*?root.*?(?=# Time)
matches records not owned by the root user
This one # Time.*?root.*?(?!# Time)
doesn’t match correctly
The main ideas is to remove all records owned by the root user from the slow query log.
2
Answers
Use the following regex:
# Time(?:(?!User@Host:)[sS])*?User@Host: (?'user'root)[k'user'][sS]*?(?=(?:# Time|$))
Demo
Explanation:
Use a
negative lookahead
to assert that you don’t match the pattern that points to the username:(?!User@Host:)
The whole thing is then made non-capturing:
(?:(?!User@Host:)[sS])*?
Used a named capturing group for readability:
(?'user'root)
Matched the username with the captured named group:
[k'user']
The positive lookahead at the end uses the pipe (
|
) to assert either the start of the next record or the end of the record:(?=(?:# Time|$))
In comments there was suggested literal string comparison without regex, which I support, so just search for raw string
# User@Host: root[root] @ localhost []
What else I could suggest to use
s*
pattern instead of spaces to make sure whitespaces won’t affect the output (s*
matches zero or more whitespaces).Also I added
^
and$
anchors to match start and end of a line respectively, so finally, the pattern I would suggest:Demo