I have a table that consists of vehicle plates, where all the plates have always 7 characters.
Sometimes a plate was miswritten and so my SELECT query must be able to consider that "AAU1234" and "AAV1234" must be the same vehicle.
I don’t want to create rules between specific characters, like "U" and "V" or "I" and "1", but a rule able to group plates where 6 of the 7 characters are equal and in the same position.
In this case, ids 1, 2 and 5 should appear only once.
It would be great if, when grouped, all the grouped plates were informed, concatened in another column.
More information: This is a big table and currently identical plates are grouped when inside the same date-time range of 15 minutes.
My query looks like this:
select plate, floor(unix_timestamp(date)/(15 * 60)) as timekey
from table
group by plate, timekey
order by date desc
Following the image example above, my goal is to group ids 1, 2 and 5 in the same row, since id 1 and id 2 matches 6 of 7 strings and so id 1 and id 5.
The result could be something like:
Or:
The final date information is not important here, the most important thing here is to group the similar plates.
2
Answers
You could use
REGEXP_REPLACE()
to replace an initialAAV
withAAU
when grouping.See example
with test data. The data is somewhat expanded compared to the examples in the question.
First, let’s simple compare
plate
‘s and find the rows, where 6 characters match.matchn=6
Output
This table can be considered as a description of a directed graph.
The resulting graph is oriented because we have set the condition (t1.id <t2.id ).
Then we recursively traverse the directed graph to find all nodes starting from the vertex of the graph.
Now we can merge the rows or just assign a group to each row.
Output
New groups for rows
Demo