How can I group results where string has the same 6 characters from a total of 7 in MySQL/MariaDB?

RafaelZG
July 9, 2024
80 views
1 vote
2 Answers

I have a table of vehicle plates, where every plate always has exactly 7 characters.
Sometimes a plate is miswritten, so my SELECT query must treat "AAU1234" and "AAV1234" as the same vehicle.
I don't want to create rules between specific characters, like "U" and "V" or "I" and "1", but rather a general rule that groups plates where 6 of the 7 characters are equal and in the same position.

eg.:

In this case, ids 1, 2 and 5 should appear only once.

It would be great if, when plates are grouped, all the grouped plates were listed, concatenated in another column.

More information: This is a big table, and currently identical plates are grouped when they fall within the same 15-minute date-time window.

My query looks like this:

select plate, floor(unix_timestamp(date)/(15 * 60)) as timekey 
from table 
group by plate, timekey 
order by date desc

Following the image example above, my goal is to group ids 1, 2 and 5 in the same row, since id 1 and id 2 match in 6 of 7 characters, and so do id 1 and id 5.

The result could be something like:

Or:

The final date information is not important here; the most important thing is to group the similar plates.

Tags: mariadb mysql

Answers

- Barmar
- July 9, 2024 at 11:13 pm
- 0 votes
You could use REGEXP_REPLACE() to replace an initial AAV with AAU when grouping.
```
select 
    REGEXP_REPLACE(plate, '^AAV', 'AAU') AS fixed_plate, 
    FROM_UNIXTIME(floor(unix_timestamp(date)/(15 * 60)) * 15 * 60) as timekey, 
    GROUP_CONCAT(id) AS concat_ids, 
    GROUP_CONCAT(plate) AS concat_plates
from table 
group by fixed_plate, timekey 
order by timekey desc
```
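To make the `timekey` arithmetic concrete, here is a small sketch on three made-up timestamps (the derived table `sample` and its column `ts` are illustrative, not from the question). Dividing the Unix timestamp by 900 seconds and flooring gives a bucket number; multiplying back by 900 gives the start of that 15-minute bucket.

```sql
-- Sketch: how the 15-minute bucket is derived (sample data is hypothetical)
SELECT
    ts,
    FLOOR(UNIX_TIMESTAMP(ts) / (15 * 60)) AS bucket_no,                        -- integer bucket index
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(ts) / (15 * 60)) * 15 * 60) AS bucket_start  -- bucket start as a datetime
FROM (
    SELECT CAST('2024-07-09 23:13:45' AS DATETIME) AS ts
    UNION ALL SELECT CAST('2024-07-09 23:14:59' AS DATETIME)
    UNION ALL SELECT CAST('2024-07-09 23:15:00' AS DATETIME)
) AS sample;
```

The first two rows should land in the same bucket and the third in the next one. Note that `UNIX_TIMESTAMP()` and `FROM_UNIXTIME()` both depend on the session time zone, so the exact `bucket_no` values will vary between servers.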

See the example below with test data. The data is somewhat expanded compared to the examples in the question.

create table test (id int, plate varchar(7));
insert into test values
 (1,'AAU1234')
,(2,'AAV1234')
,(3,'BKP5678')
,(4,'CMD9081')
,(5,'A4U1234')
,(6,'ABC1234')
,(7,'ABG1234')
,(8,'ABO1234')
,(9,'ABOI234')
,(10,'ABOI284')
,(11,'ABGI234')
,(12,'ABGI284')
,(14,'CMD9031')
;

First, let's simply compare the plates pairwise and find the pairs where exactly 6 characters match (matchn = 6).

with recursive
tn as(
  select *,substring(plate,1,1) s1,substring(plate,2,1) s2,substring(plate,3,1) s3
    ,substring(plate,4,1) s4,substring(plate,5,1) s5,substring(plate,6,1) s6,substring(plate,7,1) s7
  from test
)
,cmp as(
select t1.id,t1.plate,t2.id id2,t2.plate plate2
  ,(t1.s1=t2.s1)+(t1.s2=t2.s2)+(t1.s3=t2.s3)+(t1.s4=t2.s4)+(t1.s5=t2.s5)
  +(t1.s6=t2.s6)+(t1.s7=t2.s7) matchn
from tn t1
left join tn t2 on t1.id<t2.id -- t1.plate>t2.plate -- and t1.s1=t2.s1
)

Output

id	plate	id2	plate2	matchn
1	AAU1234	2	AAV1234	6
1	AAU1234	5	A4U1234	6
4	CMD9081	14	CMD9031	6
6	ABC1234	7	ABG1234	6
6	ABC1234	8	ABO1234	6
7	ABG1234	8	ABO1234	6
7	ABG1234	11	ABGI234	6
8	ABO1234	9	ABOI234	6
9	ABOI234	10	ABOI284	6
9	ABOI234	11	ABGI234	6
10	ABOI284	12	ABGI284	6
11	ABGI234	12	ABGI284	6
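The `matchn` expression works because MySQL and MariaDB evaluate a comparison such as `a = b` to 1 (true) or 0 (false), so the seven comparisons can simply be added up. As a minimal sketch on two literal plates from the test data:

```sql
-- Sketch: count matching positions between two 7-character plates
SELECT
    (SUBSTRING('AAU1234', 1, 1) = SUBSTRING('AAV1234', 1, 1))
  + (SUBSTRING('AAU1234', 2, 1) = SUBSTRING('AAV1234', 2, 1))
  + (SUBSTRING('AAU1234', 3, 1) = SUBSTRING('AAV1234', 3, 1))  -- 'U' vs 'V': contributes 0
  + (SUBSTRING('AAU1234', 4, 1) = SUBSTRING('AAV1234', 4, 1))
  + (SUBSTRING('AAU1234', 5, 1) = SUBSTRING('AAV1234', 5, 1))
  + (SUBSTRING('AAU1234', 6, 1) = SUBSTRING('AAV1234', 6, 1))
  + (SUBSTRING('AAU1234', 7, 1) = SUBSTRING('AAV1234', 7, 1)) AS matchn;
-- matchn = 6: only the third position differs
```

This arithmetic on boolean results is specific to MySQL/MariaDB; other databases would typically need a `CASE WHEN ... THEN 1 ELSE 0 END` per position.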

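This table can be read as the edge list of a graph: each row is an edge between two plates that differ in exactly one position.
The graph is directed because of the join condition (t1.id < t2.id): every edge points from the lower id to the higher id.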

       (6)ABC1234
        /       \
      C->G      C->O
      /            \
(7)ABG1234--G->O--(8)ABO1234
                         \
                         1->I
                           \
                      (9)ABOI234
                          /    \
                        O->G   3->8
                        /        \
                       / (10)ABOI284
                      /         \
                     /          O->G
                    /               \
             (11)ABGI234-3->8-(12)ABGI284

Then we recursively traverse the directed graph to find all nodes reachable from each root vertex, that is, each plate that never appears as id2 in a matching pair.

,r as(
  select 0 lvl,id,plate p0,plate,plate2
    ,cast(plate as char(1000)) as path
  from cmp
  where  matchn=6 and -- the vertex of the graph
    id not in (select id2 from cmp where matchn=6)
  union all
  select lvl+1,r.id,p0,r.plate2,t.plate2
    ,concat(r.path,',',t.plate) as path
  from r inner join cmp t on t.plate=r.plate2
  where t.matchn=6 and  find_in_set(t.plate,r.path)=0 
    -- and lvl<4  -- for debug
)
,d as(
  select distinct id,p0,plate2
  from r
)
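The traversal pattern used in `r` may be easier to follow in isolation. The following is a self-contained sketch on a hand-written edge list (the CTE names `edges` and `walk` and the columns `src`, `dst`, `root`, `node` are illustrative, not part of the query above). It starts from sources that are never a destination, follows edges recursively, and uses `FIND_IN_SET` on the accumulated path to avoid revisiting a plate.

```sql
WITH RECURSIVE
edges AS (  -- hypothetical edge list: src differs from dst in one position
  SELECT 'ABC1234' AS src, 'ABG1234' AS dst
  UNION ALL SELECT 'ABC1234', 'ABO1234'
  UNION ALL SELECT 'ABG1234', 'ABO1234'
  UNION ALL SELECT 'ABO1234', 'ABOI234'
),
walk AS (
  -- anchor: edges leaving a root (a source that is never a destination)
  SELECT src AS root, dst AS node,
         CAST(CONCAT(src, ',', dst) AS CHAR(1000)) AS path
  FROM edges
  WHERE src NOT IN (SELECT dst FROM edges)
  UNION ALL
  -- recursive step: extend each path by one edge, skipping visited plates
  SELECT w.root, e.dst, CONCAT(w.path, ',', e.dst)
  FROM walk w
  JOIN edges e ON e.src = w.node
  WHERE FIND_IN_SET(e.dst, w.path) = 0
)
SELECT DISTINCT root, node
FROM walk
ORDER BY node;
```

Here every reachable plate ends up paired with the root 'ABC1234'. The `CAST(... AS CHAR(1000))` in the anchor matters because the anchor row fixes the column width for the whole recursion; without it the growing path could be truncated or rejected, depending on the SQL mode.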

Now we can merge the rows or just assign a group to each row.

,newGr as( -- assign new group to rows
select t.id,t.plate,coalesce(p0,plate) grPlate
from test t
left join d on t.plate=d.plate2
)
-- or aggregate groups
select min(id) id,grPlate
  ,group_concat(id) ids
  ,group_concat(plate) plates
from newGr
group by grPlate

Output

id	grPlate	ids	plates
1	AAU1234	1,2,5	AAU1234,AAV1234,A4U1234
6	ABC1234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
3	BKP5678	3	BKP5678
4	CMD9081	4,14	CMD9081,CMD9031

New groups for rows

id	plate	grPlate
1	AAU1234	AAU1234
2	AAV1234	AAU1234
3	BKP5678	BKP5678
4	CMD9081	CMD9081
5	A4U1234	AAU1234
6	ABC1234	ABC1234
7	ABG1234	ABC1234
8	ABO1234	ABC1234
9	ABOI234	ABC1234
10	ABOI284	ABC1234
11	ABGI234	ABC1234
12	ABGI284	ABC1234
14	CMD9031	CMD9081
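One property of this result is worth keeping in mind: the grouping is transitive. The ABC1234 group is built by chaining one-character differences, so two members of the same group can differ in more than one position. As a quick check, this sketch counts the matching positions between the first and last plates of that group (the derived table `pos` and its column `n` are illustrative names):

```sql
-- Sketch: compare two plates position by position using a list of positions 1..7
SELECT SUM(SUBSTRING('ABC1234', n, 1) = SUBSTRING('ABGI284', n, 1)) AS matchn
FROM (
    SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
    UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
) AS pos;
-- matchn = 4: positions 3, 4 and 6 differ
```

Whether such chained groups are acceptable depends on the data. If they are not, the pairs from `cmp` could be used directly instead of the recursive traversal.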

