
I have a table of vehicle plates, where every plate has exactly 7 characters.
Sometimes a plate is recorded incorrectly, so my SELECT query must treat "AAU1234" and "AAV1234" as the same vehicle.
I don’t want to create rules for specific character pairs, like "U" and "V" or "I" and "1", but a general rule that groups plates where 6 of the 7 characters are equal and in the same position.

[Image: example of similar plates]

In this case, ids 1, 2 and 5 should appear only once.

It would be great if, when plates are grouped, all the grouped plates were listed, concatenated in another column.

More information: this is a big table, and currently identical plates are grouped when they fall within the same 15-minute date-time window.

My query looks like this:

select plate, floor(unix_timestamp(date)/(15 * 60)) as timekey 
from table 
group by plate, timekey 
order by date desc
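
As a rough illustration of the `timekey` arithmetic in that query, here is a minimal Python sketch. The sample timestamps are hypothetical, and it assumes UTC, whereas `unix_timestamp()` in MySQL interprets the value in the session time zone. Note that the windows are fixed, so two readings only two minutes apart can still land in different windows.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 15 * 60  # same 15-minute window as in the query


def timekey(ts: datetime) -> int:
    """Index of the 15-minute window containing ts, counted from the Unix epoch."""
    return int(ts.timestamp()) // WINDOW_SECONDS


def window_start(ts: datetime) -> datetime:
    """Start of that window: timekey * 900 seconds converted back to a datetime."""
    return datetime.fromtimestamp(timekey(ts) * WINDOW_SECONDS, tz=timezone.utc)


# Hypothetical readings, not taken from the question's data.
a = datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc)
b = datetime(2024, 1, 1, 10, 14, tzinfo=timezone.utc)
c = datetime(2024, 1, 1, 10, 16, tzinfo=timezone.utc)

print(timekey(a) == timekey(b))  # True: both fall in 10:00-10:15
print(timekey(b) == timekey(c))  # False: 10:16 belongs to the next window
print(window_start(c))           # 2024-01-01 10:15:00+00:00
```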

Following the example image above, my goal is to group ids 1, 2 and 5 in the same row, since id 1 and id 2 match in 6 of 7 characters, and so do id 1 and id 5.

The result could be something like:

[Image: desired result]

Or:

[Image: desired result]

The final date information is not important here; the most important thing is to group the similar plates.

2 Answers


  1. You could use REGEXP_REPLACE() to replace an initial AAV with AAU when grouping.

    select 
        REGEXP_REPLACE(plate, '^AAV', 'AAU') AS fixed_plate, 
        FROM_UNIXTIME(floor(unix_timestamp(date)/(15 * 60)) * 15 * 60) as timekey, 
        GROUP_CONCAT(id) AS concat_ids, 
        GROUP_CONCAT(plate) AS concat_plates
    from table 
    group by fixed_plate, timekey 
    order by timekey desc
    
  2. See the example with test data. The data is somewhat expanded compared with the examples in the question.

    create table test (id int, plate varchar(7));
    insert into test values
     (1,'AAU1234')
    ,(2,'AAV1234')
    ,(3,'BKP5678')
    ,(4,'CMD9081')
    ,(5,'A4U1234')
    ,(6,'ABC1234')
    ,(7,'ABG1234')
    ,(8,'ABO1234')
    ,(9,'ABOI234')
    ,(10,'ABOI284')
    ,(11,'ABGI234')
    ,(12,'ABGI284')
    ,(14,'CMD9031')
    ;
    

    First, let’s simply compare the plates pairwise and find the rows where 6 characters match in the same position (matchn=6).

    with recursive
    tn as(
      select *,substring(plate,1,1) s1,substring(plate,2,1) s2,substring(plate,3,1) s3
        ,substring(plate,4,1) s4,substring(plate,5,1) s5,substring(plate,6,1) s6,substring(plate,7,1) s7
      from test
    )
    ,cmp as(
    select t1.id,t1.plate,t2.id id2,t2.plate plate2
      ,(t1.s1=t2.s1)+(t1.s2=t2.s2)+(t1.s3=t2.s3)+(t1.s4=t2.s4)+(t1.s5=t2.s5)
      +(t1.s6=t2.s6)+(t1.s7=t2.s7) matchn
    from tn t1
    left join tn t2 on t1.id<t2.id -- t1.plate>t2.plate -- and t1.s1=t2.s1
    )
    

    Output

    id  plate    id2  plate2   matchn
    1   AAU1234  2    AAV1234  6
    1   AAU1234  5    A4U1234  6
    4   CMD9081  14   CMD9031  6
    6   ABC1234  7    ABG1234  6
    6   ABC1234  8    ABO1234  6
    7   ABG1234  8    ABO1234  6
    7   ABG1234  11   ABGI234  6
    8   ABO1234  9    ABOI234  6
    9   ABOI234  10   ABOI284  6
    9   ABOI234  11   ABGI234  6
    10  ABOI284  12   ABGI284  6
    11  ABGI234  12   ABGI284  6
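
The pairwise comparison can also be sketched outside the database. The following Python snippet is only an illustration of the same rule (count the positions where two plates agree and keep pairs with a count of 6); the function names `match_count` and `similar_pairs` are made up for this example, and the data is copied from the INSERT statement above.

```python
from __future__ import annotations

from itertools import combinations

# Test data copied from the answer's INSERT statement.
PLATES = {
    1: "AAU1234", 2: "AAV1234", 3: "BKP5678", 4: "CMD9081", 5: "A4U1234",
    6: "ABC1234", 7: "ABG1234", 8: "ABO1234", 9: "ABOI234", 10: "ABOI284",
    11: "ABGI234", 12: "ABGI284", 14: "CMD9031",
}


def match_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length plates agree."""
    return sum(x == y for x, y in zip(a, b))


def similar_pairs(plates: dict[int, str], needed: int = 6) -> list[tuple[int, int]]:
    """Pairs (id, id2) with id < id2 whose plates agree in exactly `needed` positions."""
    return [
        (i, j)
        for i, j in combinations(sorted(plates), 2)
        if match_count(plates[i], plates[j]) == needed
    ]


pairs = similar_pairs(PLATES)
print(len(pairs))  # 12
print(pairs[:3])   # [(1, 2), (1, 5), (4, 14)]
```

Like the self-join, this compares every pair of rows, so on a big table it would probably need to be restricted first, for example to plates seen in the same time window.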

    This table can be read as the edge list of a directed graph.
    The graph is directed because we set the condition (t1.id < t2.id), so every edge points from the lower id to the higher id.

           (6)ABC1234
            /       \
          C->G      C->O
          /            \
    (7)ABG1234--G->O--(8)ABO1234
                             \
                             1->I
                               \
                          (9)ABOI234
                              /    \
                            O->G   3->8
                            /        \
                           / (10)ABOI284
                          /         \
                         /          O->G
                        /               \
                 (11)ABGI234-3->8-(12)ABGI284
    
    

    Then we recursively traverse the directed graph, starting from its root vertices (ids that never appear as id2), to find all the nodes reachable from each root.

    ,r as(
      select 0 lvl,id,plate p0,plate,plate2
        ,cast(plate as char(1000)) as path
      from cmp
      where  matchn=6 and -- the vertex of the graph
        id not in (select id2 from cmp where matchn=6)
      union all
      select lvl+1,r.id,p0,r.plate2,t.plate2
        ,concat(r.path,',',t.plate) as path
      from r inner join cmp t on t.plate=r.plate2
      where t.matchn=6 and  find_in_set(t.plate,r.path)=0 
        -- and lvl<4  -- for debug
    )
    ,d as(
      select distinct id,p0,plate2
      from r
    )
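
For comparison, the same grouping can be sketched in Python. This is not the recursive CTE above: it swaps in a union-find (disjoint-set) structure to collect the connected components of the graph, which is a different technique that should give the same groups on this data. The names `match_count` and `group_plates` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Test data copied from the answer's INSERT statement.
PLATES = {
    1: "AAU1234", 2: "AAV1234", 3: "BKP5678", 4: "CMD9081", 5: "A4U1234",
    6: "ABC1234", 7: "ABG1234", 8: "ABO1234", 9: "ABOI234", 10: "ABOI284",
    11: "ABGI234", 12: "ABGI284", 14: "CMD9031",
}


def match_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length plates agree."""
    return sum(x == y for x, y in zip(a, b))


def group_plates(plates: dict[int, str], needed: int = 6) -> dict[int, list[int]]:
    """Map each group's smallest id to the sorted ids of all plates in that group."""
    parent = {i: i for i in plates}

    def find(i: int) -> int:
        # Walk up to the representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(sorted(plates), 2):
        if match_count(plates[i], plates[j]) == needed:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller id as representative, like min(id) in the SQL.
                parent[max(ri, rj)] = min(ri, rj)

    groups: dict[int, list[int]] = {}
    for i in sorted(plates):
        groups.setdefault(find(i), []).append(i)
    return groups


groups = group_plates(PLATES)
for rep, ids in groups.items():
    print(PLATES[rep], ids, ",".join(PLATES[i] for i in ids))
```

Each printed line holds the representative plate, the grouped ids, and the concatenated plates, which corresponds to the grPlate, ids, and plates columns of the aggregated SQL result.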
    

    Now we can merge the rows or just assign a group to each row.

    ,newGr as( -- assign new group to rows
    select t.id,t.plate,coalesce(p0,plate) grPlate
    from test t
    left join d on t.plate=d.plate2
    )
    -- or aggregate groups
    select min(id) id,grPlate
      ,group_concat(id) ids
      ,group_concat(plate) plates
    from newGr
    group by grPlate
    

    Output

    id  grPlate  ids               plates
    1   AAU1234  1,2,5             AAU1234,AAV1234,A4U1234
    6   ABC1234  6,7,8,9,10,11,12  ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
    3   BKP5678  3                 BKP5678
    4   CMD9081  4,14              CMD9081,CMD9031

    New groups for rows

    id  plate    grPlate
    1   AAU1234  AAU1234
    2   AAV1234  AAU1234
    3   BKP5678  BKP5678
    4   CMD9081  CMD9081
    5   A4U1234  AAU1234
    6   ABC1234  ABC1234
    7   ABG1234  ABC1234
    8   ABO1234  ABC1234
    9   ABOI234  ABC1234
    10  ABOI284  ABC1234
    11  ABGI234  ABC1234
    12  ABGI284  ABC1234
    14  CMD9031  CMD9081
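
One property of this grouping may be worth checking on real data: because groups are built by following chains of 6-of-7 matches, two plates in the same group do not necessarily match each other in 6 positions. A small Python sketch (the helper name `match_count` is hypothetical) measures the weakest pairwise match inside the ABC1234 group from the table above.

```python
from itertools import combinations


def match_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length plates agree."""
    return sum(x == y for x, y in zip(a, b))


# Members of the ABC1234 group, copied from the result table.
group = ["ABC1234", "ABG1234", "ABO1234", "ABOI234", "ABOI284", "ABGI234", "ABGI284"]

# Weakest pairwise match inside the group.
worst = min(match_count(a, b) for a, b in combinations(group, 2))
print(worst)                              # 4
print(match_count("ABC1234", "ABGI284"))  # 4: they differ in positions 3, 4 and 6
```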

