skip to Main Content

I have a table with column that contain comma separated ids like :

ids 
----
1,2,3,4
1,4,5
1,5
2
2,6,9

Now I need to sort these ids based on a given string similarity (common elements). For example if the string be 1,5 the result I need is :

1,5 (exactly the same)
1,4,5 (has 1,5 but also has an extra number)
1,2,3,4 (has only 1)
2 (no matter)
2,6,9 (no matter)

My question is that MySQL has a built-in function to reach above result or I have to write a custom procedure? I tried match-against syntax but result was not acceptable.

Note:

In fact these numbers are tag id and I want to find the most similar product.

Thank you in advance

2

Answers


  1. It is not the right approach to store comma-separate IDs, but I solved a similar problem in the past with this function:

    It will score each match of the IDs(that worked for me, you might need to add some logic.

    DELIMITER //
    CREATE FUNCTION similarity_score(input_string VARCHAR(255), tags_string VARCHAR(255)) RETURNS INT
    BEGIN
        DECLARE score INT DEFAULT 0;
        DECLARE tag VARCHAR(255);
        DECLARE remainder VARCHAR(255);
        
        SET remainder = input_string;
    
        WHILE LENGTH(remainder) > 0 DO
            IF LOCATE(',', remainder) > 0 THEN
                SET tag = SUBSTRING(remainder, 1, LOCATE(',', remainder) - 1);
                SET remainder = SUBSTRING(remainder, LOCATE(',', remainder) + 1);
            ELSE
                SET tag = remainder;
                SET remainder = '';
            END IF;
    
            IF FIND_IN_SET(tag, tags_string) THEN
                SET score = score + 1;
            END IF;
        END WHILE;
    
        RETURN score;
    END //
    DELIMITER ;
    
    -- Usage
    SET @input_tags = '1,5';
    
    SELECT ids 
    FROM your_table
    ORDER BY similarity_score(@input_tags, ids) DESC, LENGTH(ids), ids;
    
    Login or Signup to reply.
  2. To find the count of matching numbers you can do:

    SET @compare = '1,5';
    WITH cte as (
      SELECT
          r as m,
          a as x, 
          SUBSTRING_INDEX(SUBSTRING_INDEX(ids,',',a),',',-1) as s,
          ids
      FROM (select row_number() over (order by ids) as r, ids from mytable )m
      CROSS JOIN (select 1 as a union all select 2 union all select 3 union all select 4) b
      WHERE b.a <= length(m.ids)-length(replace(m.ids,',',''))+1
    )
    SELECT 
      ids, COUNT(*) as matches
    FROM cte 
    WHERE LOCATE(s,@compare)<>0
    GROUP BY ids
    ;
    

    output

    ids matches
    1,2,3,4 1
    1,4,5 2
    1,5 2

    see: DBFIDDLE

    When putting more time in this you should be able to give 1,5 a higher ranking than 1,4,5.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search