Comma separated string similarity check in MySQL

minajavid
August 8, 2023
277 views
1 vote
2 Answers

I have a table with column that contain comma separated ids like :

ids 
----
1,2,3,4
1,4,5
1,5
2
2,6,9

Now I need to sort these ids based on a given string similarity (common elements). For example if the string be 1,5 the result I need is :

1,5 (exactly the same)
1,4,5 (has 1,5 but also has an extra number)
1,2,3,4 (has only 1)
2 (no matter)
2,6,9 (no matter)

My question is that MySQL has a built-in function to reach above result or I have to write a custom procedure? I tried match-against syntax but result was not acceptable.

Note:

In fact these numbers are tag id and I want to find the most similar product.

Thank you in advance

Answers

It is not the right approach to store comma-separate IDs, but I solved a similar problem in the past with this function:

It will score each match of the IDs(that worked for me, you might need to add some logic.

DELIMITER //
CREATE FUNCTION similarity_score(input_string VARCHAR(255), tags_string VARCHAR(255)) RETURNS INT
BEGIN
    DECLARE score INT DEFAULT 0;
    DECLARE tag VARCHAR(255);
    DECLARE remainder VARCHAR(255);
    
    SET remainder = input_string;

    WHILE LENGTH(remainder) > 0 DO
        IF LOCATE(',', remainder) > 0 THEN
            SET tag = SUBSTRING(remainder, 1, LOCATE(',', remainder) - 1);
            SET remainder = SUBSTRING(remainder, LOCATE(',', remainder) + 1);
        ELSE
            SET tag = remainder;
            SET remainder = '';
        END IF;

        IF FIND_IN_SET(tag, tags_string) THEN
            SET score = score + 1;
        END IF;
    END WHILE;

    RETURN score;
END //
DELIMITER ;

-- Usage
SET @input_tags = '1,5';

SELECT ids 
FROM your_table
ORDER BY similarity_score(@input_tags, ids) DESC, LENGTH(ids), ids;

To find the count of matching numbers you can do:

SET @compare = '1,5';
WITH cte as (
  SELECT
      r as m,
      a as x, 
      SUBSTRING_INDEX(SUBSTRING_INDEX(ids,',',a),',',-1) as s,
      ids
  FROM (select row_number() over (order by ids) as r, ids from mytable )m
  CROSS JOIN (select 1 as a union all select 2 union all select 3 union all select 4) b
  WHERE b.a <= length(m.ids)-length(replace(m.ids,',',''))+1
)
SELECT 
  ids, COUNT(*) as matches
FROM cte 
WHERE LOCATE(s,@compare)<>0
GROUP BY ids
;

output

ids	matches
1,2,3,4	1
1,4,5	2
1,5	2

see: DBFIDDLE

When putting more time in this you should be able to give 1,5 a higher ranking than 1,4,5.

Please signup or login to give your own answer.

Click here to cancel reply.