skip to Main Content

I currently have a stored procedure that accepts TEXT as input, and then tries to find the matching ids within the set.

The input_ids value is an array of longs ‘1,2,3,…’.

It works but the problem is that find_in_set is slow, and I think there must be a better way.

CREATE PROCEDURE `sel_network_edges_by_id`(
    IN input_ids TEXT
)
BEGIN
    SELECT id, gradient, geoJson, ... 
    FROM network.network_edges 
    WHERE FIND_IN_SET(id, input_ids);
END

Table Sample

id, gradient, geoJson
1, -50, [34.34....]    
2, 110, [35.3345..]
3, 110, [35.3345..]

2

Answers


  1. You could use a temporary table and join it with the main table with this process:

    1. Parsing Input String:
      Convert the comma-separated input string into a temporary table that holds the IDs as separate rows. This can be done using a string splitting function:

      CREATE FUNCTION SPLIT_STRING(x VARCHAR(255), delim VARCHAR(12), pos INT) RETURNS VARCHAR(255)
      RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
                 LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
                 delim, '');
      

      Now you can use this function to split the input_ids and insert them into a temporary table:

      CREATE TEMPORARY TABLE temp_ids (id INT);
      
      DECLARE pos INT DEFAULT 1;
      DECLARE sub_id VARCHAR(255);
      
      WHILE pos <= LENGTH(input_ids) - LENGTH(REPLACE(input_ids, ',', '')) + 1 DO
          SET sub_id = SPLIT_STRING(input_ids, ',', pos);
          INSERT INTO temp_ids (id) VALUES (CAST(sub_id AS INT));
          SET pos = pos + 1;
      END WHILE;
      
    2. Query Using Join:
      Once you have the IDs in the temporary table, you can perform a more efficient query using a join:

      SELECT ne.id, ne.gradient, ne.geoJson, ...
      FROM network.network_edges ne
      JOIN temp_ids ti ON ne.id = ti.id;
      

    Although this solution is more efficient than FIND_IN_SET, it does involve more steps. However, it should perform better, especially for larger datasets, as it avoids string manipulations and allows for optimized indexing. Temporary tables have session scope, so they will be automatically dropped when the session ends.

    Login or Signup to reply.
  2. Since FIND_IN_SET can’t use an index you should use a prepared statement with IN(), where the values of the passed string can be used as an array inside IN() by concatenation.:

    CREATE PROCEDURE sel_network_edges_by_id(IN input_ids TEXT)
    BEGIN
      PREPARE my FROM CONCAT("SELECT id, gradient, geoJson, ... 
        FROM network.network_edges 
        WHERE id IN (,", input_ids,")");
      EXECUTE my;
      DEALLOCATE PREPARE my;
    END
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search