skip to Main Content

I have a table called uk that contains a field called EvName. EvName is mediumtext format and contains entries with the following syntax:

Rochdale 1st Jan – 12:20 2 hours 30 min

I want to extract the time from EvName for each record and return it in a new field called StartTime. I am new to MySQL but have come up with the following. It seems quite clunky and inefficient. For example, it takes a few minutes to run on a table that has c50,000 records.

Is there a better approach?

Thanks

SELECT
    *,
    CONCAT(SUBSTRING_INDEX(SUBSTRING_INDEX(EvName, ':', 1), ' ', -1), ":", SUBSTRING_INDEX(SUBSTRING_INDEX(EvName, ':', -1), ' ', 1)) AS StartTime
FROM uk;

2

Answers


  1. Strings manipulation is always costly, so you should try to avoid it and enter the data as needed.

    You can try as Barmar already wrote REGEXP_SUBSTR

    select SUBSTRING_INDEX(TRIM(SUBSTRING_INDEX('Rochdale 1st Jan - 12:20 2 hours 30 min', '-',-1)),' ',1)
    
    SUBSTRING_INDEX(TRIM(SUBSTRING_INDEX(‘Rochdale 1st Jan – 12:20 2 hours 30 min’, ‘-‘,-1)),’ ‘,1)
    12:20
    SELECT REGEXP_SUBSTR('Rochdale 1st Jan - 12:20 2 hours 30 min','[0-9]{2}:[0-9]{2}')
    
    REGEXP_SUBSTR(‘Rochdale 1st Jan – 12:20 2 hours 30 min’,'[0-9]{2}:[0-9]{2}’)
    12:20

    fiddle

    Login or Signup to reply.
  2. If you want to avoid the overhead of the string-splitting expression, then split it when you insert the row, and put the time into its own column.

    That said, here’s a way to measure which string expression is faster:

    mysql> SET @EvName='Rochdale 1st Jan - 12:20 2 hours 30 min';
    
    mysql> SELECT BENCHMARK(10000000, CONCAT(SUBSTRING_INDEX(SUBSTRING_INDEX(@EvName, ':', 1), ' ', -1), ":", SUBSTRING_INDEX(SUBSTRING_INDEX(@EvName, ':', -1), ' ', 1))) AS Benchmark;
    +-----------+
    | Benchmark |
    +-----------+
    |         0 |
    +-----------+
    1 row in set (2.67 sec)
    
    mysql> SELECT BENCHMARK(10000000, REGEXP_SUBSTR(@EvName, '[0-9]{2}:[0-9]{2}')) AS Benchmark;
    +-----------+
    | Benchmark |
    +-----------+
    |         0 |
    +-----------+
    1 row in set (2.93 sec)
    

    So the regular expression solution is slightly slower, but I had to execute both of them 10 million times to get a duration that is large enough to be compared.

    I ran this test on a Macbook Pro M1 2020. If it takes you several minutes to do the same test on your server with only 50,000 rows, then you need to upgrade your server.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search