skip to Main Content

Having a table T1 contains

TITLE
age 5 alton john live
show must go on

Having a table T2 contains

NAME. DESCRIPTION
John Bo alton for kids
Alton show age kids

I would like to split TITLE (T1) and check if at list one token exists in DESCRIPTION (T2) and return TITLE and DESCRIPTION

Expected Output:

TITLE DESCRIPTION
age 5 alton john live. alton for kids
age 5 alton john live. show age kids
show must go on show age kids

2

Answers


  1. The following SQL query can be used to split the titles in T1 and check if at least one token exists in the descriptions in T2:

    SELECT T1.TITLE, T2.DESCRIPTION
    FROM T1
    JOIN T2
    ON T1.TITLE LIKE '%' || T2.DESCRIPTION || '%'
    

    The query uses a combination of the JOIN and LIKE operators to find the matching records between the tables T1 and T2.

    Login or Signup to reply.
  2. Main problem here is to wrangle table t1 to associate each "TITLE" value to each sequence of characters separated by space. Then you can match your wrangled table with your t2 table on a regex pattern. May require quite a bit of time as doing a join on such string condition is not very efficient, but it works.

    You can generate that association with a recursive query, that exploits SUBSTRING_INDEX to retrieve last word of "TITLE" and a numerical index that is decreased. Recursion would stop once that numerical index reaches 0.

    WITH RECURSIVE cte AS (
        SELECT TITLE, 
               LENGTH(TITLE)-LENGTH(REPLACE(TITLE,' ','')) AS num_words,
               SUBSTRING_INDEX(
                   SUBSTRING_INDEX(TITLE, ' ', 
                                   LENGTH(TITLE)-
                                   LENGTH(REPLACE(TITLE,' ',''))+1
                   ), ' ', -1) AS word
        FROM t1
        UNION ALL
        SELECT TITLE,
               num_words - 1 AS num_words,
               SUBSTRING_INDEX(SUBSTRING_INDEX(TITLE, ' ', num_words), ' ', -1)
        FROM cte
        WHERE num_words > 0
    )
    SELECT *
    FROM       cte
    INNER JOIN t2
            ON t2.DESCRIPTION REGEXP CONCAT('( |^)', cte.word, '( |$)')
    

    Check the demo here.

    Assumption: your sentences have no punctuation, and every word is separated by a space character.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search