skip to Main Content

I have this PostgreSQL table for storing words:

CREATE TABLE IF NOT EXISTS words
(
    id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
    keyword character varying(300) COLLATE pg_catalog."default",
)

insert into words (words)
VALUES ('while swam is interesting');

CREATE TABLE IF NOT EXISTS trademarks
(
   id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
   trademark character varying(300) COLLATE pg_catalog."default",
)

Into table trademarks I will have thousands of registered trademarks names.
I want to compare words stored into words table keyword do they match not only for a words but also for word which is a in a group of words. For example:

I have a keyword while swam is interesting stored into words.keyword. I also have a trademark swam located in trademarks.trademark I have a word match so I want to detect this using SQL. How I can implement this?

3

Answers


  1. Looks like it’s a task for full text search data types and functions, it would be something like this:

    select w.id, w.keyword, t.trademark 
    from words w
    inner join trademarks t on w.keyword::tsvector @@ t.trademark::tsquery;
    

    working example here

    In docs

    a more flexible query, if trademark field contains multiple words

    select w.id, w.keyword, t.trademark 
    from words w
    inner join trademarks t on w.keyword::tsvector @@ 
    regexp_replace(t.trademark, 's', ' | ', 'g' )::tsquery;
    

    where pipe symbol represents logical "or" and could be replaced by other operand according to your needs.
    Example of this here.
    Of course indexing will be needed to speed up searches,
    check for text search index creation.

    Another example, avoiding some special characters (in this case parenthesis) and trailing spaces in tsquery and also considering the posibility of multiple words, as pointed out in comments

    select w.id, w.keyword, t.trademark 
    from words w
    inner join trademarks t on regexp_replace(w.keyword,'(|)', ' ', 'g' )::tsvector 
    @@ regexp_replace(regexp_replace(trim(t.trademark),'(|)', ' ', 'g' )
    , 's+?(?=[a-zA-z])', ' | ', 'g' )::tsquery;    
    

    you have to take into account that tsquery is a logical sentence and there are some special characters to avoid, check the official docs on
    tsquery

    Login or Signup to reply.
  2. PostgreSQL has good support for regular expressions, which should be able to provide a solution. The relevant operators are described here. For example, if the requirement is only to match exact words using case-insensitive matching, the following POSIX regex matcher could be used:

    w.keyword ~* CONCAT('([^[:word:]]|^)', t.trademark, '([^[:word:]]|$)')

    This is looking at the single character (if there is one) before and after the word being matched (e.g. swam) to make sure neither is a word character. So in the example while swam is interesting both such characters would be spaces so it would be a match.

    Here is a demo with some further data added to show which lines get matched (t = match, f = no match).

    Of course, there are other possible requirements such as partial word matching and case sensitive matching – if this answer doesn’t provide a good solution, please let me know in the comments.

    Login or Signup to reply.
  3. To check if a trademark exists in the list of keywords in the words table, you can use the LIKE operator in a SELECT statement. For example:

    SELECT keyword
    FROM words
    WHERE keyword LIKE '%swam%';
    

    This will return any keywords that contain the word "swam", including the example keyword "while swam is interesting".

    If you want to find exact matches for a trademark in the words table, you can use the = operator instead of LIKE:

    SELECT keyword
    FROM words
    WHERE keyword = 'swam';
    

    This will only return keywords that are exactly "swam", without any additional words before or after it.

    You can also use the IN operator to check if a trademark exists in the words table:

    SELECT keyword
    FROM words
    WHERE keyword IN ('swam');
    

    This will return any keywords that are exactly "swam".

    Note that in all of these examples, the trademarks table is not used. If you want to compare the keywords in the words table with the trademarks in the trademarks table, you can use a JOIN clause in your SELECT statement. For example:

    SELECT w.keyword, t.trademark
    FROM words w
    JOIN trademarks t ON w.keyword = t.trademark;
    

    This will return a list of keyword-trademark pairs where the keyword and trademark are the same. You can then use this list to identify any matches between the words and trademarks tables.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search