I have this PostgreSQL table for storing words:
CREATE TABLE IF NOT EXISTS words
(
id bigint NOT NULL DEFAULT nextval('processed_words_id_seq'::regclass),
keyword character varying(300) COLLATE pg_catalog."default",
)
insert into words (words)
VALUES ('while swam is interesting');
CREATE TABLE IF NOT EXISTS trademarks
(
id bigint NOT NULL DEFAULT nextval('trademarks_id_seq'::regclass),
trademark character varying(300) COLLATE pg_catalog."default",
)
Into table trademarks
I will have thousands of registered trademarks names.
I want to compare words stored into words
table keyword do they match not only for a words but also for word which is a in a group of words. For example:
I have a keyword while swam is interesting
stored into words.keyword
. I also have a trademark swam
located in trademarks.trademark
I have a word match so I want to detect this using SQL. How I can implement this?
3
Answers
Looks like it’s a task for full text search data types and functions, it would be something like this:
working example here
In docs
a more flexible query, if trademark field contains multiple words
where pipe symbol represents logical "or" and could be replaced by other operand according to your needs.
Example of this here.
Of course indexing will be needed to speed up searches,
check for text search index creation.
Another example, avoiding some special characters (in this case parenthesis) and trailing spaces in tsquery and also considering the posibility of multiple words, as pointed out in comments
you have to take into account that tsquery is a logical sentence and there are some special characters to avoid, check the official docs on
tsquery
PostgreSQL has good support for regular expressions, which should be able to provide a solution. The relevant operators are described here. For example, if the requirement is only to match exact words using case-insensitive matching, the following POSIX regex matcher could be used:
w.keyword ~* CONCAT('([^[:word:]]|^)', t.trademark, '([^[:word:]]|$)')
This is looking at the single character (if there is one) before and after the word being matched (e.g.
swam
) to make sure neither is a word character. So in the examplewhile swam is interesting
both such characters would be spaces so it would be a match.Here is a demo with some further data added to show which lines get matched (
t
= match,f
= no match).Of course, there are other possible requirements such as partial word matching and case sensitive matching – if this answer doesn’t provide a good solution, please let me know in the comments.
To check if a trademark exists in the list of keywords in the words table, you can use the LIKE operator in a SELECT statement. For example:
This will return any keywords that contain the word "swam", including the example keyword "while swam is interesting".
If you want to find exact matches for a trademark in the words table, you can use the = operator instead of LIKE:
This will only return keywords that are exactly "swam", without any additional words before or after it.
You can also use the IN operator to check if a trademark exists in the words table:
This will return any keywords that are exactly "swam".
Note that in all of these examples, the trademarks table is not used. If you want to compare the keywords in the words table with the trademarks in the trademarks table, you can use a JOIN clause in your SELECT statement. For example:
This will return a list of keyword-trademark pairs where the keyword and trademark are the same. You can then use this list to identify any matches between the words and trademarks tables.