skip to Main Content

I have created a table that contains a list of keywords and a code that identifies synonyms. I.e. All keywords with the same code are to be considered the same keyword.

varchar varchar tsvector
C1000 AI 'ai':1
C1000 Artificial intelligence 'artifici':1 'intellig':2
C1001 Algorithms 'algorithm':1
C1002 Software Design 'design':2 'softwar':1
C1003 ui design 'design':2 'ui':1
C1003 User interface design 'design':3 'interfac':2 'user':1
C1003 user interface engineering 'engin':3 'interfac':2 'user':1

I would like to build a query that returns the list of keywords found in a given text.

For example, the following text (is just a sample) should return the array: [C1001,C1003].

A good ui design starts from a good algorithm design, for this you need a good user interface engineering.

Is there a way to do it with a postgres query or a custom function ?

2

Answers


  1. You can use the Naive Bayes classifier algorithm. It is the most powerful algorithm for text classification. Learn more from here

    Login or Signup to reply.
  2. You can convert the text to vector, the keyword to query then check if the vector matches the query

    => d codes 
    Column  |       Type        | Modifiers 
    ---------+-------------------+-----------
    code    | character varying | 
    keyword | character varying | 
    
    => select * from codes ;
     code  |          keyword           
    -------+----------------------------
    C1000 | AI
    C1000 | Artificial intelligence
    C1001 | Algorithms
    C1002 | Software Design
    C1003 | ui design
    C1003 | User interface design
    C1003 | user interface engineering
    (7 rows)
    
    => select distinct code from codes where to_tsvector('A good ui design starts from a good algorithm design, for this you need a good user interface engineering.') @@ plainto_tsquery(keyword);
    code  
    -------
    C1001
    C1003
    (2 rows)
    
    => select array_agg(distinct code) from codes where to_tsvector('A good ui design starts from a good algorithm design, for this you need a good user interface engineering.') @@ plainto_tsquery(keyword);
    array_agg   
    ---------------
    {C1001,C1003}
    (1 row)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search