
I’m a beginner in artificial intelligence. I can use GATE or any other Natural Language Processing tool, but I can’t find an answer to this:

Do you know how to evaluate how close two sentences are, even with a large data set?

Do you have any recommendations? I can use the number of permutations, the length, the number of tokens, their Metaphone encodings, etc., but I don’t know which measure I should use.
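
One way to capture the "number of permutations" idea is an edit distance that counts a swap of two adjacent characters as a single edit. Below is a minimal sketch, assuming plain Python and no NLP library, of the optimal string alignment distance (a restricted form of Damerau-Levenshtein), plus a length-normalized similarity built on it. The function names are illustrative. A Metaphone step could be applied to both strings before comparing, but the standard library has no phonetic encoder, so it is left out here.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Insertions, deletions, substitutions and swaps of two adjacent
    characters each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
            # Adjacent transposition, e.g. "IA" -> "AI", counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized_similarity(a: str, b: str) -> float:
    """Case-insensitive: 1.0 for identical strings, lower as they diverge."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


print(osa_distance("Hello IA", "Hello AI"))         # 1 (one swap)
print(osa_distance("Hello arvis", "Hello Jarvis"))  # 1 (one insertion)
```

With plain Levenshtein distance, "Hello IA" and "Hello AI" would be 2 edits apart (two substitutions); the transposition rule brings that down to 1.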

My goal is the following kind of exchange (my input first, then the assistant's answer):
- "Hello Jarvis"
- "Hello Romain, how are you?"

- "Hello arvis"
- "Hello Romain, how are you?"

- "Hello mister Swift"
- I don't know what you are expecting. Is this like "Hello Jarvis"?
- Yes
- OK. Hello Romain, how are you?

- "Hello mister swift, how are you?"
- I don't know what you are expecting.
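
The dialogue above amounts to a three-way decision: answer when the input is very close to a known phrase, ask for confirmation when it is somewhat close, and give up otherwise. Here is a minimal sketch of that idea, assuming Python's standard `difflib.SequenceMatcher.ratio()` as the similarity measure. The threshold values, the `KNOWN` table and the function names are hypothetical and were picked by hand to reproduce these four exchanges, not tuned on data.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, hand-picked for the four exchanges above.
MATCH_THRESHOLD = 0.8  # at or above: treat the input as the known phrase
ASK_THRESHOLD = 0.5    # between the two thresholds: ask the user to confirm

# Hypothetical table of known phrases (lower-cased) and the reply to give.
KNOWN = {"hello jarvis": "Hello Romain, how are you?"}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def decide(utterance: str) -> tuple[str, str]:
    """Return (decision, closest known phrase); decision is 'match', 'ask' or 'unknown'."""
    best = max(KNOWN, key=lambda phrase: similarity(utterance, phrase))
    score = similarity(utterance, best)
    if score >= MATCH_THRESHOLD:
        return "match", best
    if score >= ASK_THRESHOLD:
        return "ask", best
    return "unknown", best


def confirm(utterance: str, phrase: str) -> None:
    """After a "Yes", store the new wording as an alias of a known phrase."""
    KNOWN[utterance.lower()] = KNOWN[phrase]


for text in ["Hello Jarvis", "Hello arvis", "Hello mister Swift",
             "Hello mister swift, how are you?"]:
    decision, phrase = decide(text)
    if decision == "match":
        print(KNOWN[phrase])
    elif decision == "ask":
        print(f"I don't know what you are expecting. Is this like '{phrase}'?")
    else:
        print("I don't know what you are expecting.")
```

With these thresholds the four inputs come out as match, match (ratio about 0.96), ask (about 0.53) and unknown (about 0.36). `confirm` covers the "Yes" turn: once the user confirms, the new wording is stored so that it matches directly the next time.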

Example

The numbers 0, 1, 2, ... n are just an example of a similarity scale: 0 means identical, and a larger number means the sentences are further apart.

Basic

- "Hello IA" is close to
   - "Hello IA" by 0
   - "Hello AI" by 1

- "Hello Jarvis" is close to
   - "Hello AI" by 2
   - "Hello IA" by 2

- "Hello! mister Swift" is close to
   - "Hello AI" by 3
   - "Hello IA" by 3
   - "Hello Jarvis" by 2
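
The "Basic" scale resembles an edit distance counted in words rather than characters. Below is a minimal sketch, assuming a simple regular-expression tokenizer (lower-cased `\w+` runs, so "Hello!" becomes `hello`); the function names are illustrative. It does not reproduce every hand-picked number above: for example, it puts "Hello Jarvis" at 1 from "Hello AI" rather than 2.

```python
import re


def tokens(sentence: str) -> list[str]:
    """Lower-cased word tokens; punctuation such as "!" is dropped."""
    return re.findall(r"\w+", sentence.lower())


def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance where one edit inserts, deletes or replaces a whole word."""
    ta, tb = tokens(a), tokens(b)
    prev = list(range(len(tb) + 1))  # distances from the empty prefix of ta
    for i, wa in enumerate(ta, start=1):
        curr = [i] + [0] * len(tb)
        for j, wb in enumerate(tb, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wa
                curr[j - 1] + 1,           # insert wb
                prev[j - 1] + (wa != wb),  # replace wa by wb (free if equal)
            )
        prev = curr
    return prev[-1]


print(token_edit_distance("Hello IA", "Hello AI"))                 # 1
print(token_edit_distance("Hello! mister Swift", "Hello Jarvis"))  # 2
```

This gives 0 for "Hello IA" against itself, 1 for "Hello IA" against "Hello AI" (one word replaced), and 2 for "Hello! mister Swift" against both "Hello AI" and "Hello Jarvis".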

Less Basic

- "Hello IA" is close, by (token length, token words, grammar, syntax), to
   - "Hello IA" by (0,0,0,0)
   - "Hello AI" by (0,1,0,0)

- "Hello Jarvis" is close to
   - "Hello AI" by (0,2,1,1)
   - "Hello IA" by (0,2,1,1)

- "Hello! mister Swift" is close to
   - "Hello AI" by (1,2,2,2)
   - "Hello IA" by (1,2,2,2)
   - "Hello Jarvis" by (1,2,2,2)
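
The "Less Basic" idea keeps one distance per dimension instead of a single number. This is a sketch under stated assumptions: only the first two dimensions are computed (difference in token count, and number of words that appear in only one of the two sentences), because the grammatical and syntactic ones would need a part-of-speech tagger or parser such as NLTK or Stanford CoreNLP. The `WEIGHTS` used to collapse the vector into one score for ranking are hypothetical, and the values will not match the hand-picked tuples above.

```python
import re


def tokens(sentence: str) -> list[str]:
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", sentence.lower())


def distance_vector(a: str, b: str) -> tuple[int, int]:
    """(token-count difference, number of words found in only one sentence)."""
    ta, tb = tokens(a), tokens(b)
    length_diff = abs(len(ta) - len(tb))
    word_diff = len(set(ta) ^ set(tb))  # symmetric difference of the word sets
    # Grammatical and syntactic dimensions would be appended here (not shown).
    return length_diff, word_diff


# Hypothetical weights: how much each dimension counts in a single score.
WEIGHTS = (1.0, 0.5)


def weighted_distance(a: str, b: str, weights=WEIGHTS) -> float:
    """Weighted sum of the per-dimension distances."""
    return sum(w * d for w, d in zip(weights, distance_vector(a, b)))


def closest(sentence: str, candidates: list[str]) -> str:
    """Candidate with the smallest weighted distance (the first one wins ties)."""
    return min(candidates, key=lambda c: weighted_distance(sentence, c))


print(distance_vector("Hello! mister Swift", "Hello Jarvis"))  # (1, 3)
```

Python tuples also compare lexicographically, so `min(candidates, key=lambda c: distance_vector(s, c))` is an alternative that ranks by token count first and breaks ties on word overlap, with no weights to choose.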

Answers


  1. If you are ready to learn hard-core NLP, you may use a classifier for this task. Have a look, for instance, at Stanford NLP (Java) or NLTK (Python).

    If you want to keep things simple and use an out-of-the-box solution, have a look at the Wit.ai API; it does exactly what you need, and more.

  2. One way to determine string similarity is to use string kernels. There’s a good paper by Lodhi et al. explaining how this works:

    http://machinelearning.wustl.edu/mlpapers/paper_files/LodhiSSCW02.pdf

    To build a classifier with CoreNLP, you would have to extract features from the string, such as n-grams, lemmas, or similar.

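
To make the first answer's suggestion concrete: a classifier learns, from labeled example sentences, which intent a new sentence belongs to. Below is a minimal sketch, assuming a multinomial Naive Bayes model with add-one smoothing written with the standard library only. NLTK ships its own Naive Bayes classifier, but this sketch does not use it. The labels and training sentences are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(sentence: str) -> list[str]:
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", sentence.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self, examples):
        self.label_counts = Counter()            # number of sentences per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        for sentence, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def predict(self, sentence: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior P(label)
            for w in tokens(sentence):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences for two hypothetical intents.
examples = [
    ("hello jarvis", "greeting"),
    ("hi jarvis", "greeting"),
    ("hey there jarvis", "greeting"),
    ("how are you", "wellbeing"),
    ("how are you doing", "wellbeing"),
    ("are you ok", "wellbeing"),
]
clf = NaiveBayes(examples)
print(clf.predict("Hello there"))           # greeting
print(clf.predict("How are you, Jarvis?"))  # wellbeing
```

Working in log space avoids numeric underflow when sentences get long; with a realistic data set, the features would usually be richer than raw words (n-grams, lemmas), as the second answer notes.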
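
The paper cited in the second answer describes a string subsequence kernel, which compares all (possibly non-contiguous) character subsequences of two strings with a decay factor. As a simpler stand-in, and not the Lodhi et al. kernel itself, here is a p-spectrum kernel: the inner product of contiguous character n-gram counts, normalized to [0, 1]. The same n-gram counts are also the kind of features the second answer mentions for a classifier. The default `n=2` is an arbitrary choice.

```python
import math
from collections import Counter


def ngram_profile(s: str, n: int) -> Counter:
    """Count every contiguous character n-gram of s (case-insensitive)."""
    s = s.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def spectrum_kernel(s: str, t: str, n: int = 2) -> int:
    """p-spectrum kernel: inner product of the two n-gram count vectors."""
    ps, pt = ngram_profile(s, n), ngram_profile(t, n)
    return sum(count * pt[g] for g, count in ps.items())


def normalized_kernel(s: str, t: str, n: int = 2) -> float:
    """K(s,t) / sqrt(K(s,s) * K(t,t)): 1.0 for identical profiles, 0.0 if no n-gram is shared."""
    denom = math.sqrt(spectrum_kernel(s, s, n) * spectrum_kernel(t, t, n))
    return 0.0 if denom == 0 else spectrum_kernel(s, t, n) / denom


print(round(normalized_kernel("Hello arvis", "Hello Jarvis"), 2))        # 0.86
print(round(normalized_kernel("Hello arvis", "Hello mister Swift"), 2))  # 0.46
```

The normalization keeps long sentences from scoring higher just because they contain more n-grams.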