I’m a beginner in the field of artificial intelligence… I can use GATE or any other natural language processing tool, but I don’t have an answer for this:
Do you know how to evaluate how close two sentences are, even with a large data set?
Do you have any recommendations? I can use the number of permutations, the length, the number of tokens, their Metaphone encodings, etc… but I don’t know what test I should use.
My goal is:
- "Hello Jarvis"
- "Hello Romain, how are you"
- "Hello arvis"
- "Hello Romain, how are you"
- "Hello mister Swift"
- I don't know what you are expecting. Is this like "Hello Jarvis"?
- Yes
- OK. Hello Romain, how are you?
- "Hello mister swift, how are you?"
- I don't know what you are expecting.
Example
The values 1, 2, 3 or n are just an example of a similarity scale.
Basic
- "Hello IA" is close to
- "Hello IA" by 0
- "Hello AI" by 1
- "Hello Jarvis" is close to
- "Hello AI" by 2
- "Hello IA" by 2
- "Hello! mister Swift" is close to
- "Hello AI" by 3
- "Hello IA" by 3
- "Hello Jarvis" by 2
Less Basic
- "Hello IA" is (token length, token word, grammatically, syntactically) close to
- "Hello IA" by (0,0,0,0)
- "Hello AI" by (0,1,0,0)
- "Hello Jarvis" is close to
- "Hello AI" by (0,2,1,1)
- "Hello IA" by (0,2,1,1)
- "Hello! mister Swift" is close to
- "Hello AI" by (1,2,2,2)
- "Hello IA" by (1,2,2,2)
- "Hello Jarvis" by (1,2,2,2)
2 Answers
If you are ready to learn hard-core NLP, you may use a classifier for this task. Have a look, for instance, at Stanford NLP (Java) or NLTK (Python).
If you want to keep things simple and use an out-of-the-box solution, have a look at the Wit.ai API: it does exactly what you need, and more.
One way to determine string similarity is to use string kernels. There’s a good paper by Lodhi et al. explaining how this works:
http://machinelearning.wustl.edu/mlpapers/paper_files/LodhiSSCW02.pdf
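To give a feel for the idea, here is a minimal Python sketch of a gap-weighted subsequence kernel, the kind of string kernel that paper describes. The function names `ssk` and `normalized_ssk`, the default subsequence length `n=2` and the decay factor `lam=0.5` are my own illustrative choices, not anything prescribed by the paper or by a library. It uses the usual dynamic-programming recursion, so treat it as a sketch for short strings rather than a tuned implementation.

```python
import math


def ssk(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel K_n(s, t).

    Counts subsequences of length n shared by s and t, each occurrence
    weighted by lam raised to the length of the span it covers, so that
    contiguous matches weigh more than matches with gaps.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Close every subsequence on a matching final symbol.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_ssk(s, t, n=2, lam=0.5):
    """Kernel value scaled so that identical strings score 1.0."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom else 0.0


if __name__ == "__main__":
    for other in ("hello arvis", "hello mister swift"):
        print(other, normalized_ssk("hello jarvis", other, n=3))
```

The normalized score lies between 0 and 1, with higher meaning closer, so it is a similarity rather than a distance. The code only indexes and compares elements, so passing lists of tokens instead of strings should give a word-level variant. The cost grows roughly with n times the product of the two lengths, which is something to keep in mind for a large data set.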
To create a classifier using CoreNLP, you would have to create features for the string, such as n-grams or lemmas.
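As a rough illustration of the n-gram feature idea, here is a small sketch in plain Python rather than CoreNLP. It turns each sentence into character trigram counts and picks the closest known example by cosine similarity, which is a nearest-example lookup and not a trained classifier. The names `char_ngrams`, `cosine` and `classify`, the intent labels and the 0.5 threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-gram counts, lowercased and padded with one space per side."""
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(utterance, examples, threshold=0.5):
    """Return (label, score) of the closest example, or (None, score)
    when nothing reaches the threshold."""
    feats = char_ngrams(utterance)
    best_label, best_score = None, 0.0
    for text, label in examples:
        score = cosine(feats, char_ngrams(text))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical training examples: (sentence, intent label).
EXAMPLES = [
    ("Hello Jarvis", "greeting"),
    ("Good night Jarvis", "goodnight"),
]

if __name__ == "__main__":
    for sentence in ("Hello arvis", "Hello mister Swift"):
        print(sentence, classify(sentence, EXAMPLES))
```

With these examples, "Hello arvis" still lands on the greeting, while "Hello mister Swift" falls below the threshold and comes back as `None`, which is where the assistant could answer "I don't know what you are expecting" and ask for confirmation. The threshold would need tuning on real data.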