I should start by saying: I'm not good at programming, but it is extremely fun!
I’m working on a Siri-like program and I’m trying to implement a Wikipedia function. To do this I ask a question, for example: "tell me about superman".
I need to extract the word "superman", or whatever other subject someone might ask about, from the string. That is not hard by itself, but the real problems start when someone asks: "could you tell me about superman". I still want to extract only the word "superman".
This is an example of what I have tried before:
if ((c.Contains("tell me about")) || (c.Contains("Tell me about")))
{
    string query = c;
    var part = query.Split('t').Last(); // can't search for words containing the letter t, like artificial intelligence
    string url = ("http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=" + part + "&MaxHits=1");
    XmlReader reader = XmlReader.Create(url);
    while (reader.Read())
        switch (reader.Name.ToString())
        {
            case "Description":
                sp(reader.ReadString());
                break;
        }
}
I was almost able to solve the problem: the solution below seems to work about 80% of the time. It is not complete, but it is a step in the right direction.
if ((c.Contains("tell me about")) || (c.Contains("Tell me about")))
{
    string query = c;
    // Split on "about "; every resulting piece is sent to the lookup service
    string[] lines = Regex.Split(query, "about ");
    foreach (string line in lines)
    {
        string url = ("http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=" + line + "&MaxHits=1");
        XmlReader reader = XmlReader.Create(url);
        while (reader.Read())
            switch (reader.Name.ToString())
            {
                case "Description":
                    sp(reader.ReadString());
                    break;
            }
    }
}
Is there a better/easier way to do this?
2 Answers
I finally found an answer:
It now works 100% of the time! If someone knows a better way to do this, I would be more than happy to hear it.
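For illustration only, here is one way the extraction could be done. It is a sketch, not necessarily the code this answer refers to. It assumes the subject is everything that follows the first "tell me about " in the input, matched case-insensitively, and it URL-escapes the subject before building the lookup URL from the question. `SubjectExtractor`, `ExtractSubject` and `BuildLookupUrl` are hypothetical names.

```csharp
using System;

static class SubjectExtractor
{
    // Hypothetical helper: returns the text after the first "tell me about ",
    // or an empty string if the phrase does not occur.
    public static string ExtractSubject(string input)
    {
        const string marker = "tell me about ";
        // Case-insensitive search, so "Tell me about" and "tell me about" both match
        int index = input.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            return string.Empty;
        }
        // Keep everything after the marker; drop surrounding spaces and end punctuation
        return input.Substring(index + marker.Length).Trim(' ', '?', '.', '!');
    }

    // Builds the lookup URL used in the question, escaping the subject
    // so that spaces and special characters are safe in the query string.
    public static string BuildLookupUrl(string subject)
    {
        return "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString="
            + Uri.EscapeDataString(subject) + "&MaxHits=1";
    }
}
```

With this, `SubjectExtractor.ExtractSubject("could you tell me about superman")` yields `superman`, and subjects containing the letter t, such as "artificial intelligence", are no longer cut apart.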
As suggested in the comments, if it’s for any kind of production application, the best option is to use an existing library.
Still, it can be a fun exercise to do it on your own.
I would say there are many more ways to ask about Superman.
And many, many more.
All the questions are built from some auxiliary words (“what”, “who”, “a”, “about”) and the actual word describing the subject of the question: “Superman”.
The simplified approach would be to eliminate all the auxiliaries and take whatever remains.
To quickly build a simple list of question words and question phrases, I used an English grammar site. I took the phrases and removed the subject of the question. This gave me 50-60 auxiliary words for my list.
Now all I do is take the sentence and remove all the words that are in the auxiliary list. The code is below:
It’s quite simplistic, but with almost no effort it narrows down the list of potential subjects of the question.
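The auxiliary-word removal described above could be sketched roughly as follows. This is a minimal illustration, not the original listing: the word list here is a short, hand-picked assumption standing in for the 50-60 words mentioned, and `QuestionFilter` and `RemoveAuxiliaries` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QuestionFilter
{
    // Assumed, deliberately short auxiliary list (the answer describes 50-60 words).
    // The comparer makes lookups case-insensitive.
    private static readonly HashSet<string> Auxiliaries = new HashSet<string>(
        new[]
        {
            "what", "who", "is", "are", "a", "an", "the", "about",
            "tell", "me", "could", "you", "can", "please", "do", "know"
        },
        StringComparer.OrdinalIgnoreCase);

    // Splits the question into words, drops every auxiliary word,
    // and joins whatever remains back into a single string.
    public static string RemoveAuxiliaries(string question)
    {
        char[] separators = { ' ', '?', '.', ',', '!' };
        IEnumerable<string> remaining = question
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !Auxiliaries.Contains(word));
        return string.Join(" ", remaining);
    }
}
```

For example, `QuestionFilter.RemoveAuxiliaries("Could you tell me about Superman?")` leaves `Superman`. A subject that itself contains an auxiliary word would lose that word, which is one reason the approach only narrows down the candidates.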