I’m just learning Python and don’t know much yet. At the moment I’m building a Telegram bot that helps you find appropriate texts to read in a foreign language.
The core function I want to implement is:
The bot suggests texts to read based on your vocabulary. (When you mark a text as "read", all of its words are added to your dictionary; that’s how the bot collects the data.)
For example, you are user A, you know 500 words, and you want to get texts from the bot’s database in which you know at least 75% (or at least 90%) of the words.
Right now I have a database of user words and texts. How should I approach indexing so that I can tell how many words a user knows in each text?
Obviously, I could compare the list of user words with the list of words from each text at every bot start, but I’m not sure that’s the most efficient way. Re-indexing 100+ texts on every start feels wrong.
Could you please suggest where I can read about similar problems, or how to search for them? I don’t even know how to google it…
2 Answers
You don’t need to process every text at every bot start.
Process every text once.
Then write the results of all that processing to a file. When the bot starts, load the data back from that file instead of re-indexing.
You can use a database like Elasticsearch that supports full-text search. When you query it with the user’s words, it also returns a relevance score for each text, which you can use to decide which text matches best.
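One hedged sketch of what such a query body could look like. Note an important caveat: Elasticsearch’s `minimum_should_match` constrains the fraction of the *query* terms (the user’s words) that must appear in a text, which is not quite the same metric as "the user knows X% of the text’s words". The index and field names (`body`) below are assumptions, not anything from the original post.

```python
def build_vocabulary_query(user_words, min_match="75%"):
    """Build a match-query body over a hypothetical `body` field.

    `minimum_should_match` requires the given fraction of the user's
    words to appear in a text for it to be returned as a hit.
    """
    return {
        "query": {
            "match": {
                "body": {
                    "query": " ".join(sorted(user_words)),
                    "minimum_should_match": min_match,
                }
            }
        }
    }
```

You would then send this with the official Python client, roughly `es.search(index="texts", body=build_vocabulary_query(user_words))`, and rank hits by the returned `_score`. For the exact "I know N% of this text" metric, the precomputed-coverage approach from the first answer is still the more direct fit.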