I want to quickly scan a chat conversation, in close to real time, and detect any food names that appear in it. I have a database of roughly 400K foods, so the trivial solution of just running a regex in memory won’t scale.
I’m using PostgreSQL as the database and Ruby on Rails as the framework.
Any ideas?
2 Answers
Join the 400K food names into a single string, separated by whitespace. Then build a regex from the words in the chat message, which is short, and match it against that list string using word anchors so that only whole words match.
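A minimal Ruby sketch of this idea, assuming the food names have already been loaded into memory once (FOOD_NAMES, MAX_PHRASE_WORDS, chat_phrases and find_foods are hypothetical names). It deviates slightly from the description above: names are separated by newlines and matched with Ruby’s line anchors ^ and $ instead of \b, and the regex is built from phrases of up to three chat words. That way multi-word names like "peanut butter" can match, and a lone chat word like "pie" does not match inside "apple pie".

```ruby
# Hypothetical in-memory food list; in practice it would be loaded once
# from the foods table and kept between messages.
FOOD_NAMES = ["apple", "apple pie", "peanut butter", "rice"].freeze

# One lowercased food name per line, so each line is one complete name.
FOOD_LIST = FOOD_NAMES.map(&:downcase).join("\n")

# Hypothetical limit: the longest food name to look for, in words.
MAX_PHRASE_WORDS = 3

# Every 1..MAX_PHRASE_WORDS word phrase in the chat message.
def chat_phrases(message)
  words = message.downcase.scan(/[a-z0-9']+/)
  (1..MAX_PHRASE_WORDS).flat_map { |n| words.each_cons(n).map { |g| g.join(" ") } }.uniq
end

# Build one regex from the chat phrases and scan the food list with it.
# In Ruby, ^ and $ match at line boundaries, so only whole names match.
def find_foods(message)
  phrases = chat_phrases(message)
  return [] if phrases.empty?

  pattern = /^(?:#{phrases.map { |p| Regexp.escape(p) }.join("|")})$/
  FOOD_LIST.scan(pattern).uniq
end

puts find_foods("I had apple pie and some rice").inspect
# prints ["apple", "apple pie", "rice"]
```

The regex grows with the length of the chat message rather than with the 400K-name list, and the list string is walked once per message.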
Consider using PostgreSQL’s full-text search. Preprocess the chat text before matching, batch-process messages if real-time results aren’t essential, cache results, and define confidence criteria for what counts as a match; parallel processing can speed things up further. Together these should let you locate food names in a chat conversation and scale to a database of 400K foods. Optionally, combine this with machine learning and natural language processing to increase accuracy.
Hope it works 🙂
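One way the full-text search suggestion could look, as a sketch rather than a tested implementation. It assumes a hypothetical foods table with a name column, an expression GIN index on to_tsvector('english', name), and a PG::Connection from the pg gem (exec_params). The chat message becomes an OR query that fetches candidate foods through the index, and a simple confidence criterion keeps only candidates whose full name appears in the message. The helper names or_tsquery, confirmed and find_foods_fts are hypothetical.

```ruby
# Hypothetical schema, created once (e.g. in a migration):
#   CREATE INDEX foods_name_fts ON foods
#     USING GIN (to_tsvector('english', name));

# The WHERE expression must match the index expression exactly so that
# PostgreSQL can use the GIN index for the lookup.
CANDIDATE_SQL = <<~SQL
  SELECT name FROM foods
  WHERE to_tsvector('english', name) @@ to_tsquery('english', $1)
SQL

# Turn the chat message into an OR query such as "apple | pie | rice".
# Keeping only letters and digits avoids to_tsquery syntax errors.
def or_tsquery(message)
  message.downcase.scan(/[a-z0-9]+/).uniq.join(" | ")
end

# Confidence check: keep a candidate only if its whole name appears in
# the message (case-insensitive, on word boundaries).
def confirmed(message, candidates)
  candidates.select { |name| message.match?(/\b#{Regexp.escape(name)}\b/i) }
end

# conn is assumed to be a PG::Connection; the pg gem is not required here
# so that the helpers above can be tried without a database.
def find_foods_fts(conn, message)
  query = or_tsquery(message)
  return [] if query.empty?

  names = conn.exec_params(CANDIDATE_SQL, [query]).map { |row| row["name"] }
  confirmed(message, names)
end
```

Because the english configuration stems words, "apples" in a chat retrieves the food "apple" from the index, but the exact-name check then drops it; loosening that check is one of the confidence trade-offs to tune.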