I need to create a vector database in AWS. I was using Pinecone in my POC, but for safety reasons the company needs something inside AWS. I saw some people recommending OpenSearch, but I read in a blog that OpenSearch doesn’t really do vector search:
Documented in
https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
the approach to vector search has exactly the same limitation as what
we observed with Solr: it will retrieve all documents that match the
search criteria (keyword query along with filters on document
attributes), and score all of them with the vector similarity of
choice (cosine distance, dot-product or L1/L2 norms). That is, vector
similarity will not be used during retrieval (first and expensive
step): it will instead be used during document scoring (second step).
Therefore, since you can’t know in advance, how many documents to
fetch to surface most semantically relevant, the mathematical idea of
vector search is not really applied.
Does anyone know of an alternative, or is OpenSearch the best we can do in AWS? I read some people talking about using DynamoDB too, but I didn’t fully understand how it works. If anyone has any ideas or suggestions, I would really appreciate it.
Source: https://towardsdatascience.com/speeding-up-bert-search-in-elasticsearch-750f1f34f455
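To make the limitation described in the quote concrete, here is a minimal pure-Python sketch of the "filter first, score second" pattern. It is only an illustration, not how Elasticsearch or OpenSearch is implemented; the function names, the `DOCS` list, and the two-dimensional vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_then_score(docs, keyword, query_vec, top_k):
    """Hypothetical two-step search: keyword retrieval, then vector scoring."""
    # Step 1 (retrieval): only the keyword decides which documents are fetched.
    candidates = [d for d in docs if keyword in d["text"]]
    # Step 2 (scoring): every fetched candidate is scored by vector similarity.
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in ranked[:top_k]]


# Made-up documents with toy 2-dimensional embeddings.
DOCS = [
    {"id": "a", "text": "aws vector search", "vec": [1.0, 0.0]},
    {"id": "b", "text": "aws pricing", "vec": [0.0, 1.0]},
    {"id": "c", "text": "semantic retrieval", "vec": [0.9, 0.1]},
]

result = filter_then_score(DOCS, "aws", [1.0, 0.0], top_k=2)
print(result)
```

Document `c` is very close to the query vector, but it can never be returned because it does not contain the keyword, which is the point the quoted passage makes about vector similarity not taking part in retrieval.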
2 Answers
Amazon OpenSearch has a vector-based search plugin called k-NN, and it has experimental features that allow users to perform semantic search.
References: K-NN, AWS K-NN, Semantic Search feature
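As a rough sketch of what using the k-NN plugin looks like, the snippet below builds the JSON bodies for creating a k-NN index and for running a nearest-neighbour query. It only constructs the request bodies and does not talk to a cluster. The field name `embedding`, the dimension, and the helper function names are assumptions for illustration; check the k-NN documentation for the options supported by your OpenSearch version.

```python
import json


def knn_index_body(field, dimension):
    """Body for creating an index (PUT /<index>) with one knn_vector field."""
    return {
        # Enables the k-NN plugin for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # `field` is a hypothetical name; `dimension` must match your embeddings.
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def knn_query_body(field, vector, k):
    """Body for a search (POST /<index>/_search) returning the k nearest vectors."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


index_body = knn_index_body("embedding", 4)
query_body = knn_query_body("embedding", [0.1, 0.2, 0.3, 0.4], k=3)
print(json.dumps(index_body, indent=2))
print(json.dumps(query_body, indent=2))
```

You can send these bodies with any HTTP client or with an OpenSearch client library. Unlike script scoring over filtered results, the `knn` query uses the vector index during retrieval itself.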
Try some newer ones like Qdrant, Weaviate, or Milvus. They are a lot easier to use and less resource-hungry than OpenSearch.