I am using PostgreSQL with pgvector for image similarity search. The vector should have at most 2000 dimensions so that Postgres/pgvector can index it.
I am creating the image vector in Python with VGG16/VGG19. As a result, I get a vector with 4096 features/dimensions. I need to reduce it to fewer than 2000 dimensions using Python.
How can I achieve this?
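For context, here is a minimal sketch of how such a 4096-dimensional embedding is commonly extracted with Keras' VGG16 (the fc1 layer); the exact extraction code in use here is not shown, so the file name and layer choice are assumptions for illustration only:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

# Full VGG16 including the fully connected head, then expose the
# 4096-dimensional fc1 layer as the model output.
base = VGG16(weights="imagenet")
model = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

# "example.jpg" is a placeholder image path.
img = image.load_img("example.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = model.predict(x)  # shape: (1, 4096)
```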
2 Answers
Thanks for your answer. I tried it: st._extract(img) returns me a vector. I receive an error:
I suggest you use Principal Component Analysis (PCA) to reduce the dimensions of your source vectors.
You can adapt this code (here using random dummy data to illustrate):
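The original snippet is not shown here, so the following is a minimal sketch of the idea with scikit-learn, using random dummy data as a stand-in for the VGG embeddings:

```python
import numpy as np
from sklearn.decomposition import PCA

# Dummy corpus: 2000 random "embeddings" with 4096 features each,
# standing in for the VGG16/VGG19 vectors.
corpus = np.random.rand(2000, 4096)

# Fit the PCA model once on the corpus and reduce to fewer than
# 2000 dimensions so pgvector can index the result.
pca = PCA(n_components=1024)
reduced_corpus = pca.fit_transform(corpus)
print(reduced_corpus.shape)  # (2000, 1024)

# For new, unseen embeddings, only call transform() on the already fitted model.
new_embedding = np.random.rand(1, 4096)
reduced_new = pca.transform(new_embedding)
print(reduced_new.shape)  # (1, 1024)
```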
Be aware that you have to fit the PCA model once on your corpus and then only apply the transform() function of the fitted model to new, unseen data (e.g. a couple of new embeddings that you want to ingest into your PostgreSQL database). Only this way do you make sure that new data is transformed in exactly the same way as the existing data.