
I am using PostgreSQL with pgvector to search for similar images. pgvector can only index vectors with up to 2000 dimensions, so my vectors have to stay within that limit.

I create a feature vector for each image with Python and VGG16/VGG19, which gives me a vector with 4096 features/dimensions. I need to reduce it to at most 2000 dimensions using Python.

How can I achieve this?

2 Answers


  1. Chosen as BEST ANSWER

    Thanks for your answer. I tried:

    pca = PCA(n_components=2000)
    vectors_pca = pca.fit_transform(np.array(st._extract(img)))
    print(vectors_pca)
    

    st._extract(img) returns a single vector. I get this error:

    "Expected 2D array, got 1D array instead: array=[2.92967334e-02 0.00000000e+00 0.00000000e+00 1.52434886e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00 6.58618007e-03 1.74059682e-02 4.07547764e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.00543124e-02 0.00000000e+00 4.21085954e-02 2.77808681e-02 0.00000000e+00 0.00000000e+00 3.13697034e-03 0.00000000e+00 6.93593407e-03 0.00000000e+00 0.00000000e+00 . . . ]"


  2. I suggest you use Principal Component Analysis (PCA) to reduce the dimensions of your source vectors.

    You can adapt this code (here using random dummy data to illustrate):

    from sklearn.decomposition import PCA
    import numpy as np
    
    # dummy corpus: 5000 vectors with 4096 features each
    vectors_4096 = np.random.normal(size=(5000, 4096))
    print(vectors_4096.shape)
    # (5000, 4096)
    
    # instantiate PCA
    # for n_components choose number of dims that you want to reduce to
    pca = PCA(n_components=2000)
    
    # fit PCA model and transform data:
    vectors_pca = pca.fit_transform(vectors_4096)
    print(vectors_pca.shape)
    # (5000, 2000)
    

    Be aware that you have to fit the PCA model once on your corpus and then use only the transform() function of the fitted model on new, unseen data (e.g. a couple of new embeddings that you want to ingest into your PostgreSQL database). Only this way can you make sure that new data is transformed in the same way as the existing data.
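
The "Expected 2D array, got 1D array" error from the other answer fits this picture: scikit-learn expects input of shape (n_samples, n_features), and PCA needs n_components <= min(n_samples, n_features), so it cannot be fitted on a single image vector. A minimal sketch of the fit-once, transform-later workflow follows. The sizes are shrunk so it runs quickly (64 features reduced to 16 instead of 4096 reduced to 2000), and `extract_features` is a hypothetical stand-in for `st._extract(img)` that just returns random data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Small stand-in sizes so the sketch runs quickly; in the question these
# would be 4096 input features and 2000 output dimensions.
N_FEATURES = 64
N_COMPONENTS = 16

rng = np.random.default_rng(0)


def extract_features(_img=None):
    """Hypothetical stand-in for st._extract(img): returns one 1D vector."""
    return rng.normal(size=N_FEATURES)


# 1. Fit once on a corpus of many vectors, shape (n_samples, n_features).
#    PCA requires n_components <= min(n_samples, n_features), so the corpus
#    needs at least N_COMPONENTS rows.
corpus = np.stack([extract_features() for _ in range(300)])
pca = PCA(n_components=N_COMPONENTS)
corpus_reduced = pca.fit_transform(corpus)

# 2. For a new image, reshape the 1D vector to (1, n_features) and call
#    transform() only; do not refit.
new_vector = extract_features()
new_reduced = pca.transform(new_vector.reshape(1, -1))[0]

print(corpus_reduced.shape)  # (300, 16)
print(new_reduced.shape)     # (16,)
```

With the real sizes, this means the corpus needs at least 2000 image vectors before PCA(n_components=2000) can be fitted. The fitted model can be saved (for example with joblib.dump) and reloaded later, so every new embedding goes through the same transformation.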
