skip to Main Content

We have an Azure SQL table with a column called Description (varchar(1000)). While we loaded data into this table via ADF Pipeline, we used the REST API to create vector embeddings on Description. We used text-embedding-ada-002 model which creates 1536 vectors. These embeddings were then saved as a comma delimited values in another column called VectorValues (varchar(max)).

We are using Azure AI Search Indexer to pull data into an index.

{"name": "Description","type": "Edm.String", "searchable": "true", "retrievable": "true", "sortable": "false", "filterable": "false", "facetable": "false"},
{
  "name": "VectorValues",
  "type": "Collection(Edm.Single)",
  "dimensions": 1536,
  "vectorSearchProfile": "myHnswProfile",
  "searchable": "true",
  "retrievable": "true",
  "filterable": "false",
  "sortable": "false",
  "facetable": "false"
}

How can we convert values of type string/varchar into Collection(Edm.Single), using the Indexer?

Apparently, while creating Indexer we can provide fieldMappings with Mapping Functions. However, we can only convert from string to string.

"fieldMappings" : [
  {
    "sourceFieldName" : "VectorValues", 
    "targetFieldName" : "VectorValues",
    "mappingFunction" : { "name" : "jsonArrayToStringCollection" }
  }]

SQL table showing 2 columns

2

Answers


  1. Chosen as BEST ANSWER

    Thanks Ryan I did the same thing and it is now working. Let me explain. While loading data via the pipeline, in addition to putting data as a comma delimited value against the description as VectorValues nvarchar(max). I also added them as float in another table, against the Description Id (PK). I then created a view on top to combine these two tables on PK = FK and while picking up, I converted those child records (vector values) into json array.

    This seem to have resolved the issue. and data is loading in the index.

    SELECT 
         r.GIID
        ,r.ReportName
        ,r.[Description]
        ,JSON_QUERY('[' + STUFF((
                    SELECT CONCAT(',', v.VectorValue,'')
                    FROM dbo.ReportVectorData as v
                    WHERE v.ReportGIID = r.GIID
                    FOR XML PATH('')),1,1,'') + ']') AS VectorValue
    FROM dbo.vwSearchReportsWithSecurity as r 
    

  2. Azure SQL and Cognitive Search have pretty limited transformation capabilities in the Indexer so you’d typically handle this outside the Indexer.

    To convert VectorValues from a comma delimited string to a Collection(Edm.Single) in Azure Search you’ll need to preprocess the data before it reaches the Indexer because the Indexer itself doesn’t support direct conversion from a comma-separated string to a Collection(Edm.Single).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search