We have an Azure SQL table with a column called Description (varchar(1000)
). While we loaded data into this table via ADF Pipeline, we used the REST API to create vector embeddings on Description. We used text-embedding-ada-002 model which creates 1536 vectors. These embeddings were then saved as a comma delimited values in another column called VectorValues (varchar(max)
).
We are using Azure AI Search Indexer to pull data into an index.
{"name": "Description","type": "Edm.String", "searchable": "true", "retrievable": "true", "sortable": "false", "filterable": "false", "facetable": "false"},
{
"name": "VectorValues",
"type": "Collection(Edm.Single)",
"dimensions": 1536,
"vectorSearchProfile": "myHnswProfile",
"searchable": "true",
"retrievable": "true",
"filterable": "false",
"sortable": "false",
"facetable": "false"
}
How can we convert values of type string/varchar into Collection(Edm.Single), using the Indexer?
Apparently, while creating Indexer we can provide fieldMappings with Mapping Functions. However, we can only convert from string to string.
"fieldMappings" : [
{
"sourceFieldName" : "VectorValues",
"targetFieldName" : "VectorValues",
"mappingFunction" : { "name" : "jsonArrayToStringCollection" }
}]
2
Answers
Thanks Ryan I did the same thing and it is now working. Let me explain. While loading data via the pipeline, in addition to putting data as a comma delimited value against the description as VectorValues nvarchar(max). I also added them as float in another table, against the Description Id (PK). I then created a view on top to combine these two tables on PK = FK and while picking up, I converted those child records (vector values) into json array.
This seem to have resolved the issue. and data is loading in the index.
Azure SQL and Cognitive Search have pretty limited transformation capabilities in the Indexer so you’d typically handle this outside the Indexer.
To convert VectorValues from a comma delimited string to a
Collection(Edm.Single)
in Azure Search you’ll need to preprocess the data before it reaches the Indexer because the Indexer itself doesn’t support direct conversion from a comma-separated string to aCollection(Edm.Single)
.