Based on the AWS docs here: https://docs.aws.amazon.com/code-library/latest/ug/python_3_bedrock-runtime_code_examples.html. In the following example, a model in Bedrock is invoked to generate an embedding. Instead of making individual calls to Bedrock, is there a way to pass in a batch of text, say from a CSV or DataFrame, and generate the embeddings in batch in AWS Bedrock?
import boto3
import json
# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
# Set the model ID, e.g., Titan Text Embeddings V2.
model_id = "amazon.titan-embed-text-v2:0"
# The text to convert to an embedding.
input_text = "Please recommend books with a theme similar to the movie 'Inception'."
# Create the request for the model.
native_request = {"inputText": input_text}
# Convert the native request to JSON.
request = json.dumps(native_request)
# Invoke the model with the request.
response = client.invoke_model(modelId=model_id, body=request)
# Decode the model's native response body.
model_response = json.loads(response["body"].read())
# Extract and print the generated embedding and the input text token count.
embedding = model_response["embedding"]
input_token_count = model_response["inputTextTokenCount"]
print("nYour input:")
print(input_text)
print(f"Number of input tokens: {input_token_count}")
print(f"Size of the generated embedding: {len(embedding)}")
print("Embedding:")
print(embedding)
2 Answers
I don't know of a built-in batch option for this, but you can create a Lambda function that reads each record in the CSV and processes it. If you want to run in parallel, publish the records to SQS and add the queue as a trigger to the Lambda; it will then invoke Lambdas in parallel.
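A minimal sketch of that Lambda handler, assuming the SQS trigger delivers one text string per message body; the region, model ID, and where you store the results are assumptions, not something from the docs:

import json
import boto3

# Bedrock Runtime client created once and reused across invocations (region assumed).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "amazon.titan-embed-text-v2:0"

def handler(event, context):
    """Generate an embedding for each SQS record in the batch."""
    results = []
    for record in event["Records"]:
        text = record["body"]  # assumption: each message body is one text string
        response = bedrock.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({"inputText": text}),
        )
        payload = json.loads(response["body"].read())
        results.append({"text": text, "embedding": payload["embedding"]})
    # In practice you would persist results to S3, DynamoDB, or a vector store here.
    return {"processed": len(results)}

With an SQS trigger, Lambda scales out automatically, so multiple batches of messages are embedded concurrently without you managing the parallelism yourself.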
If you don't need a synchronous response, you can use batch inference:
https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference-example.html
Using batch inference, you will also save 50% on inference cost compared to on-demand invocation.
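As a rough sketch of that flow: you write your texts as JSONL records to S3, then start an asynchronous model invocation job with the bedrock control-plane client. The bucket names, IAM role ARN, and the "text" column name below are placeholders/assumptions, not values from the docs:

import json
import boto3
import pandas as pd

# Build the JSONL input file from a DataFrame (one request per line).
df = pd.read_csv("texts.csv")
with open("embedding_requests.jsonl", "w") as f:
    for i, text in enumerate(df["text"]):  # assumption: texts live in a "text" column
        record = {
            "recordId": f"REC{i:08d}",
            "modelInput": {"inputText": text},
        }
        f.write(json.dumps(record) + "\n")

# Upload the request file to S3 (bucket and key are placeholders).
s3 = boto3.client("s3")
s3.upload_file(
    "embedding_requests.jsonl",
    "my-bedrock-input-bucket",
    "input/embedding_requests.jsonl",
)

# Start the batch inference job. The role ARN is a placeholder and must allow
# Bedrock to read the input prefix and write to the output prefix.
bedrock = boto3.client("bedrock", region_name="us-east-1")
job = bedrock.create_model_invocation_job(
    jobName="titan-embeddings-batch",
    modelId="amazon.titan-embed-text-v2:0",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchInferenceRole",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bedrock-input-bucket/input/embedding_requests.jsonl"
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bedrock-output-bucket/output/"}
    },
)
print(job["jobArn"])  # poll get_model_invocation_job(jobIdentifier=...) until it completes

When the job finishes, Bedrock writes a JSONL results file to the output S3 prefix with one embedding per recordId, so you can join the output back to your original rows.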