I’ve been trying to ingest data from an AWS S3 bucket into AWS OpenSearch Serverless using AWS Lambda, but I can’t find any documentation on how to do it. Any ideas? I’ve seen plenty of AWS OpenSearch Service examples, but would they work the same way with Serverless? I’m a bit new to AWS.
I’ve looked at the boto3 client.
2 Answers
If the data is already in S3, you can use boto3 to read the files and then send their contents to your OpenSearch endpoint using its _bulk API.
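A minimal sketch of that approach, assuming the `opensearch-py` client (not named in the answer) and NDJSON files with one JSON document per line. For OpenSearch Serverless, requests are signed with SigV4 using the service name `aoss` instead of `es`. The host, region, bucket, key, and index names are hypothetical placeholders.

```python
import json


def build_bulk_body(docs, index):
    """Build an NDJSON _bulk payload: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # No _id is set, so OpenSearch generates the document IDs.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def read_ndjson_from_s3(bucket, key):
    """Read one S3 object and parse it as newline-delimited JSON."""
    import boto3  # imported lazily so the pure helper above works without boto3

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return [json.loads(line) for line in body.decode("utf-8").splitlines() if line.strip()]


def make_client(host, region):
    """Create an opensearch-py client that signs requests for OpenSearch Serverless."""
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region, "aoss")  # "aoss" = OpenSearch Serverless
    return OpenSearch(
        hosts=[{"host": host, "port": 443}],  # collection endpoint, without https://
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )


def ingest_object(client, bucket, key, index):
    """Index every document in one S3 object; return how many were sent."""
    docs = read_ndjson_from_s3(bucket, key)
    if not docs:
        return 0  # an empty _bulk request would be rejected
    # client.bulk() posts the NDJSON payload to the _bulk endpoint.
    response = client.bulk(body=build_bulk_body(docs, index))
    if response.get("errors"):
        raise RuntimeError(f"Some documents from {key} failed to index")
    return len(docs)
```

Usage would look like `ingest_object(make_client(host, region), bucket, key, index)`. Unlike a managed OpenSearch Service domain, the Lambda’s execution role also has to be granted access in the collection’s data access policy.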
If files will keep arriving later, you can configure an S3 event notification to trigger a Lambda function that runs the same code (slightly modified to read only the objects named in the event) and pushes the data to OpenSearch.
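A sketch of the event-driven handler side, assuming the bucket sends `ObjectCreated` notifications straight to Lambda. Object keys in S3 event notifications are URL-encoded (spaces arrive as `+`), so they are decoded before use. `index_object` is a hypothetical placeholder for the boto3 + _bulk code from the first answer.

```python
import json
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
        if rec.get("eventSource") == "aws:s3"
    ]


def process_event(event, index_object):
    """Call index_object(bucket, key) for each object; return the keys processed."""
    processed = []
    for bucket, key in objects_from_event(event):
        index_object(bucket, key)
        processed.append(key)
    return processed


def index_object(bucket, key):
    # Hypothetical placeholder: read the object with boto3 and send it
    # to the collection's _bulk endpoint.
    raise NotImplementedError


def lambda_handler(event, context):
    processed = process_event(event, index_object)
    return {"statusCode": 200, "body": json.dumps({"indexed": processed})}
```

Passing the indexing function into `process_event` keeps the event parsing testable without AWS credentials.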
What kind of data is this?
As Prabhat mentions, boto3 is certainly an option; however, the AWS SDK for Pandas (previously AWS Data Wrangler) is a super simple approach. I’ve used it extensively to move data from CSVs, S3, and other locations into OpenSearch with ease.
Using the AWS SDK for Pandas, you might achieve what you’re looking for like this…
The AWS SDK for Pandas can iterate over chunks of S3 items, and there’s a tutorial on indexing JSON (and other file types) from S3 to OpenSearch.
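A minimal sketch of the chunked approach, assuming awswrangler 3.x, where `wr.opensearch.connect` infers the serverless (`aoss`) service from the collection host (worth confirming in the docs for your version). The collection host, S3 prefix, chunk size, and index name are hypothetical.

```python
def index_in_chunks(chunks, index_chunk):
    """Index an iterable of DataFrame chunks one at a time; return total rows indexed."""
    total = 0
    for chunk in chunks:
        index_chunk(chunk)
        total += len(chunk)
    return total


def lambda_handler(event, context):
    import awswrangler as wr  # made available through a Lambda layer

    # Hypothetical collection endpoint.
    client = wr.opensearch.connect(host="my-collection-id.us-east-1.aoss.amazonaws.com")
    # With chunksize set, read_csv yields DataFrames one chunk at a time
    # instead of loading every file into memory at once.
    chunks = wr.s3.read_csv("s3://my-bucket/data/", chunksize=10_000)
    total = index_in_chunks(
        chunks, lambda df: wr.opensearch.index_df(client, df=df, index="my-index")
    )
    return {"indexed_rows": total}
```

Keeping the loop in `index_in_chunks` separates the chunking logic from the AWS calls, so it can be checked without a collection.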
Of course, you would need to create a Lambda layer to make the package available in your function. Here is a (pretty much) one-click script to do this.