
I’ve been trying to ingest data from an AWS S3 bucket into AWS OpenSearch Serverless using AWS Lambda, but I can’t find any documentation on how to do it. Any ideas? I’ve seen plenty of examples for AWS OpenSearch Service, but would they work the same way with Serverless? I’m a bit new to AWS.

I’ve looked at the boto3 client.

2 Answers


  1. If the data is already present, you would use boto3 to read the files from S3 and then push the documents to the OpenSearch endpoint via its `_bulk` API.

    If files can arrive later, you can have S3 fire an event so that the same code (slightly modified) runs in a Lambda function and pushes the data to OpenSearch.

    What kind of data is this?
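    The flow above might be sketched as a Lambda handler like the one below: build a newline-delimited `_bulk` payload from the S3 object named in the event and post it with opensearch-py. The region, collection endpoint, index name, and the assumption that files are newline-delimited JSON are all placeholders; note that for OpenSearch Serverless, requests are signed with the service name `aoss` rather than `es`.

```python
import json

def build_bulk_body(docs, index):
    """Serialize documents into the _bulk format: an action line
    followed by the document itself, one pair per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # a _bulk body must end with a newline

def handler(event, context):
    """Triggered by an s3:ObjectCreated:* event notification."""
    # boto3 ships with the Lambda runtime; opensearch-py must be
    # packaged with the function or provided via a layer.
    import boto3
    from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

    region = "us-east-1"  # placeholder: your collection's region
    credentials = boto3.Session().get_credentials()
    # Serverless collections sign with service "aoss"
    # (the managed OpenSearch Service uses "es")
    auth = AWSV4SignerAuth(credentials, region, "aoss")
    client = OpenSearch(
        # placeholder: your Serverless collection endpoint
        hosts=[{"host": "my-collection.us-east-1.aoss.amazonaws.com", "port": 443}],
        http_auth=auth,
        use_ssl=True,
        connection_class=RequestsHttpConnection,
    )

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # assumes newline-delimited JSON, one document per line
        docs = (json.loads(line) for line in text.splitlines() if line.strip())
        client.bulk(body=build_bulk_body(docs, "my-index"))
```

    The Lambda's execution role also needs `s3:GetObject` on the bucket and a data access policy on the Serverless collection granting it write access.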

  2. As Prabhat mentions, boto3 is certainly an option; however, the AWS SDK for Pandas (previously AWS Data Wrangler) is a much simpler approach. I’ve used it extensively for moving data from CSVs, S3, and other locations into OpenSearch with ease.

    Using the AWS SDK for Pandas, you might achieve what you’re looking for like this…

    import awswrangler as wr
    from opensearchpy import OpenSearch
    
    # read the JSON objects under the prefix into a DataFrame
    items = wr.s3.read_json(path="s3://my-bucket/my-folder/")
    
    # connect + upload to OpenSearch (a target index name is required)
    my_client = OpenSearch(...)
    wr.opensearch.index_df(client=my_client, df=items, index="my-index")
    

    The AWS SDK for Pandas can iterate over chunks of S3 items, and there’s a tutorial on indexing JSON (and other file types) from S3 to OpenSearch.
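    For larger datasets, the chunked read might look like the sketch below. The endpoint, bucket path, index name, and `chunksize` value are placeholders, and you should verify `wr.opensearch.connect` and the `chunksize=` behavior against the version of the library you have installed.

```python
# Sketch: chunked S3 -> OpenSearch indexing with the AWS SDK for Pandas.
# All names (collection endpoint, bucket, index) are placeholders.

def index_chunks(chunks, index_chunk):
    """Apply an indexing callable to each DataFrame chunk and
    return the number of chunks processed."""
    n = 0
    for chunk in chunks:
        index_chunk(chunk)
        n += 1
    return n

if __name__ == "__main__":
    import awswrangler as wr

    client = wr.opensearch.connect(
        host="my-collection.us-east-1.aoss.amazonaws.com",  # placeholder
    )
    # with chunksize set, read_json yields DataFrames instead of one frame
    chunks = wr.s3.read_json(
        path="s3://my-bucket/my-folder/", lines=True, chunksize=1000
    )
    index_chunks(
        chunks,
        lambda df: wr.opensearch.index_df(client=client, df=df, index="my-index"),
    )
```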

    Of course, you would need to create a Lambda layer to make the package available in your function. Here is a (pretty much) one-click script to do this.
