
I’m trying to use AWS Bedrock’s RAG feature. I created an S3 bucket and uploaded some CSV files representing tables from a customer database.

I created two knowledge bases: one uses the whole bucket as its data source, the other a single file. I’m not sure whether using a whole bucket as a data source will work.

While testing the knowledge bases on the console, I get an error that says the data source is not synced:

[Screenshot: data source not synced]

When I try to sync the sources, I get no feedback; the sync status does not change and there is no popup for an error or an ongoing operation:
[Screenshot: no sync feedback]

Extra question: I want to use my app’s customer database so that the FM can draw on that data and better understand the customer who is prompting it. The data is structured (SQL) but barely textual: very few attributes are text, and most of the others are foreign keys, so there are lots of relationships to understand.

I doubt the LLM can make use of that, as I only know of use cases involving big blocks of text such as policies. Can anyone confirm whether I shouldn’t be using RAG here, and suggest alternative solutions if so? Or should I just preprocess the data before ingesting it with Bedrock?
PS: I can’t go with fine-tuning because of its expense.

2

Answers


    1. You can absolutely use an S3 bucket as the data source and have it contain multiple files (I think the limit is 2.5 million files – see: Quotas for Amazon Bedrock)
    2. CSVs can be a little tricky: they need to be well formed. I’ve had a few fail on me too, and it has always come down to a malformed CSV. The Bedrock console gives you a little information if you go to Knowledge bases, select the knowledge base you’re interested in, then open Sync history and View warnings. For more detail you need to look at the CloudWatch Logs. Have a look at Knowledge bases logging if you need some help with this.
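
    Since a malformed CSV has been the culprit for me, it can help to check the files locally before uploading them. Below is a minimal sketch using only Python's standard csv module. It assumes that "malformed" means rows whose field count differs from the header, which is just one common cause and not necessarily what Bedrock itself checks. The function name find_malformed_rows is my own, not part of any AWS tooling.

```python
import csv
import io


def find_malformed_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for each row whose width differs from the header.

    Assumption: a consistent column count is the well-formedness rule we care about.
    """
    # newline="" leaves line endings untouched so the csv module can handle them itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check

    problems = []
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        if len(row) != len(header):
            # reader.line_num is the number of source lines consumed so far
            problems.append((reader.line_num, len(row)))
    return problems
```

    To use it on a file, read the file's text and pass it in, for example find_malformed_rows(open("customers.csv", newline="").read()), where customers.csv is a placeholder name. An empty list means every row matched the header width.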

    Extra question: First, you would need to extract the data from SQL into something the LLM can access, but putting that aside: without knowing the data you are considering, it’s hard to give a definite answer. If you’re talking about tables of raw data without context, it’s unlikely to be much use. Hopefully all that nice data feeds something that humans can use (otherwise what’s the point of the data?), so I’d focus on that. If the data is used to produce reports (e.g. a stock report or financial statement in PDF), try providing those to your model.
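
    If no such reports exist, one way to give raw tables some context is to resolve the foreign keys yourself and write a short text summary per customer before ingesting. The sketch below is only an illustration: the customers / plans / orders schema, the field names, and the function render_customer_documents are all hypothetical, and the rows are plain dicts such as you might get from any SQL export. Nothing here calls Bedrock. The idea is that the resulting text would be saved to files and uploaded to the S3 bucket.

```python
def render_customer_documents(customers, plans, orders):
    """Resolve foreign keys and return {customer_id: plain-text summary}.

    Hypothetical schema (an assumption for illustration):
      customers: {"id", "name", "plan_id"}
      plans:     {"id", "name"}
      orders:    {"id", "customer_id", "total"}
    """
    # Lookup table so plan_id can be replaced by a readable plan name.
    plan_names = {p["id"]: p["name"] for p in plans}

    # Group orders by the customer they reference.
    orders_by_customer = {}
    for order in orders:
        orders_by_customer.setdefault(order["customer_id"], []).append(order)

    documents = {}
    for c in customers:
        plan = plan_names.get(c["plan_id"], "unknown")
        own_orders = orders_by_customer.get(c["id"], [])
        total = sum(o["total"] for o in own_orders)
        documents[c["id"]] = (
            f"Customer {c['name']} (id {c['id']}) is on the {plan} plan. "
            f"They have placed {len(own_orders)} order(s) totalling {total:.2f}."
        )
    return documents
```

    Writing one document per customer keeps each chunk about a single entity, which seems more likely to retrieve well than rows of keys, though you would need to test that on your own data.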

    When you’re starting your journey I suggest following the guidance from Anthropic (which is actually about prompt engineering on Claude but useful generally):

    When interacting with Claude, think of it as a brilliant but very new employee (with amnesia) who needs explicit instructions. Like any new employee, Claude does not have context on your norms, styles, guidelines, or preferred ways of working. The more precisely you explain what you want, the better Claude’s response will be.

    The golden rule of clear prompting:
    Show your prompt to a colleague, ideally someone who has minimal context on the task, and ask them to follow the instructions. If they’re confused, Claude will likely be too.

  1. Did you ever figure this out? I’m running into the same issue and it is driving me crazy.
