
I’m running the azure-search-openai-demo, and I’ve successfully implemented searching across all files within a container in Azure Blob Storage. However, I want to limit the search to a specific file rather than searching through all the files in the container.

Current Setup:

I’m using the demo’s search capability, which is currently configured to search across all files in the container.
I want to modify the code to search within only one specific file.
Question:
How can I adjust the existing demo code to search within a single specified file instead of all files in the container? What modifications are needed to target just one file?

Any guidance or examples would be greatly appreciated!

@authenticated
async def chat_stream(auth_claims: Dict[str, Any]):
    if not request.is_json:
        return jsonify({"error": "request must be json"}), 415
    request_json = await request.get_json()
    context = request_json.get("context", {})
    context["auth_claims"] = auth_claims
    try:
        use_gpt4v = context.get("overrides", {}).get("use_gpt4v", False)
        approach: Approach
        if use_gpt4v and CONFIG_CHAT_VISION_APPROACH in current_app.config:
            approach = cast(Approach, current_app.config[CONFIG_CHAT_VISION_APPROACH])
        else:
            approach = cast(Approach, current_app.config[CONFIG_CHAT_APPROACH])

        result = await approach.run_stream(
            request_json["messages"],
            context=context,
            session_state=request_json.get("session_state"),
        )
        response = await make_response(format_as_ndjson(result))
        response.timeout = None  # type: ignore
        response.mimetype = "application/json-lines"
        return response
    except Exception as error:
        return error_response(error, "/chat")

2 Answers


  1. To modify the azure-search-openai-demo to search within a specific file instead of searching across all files in the Azure Blob Storage container, you need to adjust the code that handles the search query.

    I followed this document for Azure AI Search with OpenAI.

    Modify the chat_stream function to include filtering by a specific file:

    
    @authenticated
    async def chat_stream(auth_claims: Dict[str, Any]):
        if not request.is_json:
            return jsonify({"error": "request must be json"}), 415
        
        request_json = await request.get_json()
        context = request_json.get("context", {})
        context["auth_claims"] = auth_claims
    
        try:
            use_gpt4v = context.get("overrides", {}).get("use_gpt4v", False)
            approach: Approach
            
            if use_gpt4v and CONFIG_CHAT_VISION_APPROACH in current_app.config:
                approach = cast(Approach, current_app.config[CONFIG_CHAT_VISION_APPROACH])
            else:
                approach = cast(Approach, current_app.config[CONFIG_CHAT_APPROACH])
    
            # Extract the specific file name from the context
            specific_file_name = context.get("specificFileName")
    
            # Build an OData filter limiting results to the specific file;
            # single quotes in the value are escaped by doubling them
            search_filter = (
                "fileName eq '{}'".format(specific_file_name.replace("'", "''"))
                if specific_file_name
                else None
            )
    
            result = await approach.run_stream(
                request_json["messages"],
                context=context,
                session_state=request_json.get("session_state"),
                search_filter=search_filter  # Pass the filter to the search approach
            )
            
            response = await make_response(format_as_ndjson(result))
            response.timeout = None  # type: ignore
            response.mimetype = "application/json-lines"
            return response
    
        except Exception as error:
            return error_response(error, "/chat")
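For illustration, a request body for the modified endpoint might look like the following. Note that `specificFileName` is the hypothetical context field introduced in the snippet above, and the message and file name are placeholders:

```python
import json

# Example request body for the modified /chat/stream endpoint.
# "specificFileName" is the hypothetical context field the handler reads.
payload = {
    "messages": [{"role": "user", "content": "Summarize the annual report"}],
    "context": {
        "overrides": {"use_gpt4v": False},
        "specificFileName": "annual-report.pdf",
    },
}
print(json.dumps(payload, indent=2))
```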
    
    
    
    • I used this document to use Azure OpenAI Assistants for file search, and this document for the Azure Files indexer in Azure AI Search.
    • To search within a single file, you can create a query that includes a filter based on a unique identifier or a field that represents the file you want to search within. For instance, if you have a field called “documentId,” you can include a filter in your query to search within a specific file by its ID.
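As a sketch of that approach, the filter string for a hypothetical `documentId` field can be built like this (the field name is an assumption; match it to your own index schema):

```python
def file_filter(document_id: str) -> str:
    """Build an OData filter limiting the search to one document.

    "documentId" is a hypothetical index field; single quotes in the
    value are escaped by doubling them, per OData string-literal rules.
    """
    return "documentId eq '{}'".format(document_id.replace("'", "''"))

print(file_filter("O'Brien-report.pdf"))
# documentId eq 'O''Brien-report.pdf'
```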

    If your Approach class’s run_stream method doesn’t support a search_filter parameter, you’ll need to modify it to include this functionality:

    async def run_stream(
        self,
        messages: List[Dict[str, Any]],
        context: Dict[str, Any],
        session_state: Any = None,
        search_filter: Optional[str] = None,
    ):
        # Build the query text from the conversation
        query_text = " ".join(msg["content"] for msg in messages)

        # Perform the search, applying the optional OData filter
        # (assumes the async SearchClient from azure.search.documents.aio)
        search_results = await self.search_client.search(
            search_text=query_text,
            filter=search_filter,
            top=5,
        )

        # Process the search results...
        ...
    


  2. Put the name of the blob/file you want to filter on into the filter built by build_filter in approach.py.

    In my case, I'm sending the blob name in the overrides of the call from the frontend to the /chat/stream endpoint in app.py.

    It can then be used in build_filter like so:

    def build_filter(self, overrides: dict[str, Any], auth_claims: dict[str, Any]) -> Optional[str]:
        exclude_category = overrides.get("exclude_category")
        selected_blob = overrides.get("selected_blob")
        security_filter = self.auth_helper.build_security_filters(overrides, auth_claims)
        filters = []

        if exclude_category:
            filters.append("category ne '{}'".format(exclude_category.replace("'", "''")))
        if selected_blob:
            # Escape single quotes so the OData filter stays valid
            filters.append("sourcefile eq '{}'".format(selected_blob.replace("'", "''")))
        if security_filter:
            filters.append(security_filter)

        return None if len(filters) == 0 else " and ".join(filters)
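To illustrate, a standalone version of the filter-combining logic (with the security filter and auth_claims stubbed out for simplicity) behaves like this:

```python
from typing import Any, Optional

def build_filter(overrides: dict[str, Any]) -> Optional[str]:
    # Simplified sketch of the method above, without the security filter
    filters = []
    exclude_category = overrides.get("exclude_category")
    selected_blob = overrides.get("selected_blob")
    if exclude_category:
        filters.append("category ne '{}'".format(exclude_category.replace("'", "''")))
    if selected_blob:
        # Escape single quotes so the OData filter stays valid
        filters.append("sourcefile eq '{}'".format(selected_blob.replace("'", "''")))
    return None if not filters else " and ".join(filters)

print(build_filter({"selected_blob": "report.pdf"}))
# sourcefile eq 'report.pdf'
print(build_filter({"exclude_category": "drafts", "selected_blob": "report.pdf"}))
# category ne 'drafts' and sourcefile eq 'report.pdf'
```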
    