
I’m new to programming.
I ran into a problem when loading DOCX, XLSX, and PPTX files with AzureAIDocumentIntelligenceLoader.
I have created a Document Intelligence resource successfully and obtained an "endpoint" and a "key".
However, when I try to load an XLSX document through langchain_community in Python, I get the following error:

azure.core.exceptions.ResourceNotFoundError: (404) Resource not found Code: 404 Message: Resource not found

Here is my code snippet:

`from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
file = "D:/Python/xuexisucai/LLM/Coffe list.xlsx"
endpoint = "https://xxxxx.cognitiveservices.azure.com/"
key = "123456"
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, file_path=file, api_model="prebuilt-layout"
)
documents = loader.load()`

Can anyone help a newbie with this?

2 Answers


    • A ResourceNotFoundError usually points to invalid details: check the endpoint, the key, and the document path.

    • According to the documentation, the Document Intelligence layout model supports analyzing XLSX files and extracts text, tables, figures, and hierarchical structure.

    • XLSX documents can be processed effectively with this model. For image inputs, clear photos or high-quality scans give the best results.

    • Development options include Document Intelligence Studio, the REST API, and the SDKs.

    • Output components include paragraphs, where text blocks like titles, section headings, and footnotes are extracted, alongside tables, which are parsed to determine column and row information.


    • Figures such as charts and images are detected, and sections within the document are identified, aiding in organization and comprehension. Additionally, the model can output extracted text in markdown format, facilitating integration with markdown-based workflows.
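
Since the first point above attributes the error to the endpoint, key, or document path, it can help to rule out the simplest mistakes locally before any request is sent. The following is only a minimal sketch: `preflight_problems` is a hypothetical helper, not part of the Azure SDK or langchain_community, and the rule that the endpoint is a bare `https` host with no extra path is an assumption. It cannot tell whether a key is actually valid for the resource.

```python
from pathlib import Path
from urllib.parse import urlparse


def preflight_problems(endpoint: str, key: str, file_path: str) -> list[str]:
    """Return human-readable problems found before any network call is made."""
    problems = []

    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append(f"endpoint does not look like an https URL: {endpoint!r}")
    elif parsed.path.strip("/"):
        # Assumption: the resource endpoint is the bare host, with no extra path.
        problems.append(f"endpoint has an unexpected path: {parsed.path!r}")

    if not key.strip():
        problems.append("key is empty")

    if not Path(file_path).is_file():
        problems.append(f"file not found: {file_path!r}")

    return problems
```

An empty list only means these local checks passed. If the 404 persists, the cause lies on the service side rather than in the three values checked here.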
    
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import AnalyzeResult
    
    # Placeholders: replace with your own resource's endpoint URL and key
    endpoint = "DOCUMENTINTELLIGENCE_ENDPOINT"
    key = "DOCUMENTINTELLIGENCE_API_KEY"
    
    document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    
    # Path to your XLSX file
    path_to_xlsx_file = "C:/Users/file_example_XLSX_101.xlsx"
    
    # Open the XLSX file in binary read mode and submit it for analysis
    with open(path_to_xlsx_file, "rb") as f:
        # Note: this keyword is `analyze_request` in the beta SDK releases;
        # newer releases name it `body`.
        poller = document_intelligence_client.begin_analyze_document(
            "prebuilt-layout", analyze_request=f, content_type="application/octet-stream"
        )
        # Wait for the long-running analysis to finish
        result: AnalyzeResult = poller.result()
    
    # Process the analysis result
    for page in result.pages:
        print(f"----Analyzing layout from page #{page.page_number}----")
        # page.words may be None, so fall back to an empty list
        print(f"Page {page.page_number} has {len(page.words or [])} words")
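
The points above mention that tables are parsed into row and column information and that Markdown output is available. As a rough illustration of what that parsing enables, the sketch below rebuilds a grid from cell positions and renders it as a Markdown table. `Cell` is a stand-in class with field names chosen to resemble the SDK's table cells (`row_index`, `column_index`, `content`); treat that mapping as an assumption. The sample values are made up and are not output from the service.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Stand-in for a table cell: zero-based position plus its text."""
    row_index: int
    column_index: int
    content: str


def cells_to_grid(cells: list[Cell], row_count: int, column_count: int) -> list[list[str]]:
    """Place each cell at its (row, column) position; missing cells stay empty."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


def grid_to_markdown(grid: list[list[str]]) -> str:
    """Render the grid as a Markdown table, treating the first row as the header."""
    if not grid:
        return ""
    header, *body = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Made-up sample cells; the price for the last row is left out on purpose.
cells = [
    Cell(0, 0, "Coffee"), Cell(0, 1, "Price"),
    Cell(1, 0, "Espresso"), Cell(1, 1, "2.50"),
    Cell(2, 0, "Latte"),
]
grid = cells_to_grid(cells, row_count=3, column_count=2)
print(grid_to_markdown(grid))
```

With a real result, the same idea would be applied to each table in the analysis result, passing its cells and its row and column counts instead of the sample list.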
    
        
    


  1. I have the same issue!
