
I’ve been using Microsoft’s Computer Vision OCR to extract text from various types of images, but I seem to have hit a bump in the road with the Seven Segment font.

OCR doesn't recognize Seven Segmented Text

It can sometimes pick them up, but it mostly gets them wrong.

Incorrect result from OCR

I’ve looked around and found some alternative methods, but would rather continue using the service we already have. Any suggestions?

2 Answers


  1. Chosen as BEST ANSWER

    After a month of research and experimentation, I'm going to share my findings and solutions here in case anyone else encounters the same or a similar problem.

    The Problem

    I needed a reliable way to extract the temperature from multiple types of Refrigeration Displays. Some of these displays used a standard font that Microsoft's Computer Vision had no trouble with, while others used a Seven-Segmented font.

    Due to the nature of Optical Character Recognition (OCR), Seven-Segmented font is not supported directly. To overcome this, you need to apply some image processing techniques to join the segmented text before passing it into the OCR.

    Solution Overview

    1. Create a Custom Vision Object Detection Model to extract the display from the image.
    2. Develop a Custom Vision Classification Model to determine the type of display.
    3. Depending on the classification, pass the image either to Tesseract along with a model specialized for digital text, or to Computer Vision when dealing with standard text.
    4. Apply regular expressions (Regex) to the output from Tesseract to extract the desired temperature.
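
    Roughly, the flow is sketched below. It is written in Python for brevity (the actual pipeline is .NET, judging by the AForge.Imaging step later on), and every helper name is a hypothetical placeholder for the corresponding Custom Vision, Tesseract, or Computer Vision call rather than a real API:

    import re

    # Hypothetical sketch of the four steps above; none of these helper
    # functions exist as-is, they just stand in for the real service calls.
    def read_temperature(image):
        # 1. Custom Vision object detection crops the display out of the photo.
        display = detect_display(image)

        # 2. Custom Vision classification decides which kind of display it is.
        display_type = classify_display(display)

        # 3. Segmented displays are preprocessed and read with Tesseract's
        #    'LetsGoDigital' model; standard displays go to Computer Vision OCR.
        if display_type == "Segmented":
            raw_text = run_tesseract(preprocess_segments(display), lang="letsgodigital")
        else:
            raw_text = run_computer_vision(display)

        # 4. Regex pulls a signed decimal reading out of the noisy OCR text.
        match = re.search(r"-?\d+(?:\.\d+)?", raw_text)
        return match.group(0) if match else None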

    Solution Breakdown

    First, we pass the image into our Object Detection Model.

    Input: Original Image

    Object Detection Output

    Then we pass that image into the Classification Model to determine the display type.

    Classification Output: Classification Result

    Next, we perform a series of image processing techniques (sketched in code after this list), including:

    • Gaussian blur and conversion to grayscale
    • RGB threshold to pull out the text
    • Erosion to connect the segmented text
    • Dilation to reduce the number of extruding pixels
    • Document skew correction (via AForge.Imaging) to rotate the image to the orientation of the text
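
    A minimal sketch of that preprocessing chain, assuming Python with OpenCV as a stand-in for the AForge.Imaging calls (the kernel sizes, Otsu thresholding, and iteration counts are illustrative guesses, not the values from the real pipeline):

    import cv2
    import numpy as np

    def preprocess_segmented_display(path):
        # Grayscale + Gaussian blur to suppress glare and sensor noise.
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)

        # Threshold to pull the digits out of the background (Otsu here as a
        # stand-in for the RGB threshold; assumes dark digits on a lighter background).
        _, thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # Erosion grows the dark strokes so the gaps between segments close,
        # then dilation trims the extruding pixels back down.
        kernel = np.ones((3, 3), np.uint8)
        joined = cv2.erode(thresh, kernel, iterations=2)
        cleaned = cv2.dilate(joined, kernel, iterations=1)

        # The real pipeline then deskews this result (AForge.Imaging's document
        # skew checker) before handing it to the OCR.
        return cleaned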

    Since this display is classified as 'Segmented,' it then gets passed into Tesseract and analyzed using the 'LetsGoDigital' model, which is specialized for digital fonts.

    Tesseract Output: "rawText": "- 16.-9,,,6nf"

    After some Regex, we're left with: "value": "-16.96"
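
    Roughly, this step could be scripted like so (a sketch only, assuming pytesseract with the 'letsgodigital' traineddata installed in Tesseract's tessdata directory; the '--psm 7' mode and this particular clean-up regex are illustrative rather than the exact ones used):

    import re
    import pytesseract

    def read_segmented_temperature(preprocessed_image):
        # OCR with the digital-font model; --psm 7 treats the crop as one text line.
        raw_text = pytesseract.image_to_string(
            preprocessed_image,
            lang="letsgodigital",
            config="--psm 7",
        )
        # e.g. raw_text == "- 16.-9,,,6nf"

        # Keep the leading sign, then strip everything that cannot be part of
        # the reading. (Extra dots or dropped digits still need manual review.)
        sign = "-" if raw_text.lstrip().startswith("-") else ""
        digits = re.sub(r"[^0-9.]", "", raw_text)
        return sign + digits  # "-16.96" for the example above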

    Admittedly, this process isn't providing the best results, but it's sufficient to move forward. By refining the template, input images, Custom Vision Models, and the OCR process, we can expect to see better results in the future.

    It would be amazing to see Seven Segment Font natively supported by Microsoft's Computer Vision, as the current solution feels somewhat hacky. I'd prefer to continue using Computer Vision instead of Tesseract or any other OCR method, considering the nature of our application.


  2. Maybe you need to enhance or pre-process the image so that the OCR will detect the text.

    So, I used the code below to adjust the brightness and check the text recognition.

    from PIL import Image, ImageEnhance

    def convert_to_ela_image(path, quality):
        # Re-save the image as JPEG at the given quality.
        filename = path
        resaved_filename = 'tempresaved.jpg'
        im = Image.open(filename).convert('RGB')
        im.save(resaved_filename, 'JPEG', quality=quality)
        resaved_im = Image.open(resaved_filename)
        # Lower the brightness (a factor below 1 darkens the image) before OCR.
        ela_im = ImageEnhance.Brightness(resaved_im).enhance(0.3)
        ela_im.save("./image/result.jpg", 'JPEG')
        return ela_im

    convert_to_ela_image(<image_path>, 80)
    

    Here, you need to alter the enhance argument in ImageEnhance.Brightness(resaved_im).enhance(0.3) for different images; I have used 0.3. This produces the brightness-adjusted image that is saved as result.jpg and used in the next step.

    Predictions.

    pip install azure-ai-vision
    

    code:

    import os
    import azure.ai.vision as sdk

    # Authenticate and point the analyzer at the enhanced image.
    service_options = sdk.VisionServiceOptions("endpoint", "key")
    vision_source = sdk.VisionSource(filename=r"./image/result.jpg")

    # Request a caption plus OCR text.
    analysis_options = sdk.ImageAnalysisOptions()
    analysis_options.features = (
        sdk.ImageAnalysisFeature.CAPTION |
        sdk.ImageAnalysisFeature.TEXT
    )
    analysis_options.language = "en"
    analysis_options.gender_neutral_caption = True

    image_analyzer = sdk.ImageAnalyzer(service_options, vision_source, analysis_options)
    result = image_analyzer.analyze()

    if result.reason == sdk.ImageAnalysisResultReason.ANALYZED:
        if result.caption is not None:
            print(" Caption:")
            print(" '{}', Confidence {:.4f}".format(result.caption.content, result.caption.confidence))

        if result.text is not None:
            print(" Text:")
            for line in result.text.lines:
                points_string = "{" + ", ".join([str(int(point)) for point in line.bounding_polygon]) + "}"
                print(" Line: '{}', Bounding polygon {}".format(line.content, points_string))
                for word in line.words:
                    points_string = "{" + ", ".join([str(int(point)) for point in word.bounding_polygon]) + "}"
                    print(" Word: '{}', Bounding polygon {}, Confidence {:.4f}".format(word.content, points_string, word.confidence))
    else:
        error_details = sdk.ImageAnalysisErrorDetails.from_result(result)
        print(" Analysis failed.")
        print(" Error reason: {}".format(error_details.reason))
        print(" Error code: {}".format(error_details.error_code))
        print(" Error message: {}".format(error_details.message))
    

    Output: the text is now detected from the enhanced image. The same result also shows up when the saved result.jpg is checked in the portal.

    Similarly, you need to adjust the image brightness to get a correct prediction.

    There was another image for which I was getting the wrong output.

    So I altered it with enhance factors of 0.4 and 0.3. With 0.4 it gave the correct output for that image, while 0.3 worked for your inputs. So, based on your input data, pre-process the image and pick the enhance factor accordingly.
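
    If you want to make that trial-and-error a bit more systematic, you could sweep a few enhance factors, save one adjusted copy per factor, and run each through the analyzer above (the factor list and file paths below are just examples):

    from PIL import Image, ImageEnhance

    candidate_factors = [0.3, 0.4, 0.5]  # example values; tune for your displays

    original = Image.open("./image/input.jpg").convert("RGB")
    for factor in candidate_factors:
        # Save one brightness-adjusted copy per factor, then feed each file to
        # sdk.VisionSource(...) as in the code above and keep the factor whose
        # recognized text looks right.
        adjusted = ImageEnhance.Brightness(original).enhance(factor)
        adjusted.save("./image/result_{}.jpg".format(factor), "JPEG", quality=80)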
