I’ve been using Microsoft’s Computer Vision OCR to extract text from various types of images, but I seem to have hit a bump in the road with seven-segment fonts.
It can sometimes pick them up, but it mostly gets them wrong.
I’ve looked around and found some alternative methods, but I’d rather continue using the service we already have. Any suggestions?
2 Answers
After a month of research and experimentation, I'm going to share my findings and solutions here in case anyone else encounters the same or a similar problem.
The Problem
I needed a reliable way to extract the temperature from multiple types of refrigeration displays. Some of these displays used a standard font that Microsoft's Computer Vision had no trouble with, while others used a seven-segment font.
Due to the nature of Optical Character Recognition (OCR), seven-segment fonts are not supported directly. To overcome this, you need to apply some image-processing techniques to join the segments into continuous strokes before passing the image into the OCR.
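One way to join the segments is to binarize the image and dilate the lit strokes until the gaps between them close. Here is a minimal sketch of that idea using Pillow; the threshold value, filter size, and number of passes are assumptions you would tune per display, not the exact pipeline from this answer:

```python
from PIL import Image, ImageFilter, ImageOps

def join_segments(im: Image.Image, passes: int = 2) -> Image.Image:
    """Binarize a display photo and dilate it so the gaps between
    seven-segment strokes close up before OCR."""
    gray = ImageOps.grayscale(im)
    # Simple fixed threshold; a real pipeline might use Otsu or an
    # adaptive threshold instead
    bw = gray.point(lambda p: 255 if p > 128 else 0)
    for _ in range(passes):
        # MaxFilter acts as a dilation on the white (lit) pixels,
        # growing each stroke by one pixel per pass
        bw = bw.filter(ImageFilter.MaxFilter(3))
    return bw
```

After enough passes, the separate segments of each digit merge into a continuous glyph that a general-purpose OCR has a better chance of reading.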
Solution Overview
Solution Breakdown
First, we pass the image into our Object Detection Model.
Input: Original Image
Object Detection Output: (image)
Then we pass that image into the Classification Model to determine the display type.
Classification Output: (image)
Next, we perform a series of image-processing techniques to prepare the display for OCR.
Since this display is classified as 'Segmented,' it then gets passed into Tesseract and analyzed using the 'LetsGoDigital' model, which is specialized for digital fonts.
Tesseract Output:
"rawText": "- 16.-9,,,6nf"
After some Regex, we're left with:
"value": "-16.96"
Admittedly, this process isn't providing the best results, but it's sufficient to move forward. By refining the template, input images, Custom Vision Models, and the OCR process, we can expect to see better results in the future.
It would be amazing to see Seven Segment Font natively supported by Microsoft's Computer Vision, as the current solution feels somewhat hacky. I'd prefer to continue using Computer Vision instead of Tesseract or any other OCR method, considering the nature of our application.
Maybe you need to enhance or pre-process the image so that the OCR will detect the text.
So, I used the code below to enhance the brightness and check text recognition.
Here, you need to alter the enhance argument in
ImageEnhance.Brightness(resaved_im).enhance(0.3)
for different images. I have given 0.3.
This gives the altered image below.
Predictions: (image)
code:
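The original code block was not preserved here; the following is a minimal reconstruction of the enhancement step described above. The input image is a placeholder (a flat synthetic image keeps the sketch self-contained), where the original answer loads a photo of the display:

```python
from PIL import Image, ImageEnhance

# Placeholder input: substitute your own display photo here
Image.new("RGB", (64, 32), (200, 200, 200)).save("display.jpg")

resaved_im = Image.open("display.jpg")

# Darken the image so the lit segments stand out from the background.
# The factor (0.3 here) must be tuned per image, as described above.
enhanced = ImageEnhance.Brightness(resaved_im).enhance(0.3)
enhanced.save("result.jpg")
```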
Output:
Then use the saved image, result.jpg, in the portal.
Similarly, you need to alter the image's brightness for a correct prediction.
Again, below is an image for which I was getting the wrong output.
So, I altered it with enhance factors of 0.4 and 0.3.
For 0.4 the output is:
For 0.3:
It gave the correct output for 0.4; for your inputs, 0.3 worked.
So, based on your input data, pre-process the image and select the enhance factor accordingly.
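Putting that advice together: one way to select the factor automatically is to try a few candidates and keep the first result that parses as a temperature. The `ocr` callable below is a placeholder for whatever recognition call you use (for example, a wrapper around the Computer Vision Read API); the candidate factors are just the values tried above:

```python
import re
from PIL import Image, ImageEnhance

def best_reading(im, ocr, factors=(0.3, 0.4, 0.5)):
    """Try several brightness factors and return (factor, value) for the
    first OCR result that looks like a temperature, or (None, None)."""
    pattern = re.compile(r"^-?\d+(\.\d+)?$")
    for f in factors:
        candidate = ocr(ImageEnhance.Brightness(im).enhance(f))
        # Strip OCR noise down to digits, decimal point, and minus sign
        cleaned = re.sub(r"[^0-9.\-]", "", candidate)
        if pattern.match(cleaned):
            return f, cleaned
    return None, None
```

With a real OCR function plugged in, this turns the manual 0.3-versus-0.4 experiment above into a loop, at the cost of one OCR call per candidate factor.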