I’ve been using Microsoft’s Computer Vision OCR to extract text from various types of images, but I seem to have hit a bump in the road with seven-segment fonts.
It can sometimes pick them up, but it mostly gets them wrong.
I’ve looked around and found some alternative methods, but I’d rather continue using the service we already have. Any suggestions?
2 Answers
After a month of research and experimentation, I'm going to share my findings and solutions here in case anyone else encounters the same or a similar problem.
The Problem
I needed a reliable way to extract the temperature from multiple types of refrigeration displays. Some of these displays used a standard font that Microsoft's Computer Vision had no trouble with, while others used a seven-segment font.
Due to the nature of Optical Character Recognition (OCR), seven-segment fonts are not supported directly. To overcome this, you need to apply some image-processing techniques to join the segments into continuous strokes before passing the image into the OCR.
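One way to join the segments is to binarize the image and dilate the lit strokes until the gaps between them close. Here is a minimal sketch of that idea using Pillow; the threshold value, filter size, and number of passes are assumptions you would tune per display, not the exact pipeline from this answer:

```python
from PIL import Image, ImageFilter, ImageOps

def join_segments(im: Image.Image, passes: int = 2) -> Image.Image:
    """Binarize a display photo and dilate it so the gaps between
    seven-segment strokes close up before OCR."""
    gray = ImageOps.grayscale(im)
    # Simple fixed threshold; a real pipeline might use Otsu or an
    # adaptive threshold instead
    bw = gray.point(lambda p: 255 if p > 128 else 0)
    for _ in range(passes):
        # MaxFilter acts as a dilation on the white (lit) pixels,
        # growing each stroke by one pixel per pass
        bw = bw.filter(ImageFilter.MaxFilter(3))
    return bw
```

After enough passes, the separate segments of each digit merge into a continuous glyph that a general-purpose OCR has a better chance of reading.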
Solution Overview
Solution Breakdown
First, we pass the image into our Object Detection Model.
Input: Original Image
Object Detection Output: (image)
Then we pass that image into the Classification Model to determine the display type.
Classification Output: (image)
Next, we perform a series of image-processing techniques to prepare the display for OCR.
Since this display is classified as 'Segmented,' it then gets passed into Tesseract and analyzed using the 'LetsGoDigital' model, which is specialized for digital fonts.
Tesseract Output:
"rawText": "- 16.-9,,,6nf"
After some Regex, we're left with:
"value": "-16.96"
Admittedly, this process isn't providing the best results, but it's sufficient to move forward. By refining the template, input images, Custom Vision Models, and the OCR process, we can expect to see better results in the future.
It would be amazing to see Seven Segment Font natively supported by Microsoft's Computer Vision, as the current solution feels somewhat hacky. I'd prefer to continue using Computer Vision instead of Tesseract or any other OCR method, considering the nature of our application.
Maybe you need to enhance or pre-process the image so that the OCR will detect the text.
So, I used the code below to enhance the brightness and check text recognition.
Here, you need to alter the enhance argument in
ImageEnhance.Brightness(resaved_im).enhance(0.3)
for different images. I have given 0.3.
This gives the altered image below.
Predictions: (image)
code:
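The original code block was not preserved here; the following is a minimal reconstruction of the enhancement step described above. The input image is a placeholder (a flat synthetic image keeps the sketch self-contained), where the original answer loads a photo of the display:

```python
from PIL import Image, ImageEnhance

# Placeholder input: substitute your own display photo here
Image.new("RGB", (64, 32), (200, 200, 200)).save("display.jpg")

resaved_im = Image.open("display.jpg")

# Darken the image so the lit segments stand out from the background.
# The factor (0.3 here) must be tuned per image, as described above.
enhanced = ImageEnhance.Brightness(resaved_im).enhance(0.3)
enhanced.save("result.jpg")
```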
Output:
Then use the saved image, result.jpg, in the portal.
Similarly, you need to alter the image's brightness for a correct prediction.
Again, below is an image for which I was getting the wrong output.
So, I altered it with enhance factors of 0.4 and 0.3.
For 0.4 the output is:
For 0.3:
It gave the correct output for 0.4; for your inputs, 0.3 worked.
So, based on your input data, pre-process the image and select the enhance factor accordingly.
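Putting that advice together: one way to select the factor automatically is to try a few candidates and keep the first result that parses as a temperature. The `ocr` callable below is a placeholder for whatever recognition call you use (for example, a wrapper around the Computer Vision Read API); the candidate factors are just the values tried above:

```python
import re
from PIL import Image, ImageEnhance

def best_reading(im, ocr, factors=(0.3, 0.4, 0.5)):
    """Try several brightness factors and return (factor, value) for the
    first OCR result that looks like a temperature, or (None, None)."""
    pattern = re.compile(r"^-?\d+(\.\d+)?$")
    for f in factors:
        candidate = ocr(ImageEnhance.Brightness(im).enhance(f))
        # Strip OCR noise down to digits, decimal point, and minus sign
        cleaned = re.sub(r"[^0-9.\-]", "", candidate)
        if pattern.match(cleaned):
            return f, cleaned
    return None, None
```

With a real OCR function plugged in, this turns the manual 0.3-versus-0.4 experiment above into a loop, at the cost of one OCR call per candidate factor.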