I installed Tesseract 5 on WSL (Ubuntu 22.04.1 LTS) and tried to recognize the numbers in an image as shown below, but Tesseract returned the wrong text. How can I get the correct result?
My environment:
- Windows 11 22H2
- WSL2 Ubuntu 22.04.1 LTS
- tesseract 5.3.1-20-g58b7
I ran Tesseract like this:
tesseract hoge.jpg output -l eng
and output.txt contains:
Fb¥
&/0
Here is hoge.jpg.
Thank you in advance for your help. I’m a Japanese student, so my English may not be very good. If anything is unclear, please feel free to edit this post to make it more readable.
2 Answers
I’ve made an attempt at this image with basic image manipulation in Python using pytesseract, with mixed results. There seem to be two challenges in this image: the noisy background and the slant of the numbers. Thresholding the pixels to pure black or white got me close on the bottom number, which came out as "6/0", but the slanted "1" keeps getting recognized as a "/". The top line gets read as "SEF", and I haven’t figured out how to get a better result there.
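For reference, here is a minimal sketch of the two steps described above (thresholding, then undoing the slant), assuming OpenCV, NumPy and pytesseract are installed. It is not the exact code from my attempt: the helper names are hypothetical, and the 15-degree slant, the median blur, page segmentation mode 6 and the digit whitelist are guesses that would need tuning against the real image.

```python
import math


def tesseract_config(psm=6, whitelist="0123456789"):
    """Build a Tesseract config string that restricts output to digits."""
    # --psm 6 treats the image as a single uniform block of text (assumption).
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def shear_matrix(slant_degrees, height):
    """Return a 2x3 affine matrix mapping x to x + k*y to undo a slant.

    A positive angle straightens glyphs that lean right like "/".
    For negative angles a horizontal offset keeps the image in frame.
    """
    k = math.tan(math.radians(slant_degrees))
    tx = max(0.0, -k * height)
    return [[1.0, k, tx], [0.0, 1.0, 0.0]]


def ocr_digits(path, slant_degrees=15.0):
    """Threshold, deslant and OCR an image; the slant angle is a guess."""
    # Imported here so the helpers above work without these packages.
    import cv2
    import numpy as np
    import pytesseract

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)

    # Light denoising, then Otsu picks the black/white threshold itself.
    gray = cv2.medianBlur(gray, 3)
    _, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )

    # Shear the image so slanted strokes become vertical.
    h, w = binary.shape
    m = shear_matrix(slant_degrees, h)
    out_w = w + int(abs(m[0][1]) * h)
    # Assumes dark text on a light background, so pad with white.
    upright = cv2.warpAffine(
        binary, np.float32(m), (out_w, h), borderValue=255
    )

    return pytesseract.image_to_string(
        upright, config=tesseract_config()
    ).strip()
```

Calling `ocr_digits("hoge.jpg")` would run the whole pipeline on the question's image. If the thresholded text comes out white on black, invert it with `cv2.bitwise_not` before OCR, since Tesseract generally does better with dark text on a light background.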
You will never get good results from a bad picture. I played with it a bit and got this:
Output: