
I’m detecting text in images with Python, using pytesseract 0.3.10. The library seems to work only on Windows, but I’m on Ubuntu 22.04. All the example code sets the command path like this:

pt.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

How can I change this to run on Ubuntu? By the way, I’m using JupyterLab.

I tried changing this code to other forms, like this:

Original example code:

pt.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

I changed the code to:

pt.pytesseract.tesseract_user_cmd = 'user/share/...'

or

pt.pytesseract.tesseract_user_terminal = 'user/share/...'

2 Answers


  1. Add the Tesseract OCR 5 PPA to your system by running the command below.

    sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
    

    Then install Tesseract on Ubuntu by running:

    sudo apt install -y tesseract-ocr
    

    Once installation is complete, update your package lists:

    sudo apt update
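
With Tesseract installed from apt, the Windows-style path from the question is not needed. The sketch below shows one way to point pytesseract at the system binary on Ubuntu. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical, chosen for illustration, and the code assumes the `tesseract` binary is on your PATH.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the system tesseract binary and return its path."""
    path = find_tesseract()
    if path is None:
        raise FileNotFoundError(
            "tesseract not found on PATH; install it with "
            "'sudo apt install tesseract-ocr'"
        )
    # Imported lazily so the PATH lookup above works without pytesseract.
    import pytesseract as pt

    pt.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once in a notebook cell should be enough. Note that pytesseract's default `tesseract_cmd` is simply `'tesseract'`, so when the binary is on PATH you can usually leave the setting alone.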
    
  2. First, update the package lists and upgrade the installed packages:

    sudo apt update
    
    sudo apt upgrade
    

    To install pytesseract, run this command:

    pip install pytesseract
    

    To support languages other than English, run this command in addition to the one above:

    sudo apt install tesseract-ocr-tam
    

    Language examples:

    • eng -> English
    • guj -> Gujarati
    • tam -> Tamil

    To find more language packs, type

    sudo apt install tesseract-ocr
    

    and press Tab to list all the available packages.
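
Once a language pack is installed, its code can be passed to pytesseract through the `lang` argument of `image_to_string`. The following is a minimal sketch, assuming Pillow is installed alongside pytesseract. The helper names `join_langs` and `read_text` are hypothetical.

```python
def join_langs(codes):
    """Combine Tesseract language codes into one string, e.g. 'eng+tam'."""
    return "+".join(codes)


def read_text(image_path, codes=("eng",)):
    """OCR an image using the given language packs (they must be installed)."""
    # Imported lazily so join_langs can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=join_langs(codes))
```

For example, `read_text("sample.png", ("eng", "tam"))` would OCR a file named `sample.png` (a placeholder name) using both English and Tamil. `pytesseract.get_languages()` should list the language codes that Tesseract currently has available.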
