
I’ve been running a pipeline in Azure for 4 months, and it suddenly broke last night. I have the following code:

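!pip install tabula-py
from io import BytesIO  # needed for BytesIO(pdf_content) below
from tabula.io import read_pdf
import tabula
# pdf_content holds the raw bytes of the PDF; take the first table found on page 3
df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]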

Now I suddenly get this error:

~/cluster-env/env/lib/python3.8/site-packages/tabula/io.py in __init__(self, java_options, silent)
     92 
     93         from java import lang
---> 94         from org.apache.commons import cli
     95         from technology import tabula
     96 

ModuleNotFoundError: No module named 'org.apache.commons'

Any help would be appreciated.

2 Answers


  1. The same thing happened to me today in a Databricks environment, after tabula had been running smoothly for 6 months. My hotfix was to pip install version 2.7.0, as I suspect the error is caused by the latest version, 2.8.1, which was published today.

  2. Installing version 2.7.0 with the command pip install tabula-py==2.7.0 worked for me as well.
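
If you'd rather have the pipeline check the version itself than rely on a manual reinstall, here is a minimal sketch of the pinning idea from the answers above. The helper names installed_version and pin_command are hypothetical (not part of tabula-py), and the sketch assumes 2.7.0 is still the version you want. It only builds and prints the pip command rather than running it.

```python
import sys
from importlib import metadata

# Version reported to work in the answers above (an assumption for your setup).
PINNED_VERSION = "2.7.0"


def installed_version(package="tabula-py"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_command(package="tabula-py", version=PINNED_VERSION):
    """Build the pip command that installs exactly the pinned version."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


current = installed_version()
if current != PINNED_VERSION:
    # Print the command instead of running it, so this sketch has no side effects.
    print(f"tabula-py is {current!r}; to pin it, run:", " ".join(pin_command()))
```

In a notebook, the equivalent is simply replacing the first cell with !pip install tabula-py==2.7.0, so a newly published release can't be picked up silently.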
