Azure – PySpark Tabula-Py Read_PDF (ERROR: No module named 'org.apache.commons')
I've been running a pipeline in Azure for 4 months, and it suddenly broke last night. I have the following code:

    !pip install tabula-py

    from io import BytesIO  # import not shown in the original snippet; required for BytesIO below
    from tabula.io import read_pdf
    import tabula

    df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]

I got this…