tabula-py Questions

Azure – PySpark Tabula-Py Read_PDF (ERROR: No module named 'org.apache.commons')

September 18, 2023
mohamadmaarouf_
2 Answers

I've been running a pipeline in Azure for 4 months and it suddenly broke last night. I have the following code:

    !pip install tabula-py
    from tabula.io import read_pdf
    import tabula
    df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]

I got this…


How to use tabula in AWS Lambda to read PDF table – Ubuntu

September 21, 2022
ttam10
2 Answers

Hello, I get the following error while trying to use tabula to read a table in a PDF. I was aware of some of the difficulties (here) of using this package with AWS Lambda and tried to zip the tabula package…
