So I’m trying to extract a table from this website: https://careersportal.ie/courses/simple_search.php. I tried using
courses = pd.read_html('https://careersportal.ie/courses/simple_search.php')
but I’m getting an ImportError saying I need to install html5lib, which I’ve installed with pip, but I still get the same error.
I tried using the built-in pandas functions pd.read_csv() and pd.read_html().
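One common cause of this symptom is that pip installed the package into a different Python environment than the one running the script or notebook. A minimal diagnostic sketch, assuming the parser packages `pd.read_html` can rely on are `lxml`, or `bs4` together with `html5lib` (the helper name `check_parsers` is made up for illustration):

```python
import importlib.util
import sys


def check_parsers(names=("lxml", "bs4", "html5lib")):
    """Return {package name: True/False} for whether this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which interpreter is actually running; pip must install into this one.
print("Interpreter:", sys.executable)
for name, found in check_parsers().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package shows as missing, installing it with `python -m pip install html5lib` (using the interpreter printed above) and then restarting the session or notebook kernel is usually worth trying.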
2 Answers
You’re trying to extract a table from a website using Python and pandas.
Before you can extract data from a website, you need to understand its structure and how the table you want to extract is represented in the HTML.
Use BeautifulSoup to parse the HTML content and locate the table you want.
To get the table into a pandas DataFrame, you can use the following example:
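A minimal sketch of this approach (not the answerer's original code), assuming the table is present in the raw HTML of the page. The sample markup, the column names and the helper `table_to_dataframe` are hypothetical stand-ins; for the real site you would pass in the downloaded page source instead. It uses Python's built-in `html.parser`, so neither `html5lib` nor `lxml` is required.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page source.
SAMPLE_HTML = """
<table id="courses">
  <tr><th>Course</th><th>Points</th></tr>
  <tr><td>Computer Science</td><td>500</td></tr>
  <tr><td>Nursing</td><td>450</td></tr>
</table>
"""


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` and return it as a DataFrame."""
    # "html.parser" ships with Python, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the HTML")

    # Collect the text of every header/data cell, one list per table row.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    header, *body = rows  # assumes the first row holds the column names
    return pd.DataFrame(body, columns=header)


df = table_to_dataframe(SAMPLE_HTML)
print(df)
```

All cell values come back as strings, so numeric columns would still need converting, for example with `pd.to_numeric`. If `soup.find("table")` returns nothing for the real page, the table is probably filled in by JavaScript after the page loads and will not be in the downloaded HTML.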
Prints: