I’m trying to scrape specific details as a list from a page using BeautifulSoup in Python.
<p class="collapse text in" id="list_2">
<big>•</big>
car
<br>
<big>•</big>
bike
<br>
<span id="list_hidden_2" class="inline_hidden collapse in" aria-expanded="true">
<big>•</big>
bus
<br>
<big>•</big>
train
<br><br>
</span>
<span>...</span>
<a data-id="list" href="#list_hidden_2" class="link_sm link_toggle" data-toggle="collapse"
aria-expanded="true"></a>
</p>
I need a list with every text item contained in the <p>, like this:
list = ['car', 'bike', 'bus', 'train']
from bs4 import BeautifulSoup
import requests
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
p_tag = soup.find("p", {"id":"list_2"})
list = p_tag.text.strip()
print(list)
output:
• car• bike
• bus• train
How can I convert this output into a list like list = ['car', 'bike', 'bus', 'train']?
2 Answers
Note: Avoid using Python built-in names such as list as variable names; shadowing them can have unwanted effects on the rest of your code.
There are several ways to reach your goal. I would recommend working on your strategy for selecting the elements: select all <big> elements first and then pick the next_sibling of each.
Example
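A minimal sketch of this approach, assuming the markup from the question is pasted in as a string instead of being fetched with requests. The names html and items are illustrative, and html.parser is used here in place of lxml only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Markup copied from the question (assumed to match the real page).
html = """
<p class="collapse text in" id="list_2">
<big>•</big>
car
<br>
<big>•</big>
bike
<br>
<span id="list_hidden_2" class="inline_hidden collapse in" aria-expanded="true">
<big>•</big>
bus
<br>
<big>•</big>
train
<br><br>
</span>
<span>...</span>
<a data-id="list" href="#list_hidden_2" class="link_sm link_toggle" data-toggle="collapse"
aria-expanded="true"></a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p", {"id": "list_2"})

# Each bullet <big> is followed directly by a text node holding the item.
# The result is stored in `items` rather than `list`, so the built-in is not shadowed.
items = [
    big.next_sibling.strip()
    for big in p_tag.find_all("big")
    if isinstance(big.next_sibling, str)  # skip bullets not followed by text
]
print(items)
```

find_all searches recursively, so the bullets nested inside the hidden <span> are picked up as well.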
Output
['car', 'bike', 'bus', 'train']
I had a similar idea to @HedgeHog's. An alternative is to select the <br> tags and get the text before each one using previous_sibling.
You can modify your scraping code as:
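One way this could look, as a sketch under the same assumptions as the question's markup (a shortened copy is inlined here as the hypothetical html string, and html.parser stands in for lxml). The type check matters because the second tag in <br><br> has another <br> as its previous sibling, not text:

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumed structure).
html = """
<p class="collapse text in" id="list_2">
<big>•</big>
car
<br>
<big>•</big>
bike
<br>
<span id="list_hidden_2" class="inline_hidden collapse in" aria-expanded="true">
<big>•</big>
bus
<br>
<big>•</big>
train
<br><br>
</span>
<span>...</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p", {"id": "list_2"})

items = []
for br in p_tag.find_all("br"):
    prev = br.previous_sibling
    # Keep only non-empty text nodes; a preceding tag (e.g. another <br>) is skipped.
    if isinstance(prev, str) and prev.strip():
        items.append(prev.strip())

print(items)
```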
Output:
['car', 'bike', 'bus', 'train']