<html>
<body>
<p>
{"products":[{"id":2069443051648,"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi","handle":"bxrb4010smpt-mlt"}]
</p>
</body>
</html>
Hi, I am very new to bs4 and web scraping. I am trying to build an app that returns product information such as an item’s id, title, and handle. The HTML above is from a Shopify website and is what I am working with at the moment. Below is what I have so far for extracting the content:
import requests
from bs4 import BeautifulSoup
source = requests.get('https://kith.com/products.json').text
soup = BeautifulSoup(source, 'lxml')
p = soup.find('p')
text = list(p.children)[0]
print('')
print(text.strip())
Which returns:
{"products":[{"id":2069443051648,"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi","handle":"bxrb4010smpt-mlt"}]
I am having trouble simplifying this further to show only the id, title, and handle. Does anyone have any suggestions? I might even have gone about this all wrong; I’m kind of tapped out here.
3 Answers
You don’t need to use BeautifulSoup in this instance since you’re working with a JSON feed. requests provides a convenient way to access the response as a JSON object; therefore, all you need to do is:
Since the URL returns JSON, you can directly use the json library instead of using BeautifulSoup. What you need to do is:
And the result would be like this:
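A minimal sketch of the json-library approach described above. To stay runnable offline, it parses a hard-coded sample string instead of fetching the live URL; the sample is the payload from the question with a closing brace appended so that it is valid JSON (an assumption about the truncated original). The variable names are illustrative only.

```python
import json

# Sample payload copied from the question. The final "}" is added here
# (assumption) so that the text parses as valid JSON.
sample_text = (
    '{"products":[{"id":2069443051648,'
    '"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi",'
    '"handle":"bxrb4010smpt-mlt"}]}'
)

# json.loads turns JSON text into ordinary Python dicts and lists,
# so no HTML parser is involved.
data = json.loads(sample_text)

# With requests, the equivalent for a live response would be:
#   data = requests.get('https://kith.com/products.json').json()

products = data["products"]  # a list with one dict per product
print(type(data).__name__, len(products))
```

Once the payload is a plain dict, each product can be read with normal indexing, for example products[0]["title"].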
To get id, title, and handle, you can try this:
Output will be:
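One possible way to pull out just those three fields is sketched below, assuming the payload has the same shape as the sample in the question (a top-level "products" list of dicts). The helper name extract_fields and the sample string, which again has a closing brace added, are hypothetical and not part of the original answer.

```python
import json

def extract_fields(payload, keys=("id", "title", "handle")):
    """Return one dict per product, keeping only the requested keys.

    Hypothetical helper: missing keys map to None, and a payload
    without a "products" list yields an empty result.
    """
    return [
        {key: product.get(key) for key in keys}
        for product in payload.get("products", [])
    ]

# Sample payload from the question, with the closing "}" added (assumption).
sample = json.loads(
    '{"products":[{"id":2069443051648,'
    '"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi",'
    '"handle":"bxrb4010smpt-mlt"}]}'
)

for item in extract_fields(sample):
    print(item["id"], item["title"], item["handle"])
```

For the live feed, the same helper could be given the result of requests.get(url).json() in place of sample.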