<html>
 <body>
  <p>
   {"products":[{"id":2069443051648,"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi","handle":"bxrb4010smpt-mlt"}]
  </p>
 </body>
</html>

Hi, I am very new to bs4 and web scraping. I am trying to make an app that returns product information such as an item’s id, title, and handle. The HTML above comes from a Shopify website and is what I am working with at the moment. Below is what I have so far for extracting the content:

import requests
from bs4 import BeautifulSoup

source = requests.get('https://kith.com/products.json').text
soup = BeautifulSoup(source, 'lxml')
p = soup.find('p')
text = list(p.children)[0]


print('')
print(text.strip())

Which returns:


{"products":[{"id":2069443051648,"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi","handle":"bxrb4010smpt-mlt"}]

I am having trouble simplifying this further to show only the id, title, and handle. Does anyone have any suggestions? I might even have gone about this all wrong; I’m kind of tapped out here.

3 Answers


  1. You don’t need BeautifulSoup here because the URL returns a JSON feed. requests can decode the response body for you with .json(), which gives you a Python dict, so all you need is:

    import requests
    
    plist = requests.get('https://kith.com/products.json').json()
    # The product list sits under the top-level "products" key
    for product in plist['products']:
        print(product['id'], product['title'], product['handle'])
    
  2. Since the URL returns JSON, you can parse it directly with the json library instead of using BeautifulSoup. What you need to do is:

    import requests
    import json
    
    source = requests.get('https://kith.com/products.json').text
    js = json.loads(source)
    
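    # Read the fields of the first product; product_id avoids shadowing the built-in id()
    product_id = js["products"][0]["id"]
    title = js["products"][0]["title"]
    handle = js["products"][0]["handle"]
    
    print(product_id, title, handle)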
    

    And the result would be like this:

    2069443051648 BornxRaised Indian Summer Print Button-Up Shirt - Multi bxrb4010smpt-mlt
    
  3. To get the id, title, and handle, you can try this:

    from bs4 import BeautifulSoup
    import requests
    import json
    
    source = requests.get('https://kith.com/products.json').text
    
    soup = BeautifulSoup(source, 'lxml')
    p = soup.find('p')
    product = json.loads(p.text)
    
    # Iterate over the products directly instead of indexing by position
    for item in product["products"]:
        print(f"id : {item['id']} title : {item['title']} handle: {item['handle']}")
    
    

    Output will be:

    id : 2069443051648 title : BornxRaised Indian Summer Print Button-Up Shirt - Multi handle: bxrb4010smpt-mlt
    id : 2069447803008 title : Kith Patchwork Seersucker Bucket Hat - Multi handle: kh5850-115
    id : 2069446688896 title : Kith Patchwork Howard Tee - Red / Navy / Multi handle: kh3789-115
    id : 2069445705856 title : Kith Patchwork Camp Shirt - Red / Navy / Multi handle: kh3790-115
    id : 2069445738624 title : Kith Patchwork Bandana Hardaway - Red / Navy / Multi handle: kh6327-115
    id : 2069447508096 title : Kith Patchwork Bandana Gi - Red / Navy / Multi handle: kh1242-115
    id : 2069445509248 title : Nike Toddler Air Jordan 5 Retro Top 3 - Black / Emerald / Fire / Grape handle: jbcz2991-001
    id : 2069442494592 title : Nike Pre-School Air Jordan 5 Retro Top 3 - Black / Emerald / Fire / Grape handle: jbcz2990-001
    id : 2069442920576 title : Nike Grade School Air Jordan 5 Retro Top 3 - Black / Emerald / Fire / Grape handle: jbcz2989-001
    id : 2069442723968 title : Nike Air Jordan 5 Retro Top 3 - Black / Emerald / Fire / Grape handle: jbcz1786-001
    id : 2069448622208 title : A Cold Wall Block Logo Socks - Charcoal handle: acwmsk001whl-chr
    id : 2069441478784 title : 1017 ALYX 9SM Soft Classic Hat Curved Zip - Beige handle: aauha0026fa01beg0004
    id : 2069394227328 title : Vans x Kenzo UA OG Sk8-Hi LX - Multi handle: vans-x-kenzo-ua-og-sk8-hi-lx-multi
    id : 2069394260096 title : Vans x Kenzo UA OG Old Skool LX - Multi handle: vn0a4p3x01h
    id : 2069441642624 title : Nike Toddler Air Jordan 6 Retro - Neutral Grey / Black / White / True Red handle: jb384667-062
    id : 2069441380480 title : Nike Pre-School Air Jordan 6 Retro - Neutral Grey / Black / White / True Red handle: jb384666-062
    id : 2069441314944 title : Nike Grade School Air Jordan 6 Retro - Neutral Grey / Black / White / True Red handle: jb384665-062
    id : 2069441249408 title : Nike Air Jordan 6 Retro - Neutral Grey / Black / White / True Red handle: jbct8529-062
    id : 2069438070912 title : Nike WMNS Daybreak - Pale Ivory / Pollen Rise / Shimmer / Track Red handle: nkck2351-102
    id : 2069445443712 title : Kith Fix The System Tee - Black handle: kh3fix-100
    id : 2069445476480 title : Kith Fix The System Tee - White handle: kith-fix-the-system-tee-white
    id : 2069445312640 title : adidas Consortium Hyke AOH 001 - White handle: aafv3915
    id : 2069445345408 title : adidas Consortium Hyke AOH 001 - Python / Cloud White handle: aafv4254
    id : 2069422964864 title : Proenza Schouler Jersey Bodega Print Tee - Pistachio handle: pswl2024146-grn
    id : 2069344223360 title : No Ka`oi Kindly 7/8 Leggings - Sky handle: nkp3cssnokw72584a0bl
    id : 2069344125056 title : No Ka`oi Sweetie On Zip-Up Jacket - Multicolor 50 handle: nkp3cfenokw72540a0
    id : 2069359263872 title : Area Crystal Sweetheart Mini Dress - Rainbow handle: arre20d20068-mlt
    id : 2069387608192 title : Nike Air Tailwind 79 SE - Midnight Navy / Black handle: nkck4712-400
    id : 2069404352640 title : Nike Pre-School Sunray Protect 2 - Oracle Aqua / Ghost Green / Blue handle: nk943826-303
    id : 2069352382592 title : Nike Air Zoom Spiridon Cage 2 - Light Smoke Grey / Metallic Silver handle: nkcj1288-001
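
The answers above all come down to the same steps: parse the JSON, walk the list under "products", and keep only the keys of interest. Here is a minimal sketch of that, run against a sample string instead of the live feed so it works offline. The names SAMPLE, extract_fields, and fields are made up for illustration, and the sample is the snippet from the question with the missing closing brace added so that it parses.

```python
import json

# Sample payload shaped like the feed in the question (closing brace added
# so that it parses); the live feed contains many more products and keys.
SAMPLE = (
    '{"products":[{"id":2069443051648,'
    '"title":"BornxRaised Indian Summer Print Button-Up Shirt - Multi",'
    '"handle":"bxrb4010smpt-mlt"}]}'
)


def extract_fields(payload, fields=("id", "title", "handle")):
    """Return one dict per product containing only the requested fields."""
    # .get() tolerates a missing "products" key or a missing field
    return [
        {name: product.get(name) for name in fields}
        for product in payload.get("products", [])
    ]


rows = extract_fields(json.loads(SAMPLE))
for row in rows:
    print(row["id"], row["title"], row["handle"])
```

With the live feed you would pass the dict from requests.get(url).json() to extract_fields instead of json.loads(SAMPLE). Returning a list of dicts rather than printing inside the loop keeps the extraction reusable if the app later needs to store or filter the products.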
    