
I need to extract several elements from the JSON-LD block embedded in a page, where each element sits under its own @type.

I tried this post but could not figure it out.

    data = soup.findAll('script', {'type':'application/ld+json'})
    oJson = json.loads(data.text)["model"]

gives the error:

    AttributeError: ResultSet object has no attribute 'text'

Would appreciate help.

<script type="application/ld+json">
{"@context":"https://schema.org/",
"@type":"Product","brand":"Salomon","category":"Basecaps","description":"<p>Leichte Sportkappe f&uuml;r Sport bei Sonne und Regen</p>","image":"https://static.bergzeit.com/product_gallery_regular/1118101-006_pic1.jpg",

"model":[
{"@type":"ProductModel","color":"fiery red","image":"https://static.bergzeit.com/product_gallery_regular/1118101-003_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"fiery red Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-003","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-003"},

{"@type":"ProductModel","color":"nightshade","image":"https://static.bergzeit.com/product_gallery_regular/1118101-005_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"nightshade Cross Cap","price":24.77,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-005","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-005"},

{"@type":"ProductModel","color":"deep black","image":"https://static.bergzeit.com/product_gallery_regular/1118101-001_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"deep black Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-001","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-001"},

{"@type":"ProductModel","color":"chambray blue","image":"https://static.bergzeit.com/product_gallery_regular/1118101-002_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"chambray blue Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-002","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-002"},

{"@type":"ProductModel","color":"bering sea","image":"https://static.bergzeit.com/product_gallery_regular/1118101-008_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"bering sea Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-008","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-008"},

{"@type":"ProductModel","color":"deep lichen green","image":"https://static.bergzeit.com/product_gallery_regular/1118101-007_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","name":"deep lichen green Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-007","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-007"},

{"@type":"ProductModel","color":"white","image":"https://static.bergzeit.com/product_gallery_regular/1118101-004_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"white Cross Cap","price":24.95,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-004","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-004"},

{"@type":"ProductModel","color":"peach amber","image":"https://static.bergzeit.com/product_gallery_regular/1118101-006_pic1.jpg","name":"Cross Cap","offers":[{"@type":"Offer","availability":"http://schema.org/InStock","name":"peach amber Cross Cap","price":24.63,"priceCurrency":"EUR"}],"size":"ONE SIZE","sku":"1118101-006","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/#itemId=1118101-006"}],"name":"Cross Cap","offers":[{"@type":"AggregateOffer","availability":"http://schema.org/InStock","highPrice":24.95,"lowPrice":24.63,"priceCurrency":"EUR"}],"productId":"1118101","url":"https://www.bergzeit.de/p/salomon-cross-cap/1118101/"}
</script>

2 Answers


  1. Assuming the HTML shown in the question is assigned to a variable named data:

    from bs4 import BeautifulSoup as BS
    import json
    
    soup = BS(data, 'lxml')
    
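    # Parse each JSON-LD script tag and print one field per model entry
    for script in soup.find_all('script', {'type': 'application/ld+json'}):
        # json.loads ignores surrounding whitespace, so the tag text can be passed as-is
        for model in json.loads(script.get_text())['model']:
            print(model['sku']) # for example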
    

    Output:

    1118101-003
    1118101-005
    1118101-001
    1118101-002
    1118101-008
    1118101-007
    1118101-004
    1118101-006
    
  2. You are using deprecated syntax: findAll should be find_all.

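    Next, find_all returns a ResultSet (a list of tags), which has no text attribute. Either iterate over it, or use find / select_one to get a single tag before reading its text.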

    Here is a working example of extracting that data:

    from bs4 import BeautifulSoup as bs
    import requests
    import json
    
    headers = {
        'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
    }
    
    r = requests.get("https://www.bergzeit.co.uk/p/black-diamond-womens-focus-climbing-shoe/3005027/#itemId=3005027-001", headers=headers)
    
    soup = bs(r.text, 'html.parser')
    # select_one returns a single tag rather than a list, so .string is available on it
    script_w_data = soup.select_one('div[class^="product-detailed-page"] script[type="application/ld+json"]').string
    json_obj = json.loads(script_w_data)
    print(json_obj['brand'],'|',  json_obj['description'])
    

    Result in terminal:

    Black Diamond | Performance statement from the first Black Diamond climbing shoe series - world premiere
    

    See the Requests documentation and the BeautifulSoup documentation for details.
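
For the original goal of pulling out every element by its @type, here is a minimal sketch that needs no network access. The `html` string is a shortened, hypothetical stand-in for the page in the question, and `collect_by_type` is an illustrative helper, not part of BeautifulSoup or json. It assumes each "@type" value is a single string; JSON-LD also allows a list there, which this sketch does not handle.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page in the question
html = """
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Product", "brand": "Salomon",
 "model": [
  {"@type": "ProductModel", "color": "fiery red", "sku": "1118101-003",
   "offers": [{"@type": "Offer", "availability": "http://schema.org/OutOfStock", "price": 24.95}]},
  {"@type": "ProductModel", "color": "nightshade", "sku": "1118101-005",
   "offers": [{"@type": "Offer", "availability": "http://schema.org/InStock", "price": 24.77}]}
 ],
 "offers": [{"@type": "AggregateOffer", "highPrice": 24.95, "lowPrice": 24.63}]}
</script>
"""


def collect_by_type(node, found=None):
    """Walk nested dicts/lists and group every object by its "@type" value."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        if "@type" in node:
            found.setdefault(node["@type"], []).append(node)
        for value in node.values():
            collect_by_type(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_by_type(item, found)
    return found


soup = BeautifulSoup(html, "html.parser")

by_type = {}
# find_all returns a ResultSet, so each tag is handled individually
for script in soup.find_all("script", {"type": "application/ld+json"}):
    collect_by_type(json.loads(script.string), by_type)

skus = [model["sku"] for model in by_type["ProductModel"]]
prices = [offer["price"] for offer in by_type["Offer"]]
print(skus)
print(prices)
```

Grouping by "@type" keeps the per-variant Offer entries separate from the AggregateOffer at the product level, so the two price structures do not get mixed up.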
