
I am building a simple scraper that goes to a URL and pulls the info from that page. I know, right? My issue is that what I get back is not traditional HTML with headers and formatting; it is plain text. Is there a way to grab only certain bits of information? My plan was to export the page contents, read through them, and write another text file with only the bits I need.

The reason I need this is that I am trying to pull 13,000+ item IDs and organize them in one big ID dump. I was trying to convert the data to the JSON text format that the website usually uses. The game is Moviestarplanet2, which I have been tasked to look into.

This is my code so far (I know it's basic!):

# Web scraper test
from bs4 import BeautifulSoup
import html_to_json
import json
import requests
import time

IDNum = 688
url = 'https://us.mspapis.com/shopinventory/v1/shops/listings/' + str(IDNum)
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)

I need to iron this problem out as early as possible, since it is the primary function of this scraper. Any ideas or advice would be helpful! I am not advanced in Python, but I can usually follow along pretty well. Sorry in advance if I ask any redundant or stupid questions.

I have tried the html_to_json library, the built-in json module, googling for a good two hours, and just slapping stuff in to see if it works. I'd like to actually learn why it does what it does instead of copying and pasting from someone else.

EDIT: This is the data that I am trying to format:

{'id': '688', 'item': {'id': '912', 'type': 'item', 'singlePurchase': True, 'objectSource': 'curatedcontentitemtemplates', 'objectId': '596', 'resourceIdentifiers': [{'type': 'name', 'key': 'Neutral'}, {'type': 'graphics', 'key': 'default'}], 'tags': [{'hidden': False, 'id': '62', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_MOODS'}, {'type': 'graphics', 'key': 'moods'}], 'type': 'category.animation', 'gameId': '5lxc'}, {'hidden': False, 'id': '85', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_MOODS_BASIC'}, {'type': 'graphics', 'key': 'moods'}], 'type': 'subcategory.animation.62', 'gameId': '5lxc'}, {'hidden': False, 'id': '168', 'resourceIdentifiers': [{'type': 'label', 'key': 'TAG_FREE'}, {'type': 'graphics', 'key': 'free'}], 'type': 'category.artbooks', 'gameId': '5lxc', 'lookUpId': 'tag_free'}], 'lookUpId': 'f4b919d8-15f9-4dae-964f-bd9262db0a5b', 'additionalData': {'NebulaData': {'DefaultColors': '#FFFFFF', 'Snapshot': 'default_preview'}, 'MSP2Data': {'Loop': 'false'}}}, 'shopId': '8', 'price': {'currency': 'soft', 'salesPrice': 0.0, 'onSale': False}, 'lookUpId': '827e8ca7-60de-4d07-b0ae-61154d579b77'}

2 Answers


  1. That endpoint returns JSON, not HTML, so there is nothing for BeautifulSoup to parse. Just call resp.json().

    import requests
    from pprint import pprint
    
    IDNum = 688
    url = f'https://us.mspapis.com/shopinventory/v1/shops/listings/{IDNum}'
    resp = requests.get(url)
    resp.raise_for_status()
    data = resp.json()
    pprint(data["item"])  # or whatever
    
  2. If I understand you correctly, you want to print the JSON on multiple lines:

    import json
    import requests
    
    IDNum = 688
    url = "https://us.mspapis.com/shopinventory/v1/shops/listings/{}"
    
    page = requests.get(url.format(IDNum))
    data = page.json()
    
    # print the JSON on multiple lines:
    print(json.dumps(data, indent=4))
    

    Prints:

    {
        "id": "688",
        "item": {
            "id": "912",
            "type": "item",
            "singlePurchase": true,
            "objectSource": "curatedcontentitemtemplates",
            "objectId": "596",
            "resourceIdentifiers": [
                {
                    "type": "name",
                    "key": "Neutral"
                },
                {
                    "type": "graphics",
                    "key": "default"
                }
            ],
            "tags": [
                {
                    "hidden": false,
                    "id": "62",
                    "resourceIdentifiers": [
                        {
                            "type": "label",
                            "key": "TAG_MOODS"
                        },
                        {
                            "type": "graphics",
                            "key": "moods"
                        }
                    ],
                    "type": "category.animation",
                    "gameId": "5lxc"
                },
                {
                    "hidden": false,
                    "id": "85",
                    "resourceIdentifiers": [
                        {
                            "type": "label",
                            "key": "TAG_MOODS_BASIC"
                        },
                        {
                            "type": "graphics",
                            "key": "moods"
                        }
                    ],
                    "type": "subcategory.animation.62",
                    "gameId": "5lxc"
                },
                {
                    "hidden": false,
                    "id": "168",
                    "resourceIdentifiers": [
                        {
                            "type": "label",
                            "key": "TAG_FREE"
                        },
                        {
                            "type": "graphics",
                            "key": "free"
                        }
                    ],
                    "type": "category.artbooks",
                    "gameId": "5lxc",
                    "lookUpId": "tag_free"
                }
            ],
            "lookUpId": "f4b919d8-15f9-4dae-964f-bd9262db0a5b",
            "additionalData": {
                "NebulaData": {
                    "DefaultColors": "#FFFFFF",
                    "Snapshot": "default_preview"
                },
                "MSP2Data": {
                    "Loop": "false"
                }
            }
        },
        "shopId": "8",
        "price": {
            "currency": "soft",
            "salesPrice": 0.0,
            "onSale": false
        },
        "lookUpId": "827e8ca7-60de-4d07-b0ae-61154d579b77"
    }
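
Once the response has been parsed with `.json()`, the result is an ordinary Python dict, so grabbing "only certain bits" comes down to key lookups. The following is a minimal sketch, assuming the fields of interest are the listing ID, the item ID, the item name, the tag labels, and the price. The helper name `summarize_listing` is hypothetical, and `sample` is a trimmed copy of the data shown in the question, so the example runs without a network call.

```python
def summarize_listing(listing):
    """Pick a few fields out of one parsed listing dict."""
    item = listing["item"]
    # resourceIdentifiers is a list of {"type": ..., "key": ...} pairs;
    # the entry whose type is "name" holds the item's name.
    name = next(
        (r["key"] for r in item.get("resourceIdentifiers", []) if r["type"] == "name"),
        None,
    )
    # Each tag has its own resourceIdentifiers; collect the "label" keys.
    tag_labels = [
        r["key"]
        for tag in item.get("tags", [])
        for r in tag.get("resourceIdentifiers", [])
        if r["type"] == "label"
    ]
    return {
        "listing_id": listing["id"],
        "item_id": item["id"],
        "name": name,
        "tags": tag_labels,
        "price": listing["price"]["salesPrice"],
    }


# Trimmed copy of the listing from the question (assumed representative).
sample = {
    "id": "688",
    "item": {
        "id": "912",
        "resourceIdentifiers": [
            {"type": "name", "key": "Neutral"},
            {"type": "graphics", "key": "default"},
        ],
        "tags": [
            {"id": "62", "resourceIdentifiers": [{"type": "label", "key": "TAG_MOODS"}]},
            {"id": "168", "resourceIdentifiers": [{"type": "label", "key": "TAG_FREE"}]},
        ],
    },
    "price": {"currency": "soft", "salesPrice": 0.0, "onSale": False},
}

summary = summarize_listing(sample)
print(summary)
```

With a live response, the same helper could be called as `summarize_listing(resp.json())`. Listings that lack one of these keys would raise `KeyError`, so real data may need more defensive lookups.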
    
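
For the 13,000+ ID dump mentioned in the question, the same request can be repeated in a loop and the results written out with `json.dump`. This is only a sketch under several assumptions: the function names (`fetch_listing`, `collect_ids`, `save_dump`) are hypothetical, the dump maps listing ID to item ID, and how the endpoint responds to a missing ID is not known, so any non-success status is simply skipped. The fetcher is passed in as a parameter so the loop can be tried offline with fake data.

```python
import json
import time


def fetch_listing(listing_id):
    """Fetch one listing; return the parsed JSON, or None on an error status."""
    import requests  # imported here so the other helpers work without network access

    url = f"https://us.mspapis.com/shopinventory/v1/shops/listings/{listing_id}"
    resp = requests.get(url, timeout=10)
    if not resp.ok:  # assumption: a missing listing yields a 4xx/5xx status
        return None
    return resp.json()


def collect_ids(listing_ids, fetch=fetch_listing, delay=0.0):
    """Map each listing ID to its item ID, skipping listings that were not fetched."""
    dump = {}
    for listing_id in listing_ids:
        data = fetch(listing_id)
        if data is not None:
            dump[data["id"]] = data["item"]["id"]
        time.sleep(delay)  # pause between requests to go easy on the server
    return dump


def save_dump(dump, path):
    """Write the collected IDs to a JSON file, indented for readability."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dump, f, indent=4)


# Example usage (makes real requests, so it is left commented out):
# dump = collect_ids(range(688, 691), delay=0.5)
# save_dump(dump, "id_dump.json")
```

Keeping a small delay between requests and saving the dump periodically are both worth considering for a run of this size, since a failure partway through would otherwise lose everything collected so far.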