
How to scrape and extract data from JSON file?

Yankzz
January 30, 2023
271 views
0 votes
2 Answers

I am trying to extract all the data for every school listed on the following site:

https://schulfinder.kultus-bw.de/

My code is this:

import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from requests import get
from selenium.webdriver.common.by import By
import json

url = "https://schulfinder.kultus-bw.de/api/school?uuid=81af189c-7bc0-44a3-8c9f-73e6d6e50fdb&_=1675072758525"

payload = {}
headers = {}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

Output is this:

{
  "outpost_number": "0",
  "name": "Gartenschule Grundschule Ebnat",
  "street": "Abt-Angehrn-Str.",
  "house_number": "5",
  "postcode": "73432",
  "city": "Aalen",
  "phone": "+49736796700",
  "fax": "+497367967016",
  "email": "[email protected]",
  "website": null,
  "tablet_tranche": null,
  "tablet_platform": null,
  "tablet_branches": null,
  "tablet_trades": null,
  "lat": 48.80094,
  "lng": 10.18761,
  "official": 0,
  "branches": [
    {
      "branch_id": 12110,
      "acronym": "GS",
      "description_long": "Grundschule"
    }
  ],
  "trades": []
}

I found the URL in the Network tab of the Chrome inspector and tested the request in Postman. My problem is that I only get the info for one school, and I can't figure out how to request all the schools.

Answers

Simply use the correct endpoint:

https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084

This returns a list of schools. The uuid of each entry can then be used to request the full record from the endpoint in your question (https://schulfinder.kultus-bw.de/api/school?…).

[{"uuid":"50de01a4-503d-44d1-af4b-a6031a022b85","outpost_number":"0","name":"Grundschule Aach","city":"Aach","lat":47.84399,"lng":8.85067,"official":0,"marker_class":"marker green","marker_label":"G","website":null},{"uuid":"8818037f-9aed-4860-b42e-8a49b1403c02","outpost_number":"0","name":"Braunenbergschule Grundschule Wasseralfingen","city":"Aalen","lat":48.8612,"lng":10.11191,"official":0,"marker_class":"marker green","marker_label":"G","website":null},...]

Be aware that the result is limited to 500 entries, so you have to apply filters and combine the results of several requests to get all of them. When the limit is hit, the site shows this message:

The search limit has been reached. No more than 500 results are displayed. Please refine your search, e.g. by specifying a place.

Example

import requests

# List endpoint: returns at most 500 schools, each with a uuid
list_url = 'https://schulfinder.kultus-bw.de/api/schools?distance=1&outposts=1&owner=&school_kind=&term=&types=&work_schedule=&_=1675079497084'

data = []

# Request the detail record for every uuid found in the list
for uuid in [item['uuid'] for item in requests.get(list_url).json()]:
    detail_url = f'https://schulfinder.kultus-bw.de/api/school?uuid={uuid}&_=1675072758525'
    data.append(
        requests.get(detail_url).json()
    )

data

Output

[{'outpost_number': '0', 'name': 'Grundschule Aach', 'street': 'Schulstr.', 'house_number': '5', 'postcode': '78267', 'city': 'Aach', 'phone': '+4977741442', 'fax': None, 'email': '[email protected]', 'website': None, 'tablet_tranche': None, 'tablet_platform': None, 'tablet_branches': None, 'tablet_trades': None, 'lat': 47.84399, 'lng': 8.85067, 'official': 0, 'branches': [{'branch_id': 12110, 'acronym': 'GS', 'description_long': 'Grundschule'}], 'trades': []}, {'outpost_number': '0', 'name': 'Braunenbergschule Grundschule Wasseralfingen', 'street': 'Steinstr.', 'house_number': '38', 'postcode': '73433', 'city': 'Aalen', 'phone': '+49736197700', 'fax': '+497361977019', 'email': '[email protected]', 'website': 'http://www.braunenbergschule.de', 'tablet_tranche': None, 'tablet_platform': None, 'tablet_branches': None, 'tablet_trades': None, 'lat': 48.8612, 'lng': 10.11191, 'official': 0, 'branches': [{'branch_id': 12110, 'acronym': 'GS', 'description_long': 'Grundschule'}], 'trades': []},...]
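
To get past the 500-result limit, the filtered queries have to be merged somehow. The following is only a rough sketch of one way to do that: it runs the list request once per search term and de-duplicates the results by uuid. It assumes that the `term` parameter accepts a place name and that the `_` cache-busting value can be left out; neither is confirmed by the site. `build_list_url` and `collect_schools` are made-up helper names, and `fetch` is any callable that takes a URL and returns the decoded JSON list.

```python
from urllib.parse import urlencode

API_LIST_URL = "https://schulfinder.kultus-bw.de/api/schools"

# Query parameters copied from the list URL above. The "_" cache-busting
# value is omitted (assumption: the API does not require it).
BASE_PARAMS = {
    "distance": "1",
    "outposts": "1",
    "owner": "",
    "school_kind": "",
    "term": "",
    "types": "",
    "work_schedule": "",
}


def build_list_url(term):
    """Return the list URL with `term` filled in (assumed to be a place name)."""
    params = dict(BASE_PARAMS, term=term)
    return f"{API_LIST_URL}?{urlencode(params)}"


def collect_schools(terms, fetch):
    """Query the list endpoint once per term and merge the results by uuid.

    Returns the merged list plus the terms whose result reached 500 entries,
    since those queries were probably cut off and need a narrower filter.
    """
    schools = {}
    truncated = []
    for term in terms:
        results = fetch(build_list_url(term))
        if len(results) >= 500:
            truncated.append(term)
        for item in results:
            # The uuid is the key, so a school found twice is stored once
            schools[item["uuid"]] = item
    return list(schools.values()), truncated
```

With requests, this could be called as `collect_schools(["Aach", "Aalen"], lambda url: requests.get(url).json())`. The two place names are just taken from the output above; a complete list of places would have to come from somewhere else.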

- Ivan
- January 30, 2023 at 2:15 pm
- 0 votes
In addition to the answer already given.

To get all the search criteria for the GET request to the API, you can parse the contents of the main page with BeautifulSoup, which you've already imported:
```
from bs4 import BeautifulSoup
import requests

search_page_url = "https://schulfinder.kultus-bw.de"
page_contents = requests.get(search_page_url).text

# Collect every <input> element on the search page
parsed_html = BeautifulSoup(page_contents, features="html.parser")
input_elements = parsed_html.body.find_all('input')

# One (name, type, value) tuple per input; missing attributes become None
search_params = [(x.get('name'), x.get('type'), x.get('value')) for x in input_elements]
```
search_params is a list of (name, type, value) tuples, one per input element. It should give you some insight into the available parameters and their possible values.
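
To make that list easier to read, the tuples can be grouped by parameter name. This is a minimal sketch, not something the site documents: `group_params` is a made-up helper, and the sample tuples in the comment are invented for illustration, so the real names and values on the page may differ.

```python
def group_params(search_params):
    """Map each input name to the distinct values seen for it."""
    grouped = {}
    for name, _input_type, value in search_params:
        if name is None:
            # Inputs without a name attribute cannot be query parameters
            continue
        values = grouped.setdefault(name, [])
        if value is not None and value not in values:
            values.append(value)
    return grouped


# Hypothetical input, shaped like the search_params list built above:
# group_params([("term", "text", None), ("types", "checkbox", "1")])
```

Names that end up with an empty list are probably free-text fields, while names with several values are probably checkboxes or radio buttons.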