
I want to extract the legal form for a search term, in this example "Fondazione Amici di AMCA", from https://www.zefix.ch/en/search/entity/welcome

The code always prints "No search results found.", even though the entry exists. Where is the mistake?

The expected output is "Foundation", taken from https://www.zefix.ch/en/search/entity/list/firm/1151591 (this URL appears after entering the search term "Fondazione Amici di AMCA" on https://www.zefix.ch/en/search/entity/welcome).

import requests  
from bs4 import BeautifulSoup

search_url = "https://www.zefix.ch/de/search/entity/welcome"
search_params = {"name": "Fondazione Amici di AMCA"}

response = requests.get(search_url, params=search_params)
soup = BeautifulSoup(response.content, "html.parser")

# Find the link to the details page for the first search result
search_results = soup.find_all("div", {"class": "result-entry"})
if search_results:
    first_result = search_results[0]
    details_link = first_result.find("a", {"class": "list-group-item-1"})
    if details_link:
        details_url = f"https://www.zefix.ch{details_link['href']}"

        # Visit the details page and extract the legal form
        details_response = requests.get(details_url)
        details_soup = BeautifulSoup(details_response.content, "html.parser")
        legal_form = details_soup.find("td", text="Rechtsform:").find_next_sibling("td").get_text(strip=True)

        print(f"The legal form of Fondazione Amici di AMCA is: {legal_form}")
    else:
        print("No link to details page found.")
else:
    error_message = soup.find("div", {"class": "no-result-message"})
    if error_message:
        print(error_message.text.strip())
    else:
        print("No search results found.")

2 Answers


  1. You can use the site's Ajax API to look up the legal form of the entity returned by the search query:

    import requests
    
    legal_forms_url = "https://www.zefix.ch/ZefixREST/api/v1/legalForm.json"
    search_api = "https://www.zefix.ch/ZefixREST/api/v1/firm/search.json"
    
    legal_forms = {f["id"]: f for f in requests.get(legal_forms_url).json()}
    
    payload = {
        "languageKey": "en",
        "maxEntries": 30,
        "name": "Fondazione Amici di AMCA",
        "offset": 0,
        "searchType": "exact",
    }
    
    data = requests.post(search_api, json=payload).json()
    print(legal_forms[data["list"][0]["legalFormId"]]["name"]["en"])
    

    Prints:

    Foundation
    
  2. The data you're looking for is loaded dynamically by JavaScript, so it is not in the HTML that requests receives from the search page. You need to find the API call the page makes (it shows up in your browser's developer tools under Network/XHR; you may have to do some more research if you're not familiar with those tools).

    Once you call that endpoint directly, you get back JSON that is easy to parse. Note that I requested the English version, so you may want to change the language key:

    import requests
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.5',
        'Content-Type': 'application/json',
        'Origin': 'https://www.zefix.ch',
        'Connection': 'keep-alive',
        'Referer': 'https://www.zefix.ch/en/search/entity/welcome',
    }
    
    json_data = {
        'languageKey': 'en',
        'maxEntries': 30,
        'offset': 0,
        'name': 'Fondazione Amici di AMCA',
        'searchType': 'exact',
    }
    
    response = requests.post('https://www.zefix.ch/ZefixREST/api/v1/firm/search.json', headers=headers, json=json_data)
    zf = response.json()
    

    The data is in zf['list'], which here is a one-element list of dictionaries. For example:

    zf['list'][0]['legalSeat']
    

    will return:

    'Bellinzona'
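
To get from this response to the legal form the question asks for, the search hit has to be joined with the legal-form list from the first answer. Below is a minimal sketch of that join as a pure function, so it can be tried without network access. The helper name `legal_form_name` is hypothetical, the JSON shapes are inferred only from the two answers above, and the sample `legalFormId` value of 7 is made up for illustration. It also returns None instead of raising when the search comes back empty.

```python
def legal_form_name(search_json, legal_forms_json, lang="en"):
    """Return the legal-form name of the first search hit, or None.

    search_json:      parsed response of firm/search.json (assumed to hold a "list")
    legal_forms_json: parsed response of legalForm.json (assumed list of dicts
                      with "id" and a per-language "name" mapping)
    """
    hits = search_json.get("list") or []
    if not hits:
        return None  # nothing matched the search term

    # Index the legal forms by id so the hit's legalFormId can be resolved.
    forms_by_id = {form["id"]: form for form in legal_forms_json}
    form = forms_by_id.get(hits[0].get("legalFormId"))
    if form is None:
        return None  # the hit refers to an id missing from the list

    return form["name"].get(lang)


# Illustrative stand-ins for the two API responses; the id 7 is invented.
sample_forms = [{"id": 7, "name": {"en": "Foundation", "de": "Stiftung"}}]
sample_search = {
    "list": [
        {
            "name": "Fondazione Amici di AMCA",
            "legalSeat": "Bellinzona",
            "legalFormId": 7,
        }
    ]
}

print(legal_form_name(sample_search, sample_forms))  # Foundation
print(legal_form_name({"list": []}, sample_forms))   # None
```

With live data, the call would presumably be `legal_form_name(zf, requests.get("https://www.zefix.ch/ZefixREST/api/v1/legalForm.json").json())`, reusing `zf` from the snippet above.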
    