
I wrote this code in Python to get all the links and put them in a JSON file, but for some reason I am only getting the last link (the website and class are shown in the code). Any ideas why it is not working properly?

import requests
from bs4 import BeautifulSoup
import json

headers = {
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0Safari/537.36"
}

number = 0

for page_number in range(1, 2):
    url = f"https://www.sothebysrealty.com/eng/associates/int/{page_number}-pg"
    req = requests.get(url, headers=headers)
    src = req.text
    soup = BeautifulSoup(src, "lxml")
    name_link = soup.find_all("a", class_="Entities-card__cta btn u-text-uppercase u-color-sir-blue palm--hide")

    all_links_dict = {}
    for item in name_link:
        value_links = ("https://www.sothebysrealty.com" + item.get("href"))

    all_links_dict[number + 1] = value_links

    with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
        json.dump(all_links_dict, file, indent=4, ensure_ascii=False)

3 Answers


  1. This is because all_links_dict[number + 1] = value_links is not in your for item in name_link loop. Hence you only add to the dict once.

    You must also increment number in the loop.

    for item in name_link:
        value_links = ("https://www.sothebysrealty.com" + item.get("href"))
        all_links_dict[number] = value_links
        number += 1
    
  2. There are a few things I notice here.

    Firstly, your page range is range(1, 2). In Python the stop value is not included in the range, so the for loop will only run once, with a page number of 1.

    Secondly, your all_links_dict = {} line resets the dictionary to an empty dict on every pass through the page loop.

    Lastly, you are opening the file in 'w' mode on each iteration of the loop and then dumping the JSON, which overwrites any previous contents.

    I would advise adjusting your range, moving the dictionary initialisation out of the for loop, and dumping the dictionary to your file once at the end, outside of the for loop (see the sketch below).
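    Putting those three changes together, a minimal sketch could look like this (the range(1, 6) page count is just a placeholder; adjust it to however many pages you actually want to scrape):

        import requests
        from bs4 import BeautifulSoup
        import json

        headers = {
            "Accept": "*/*",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
        }

        all_links_dict = {}  # initialise once, before the page loop
        number = 0

        for page_number in range(1, 6):  # placeholder: scrapes pages 1-5
            url = f"https://www.sothebysrealty.com/eng/associates/int/{page_number}-pg"
            req = requests.get(url, headers=headers)
            soup = BeautifulSoup(req.text, "lxml")
            name_link = soup.find_all("a", class_="Entities-card__cta btn u-text-uppercase u-color-sir-blue palm--hide")

            for item in name_link:
                number += 1  # increment inside the inner loop
                all_links_dict[number] = "https://www.sothebysrealty.com" + item.get("href")

        # dump once, after all pages have been scraped
        with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
            json.dump(all_links_dict, file, indent=4, ensure_ascii=False)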

  3. There are several issues:


        all_links_dict = {}
        for item in name_link:
            value_links = ("https://www.sothebysrealty.com" + item.get("href"))
        all_links_dict[number + 1] = value_links
    

    You are not updating number at any point, so on every pass through the page loop only a value keyed to 1 gets saved, overwriting the previous one. Either use all_links_dict[page_number] = value_links, since page_number already changes on each iteration, or add a line to increment number.

        all_links_dict = {}
        for item in name_link:
            value_links = ("https://www.sothebysrealty.com" + item.get("href"))
            number += 1
            all_links_dict[number] = value_links
    

        with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
            json.dump(all_links_dict, file, indent=4, ensure_ascii=False)
    

    You could use mode="a" instead of "w" to append instead of overwriting on each iteration. However, you should be aware that the file will no longer be valid JSON (i.e. you can't decode it any more) after a second iteration. It might be better to append every link to a list and then write the list to JSON once, after the loop has finished (see the sketch at the end of this answer).


    There’s also the fact that for page_number in range(1, 2): will only lead to one iteration (where page_number is 1), so even with all this, only one page’s info will be saved unless the range is expanded to include more pages.
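    A minimal sketch of that list-based approach, assuming the same URL pattern and class name as in the question (the headers dict is abbreviated here, and the page range and output filename are only placeholders):

        import requests
        from bs4 import BeautifulSoup
        import json

        headers = {"User-Agent": "Mozilla/5.0"}  # reuse the full headers from the question

        all_links = []  # collect every link in a list instead of a numbered dict

        for page_number in range(1, 6):  # placeholder page range
            url = f"https://www.sothebysrealty.com/eng/associates/int/{page_number}-pg"
            soup = BeautifulSoup(requests.get(url, headers=headers).text, "lxml")
            cards = soup.find_all("a", class_="Entities-card__cta btn u-text-uppercase u-color-sir-blue palm--hide")
            for item in cards:
                all_links.append("https://www.sothebysrealty.com" + item.get("href"))

        # a single "w" write after the loop produces one valid JSON document
        with open("all_links.json", "w", encoding="utf-8-sig") as file:
            json.dump(all_links, file, indent=4, ensure_ascii=False)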
