I wrote this code in Python to get all the links and put them into a JSON file, but for some reason I am only getting the last link (the website and class are in the code). Any ideas why it is not working properly?
```python
import requests
from bs4 import BeautifulSoup
import json

headers = {
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}

number = 0

for page_number in range(1, 2):
    url = f"https://www.sothebysrealty.com/eng/associates/int/{page_number}-pg"
    req = requests.get(url, headers=headers)
    src = req.text
    soup = BeautifulSoup(src, "lxml")
    name_link = soup.find_all("a", class_="Entities-card__cta btn u-text-uppercase u-color-sir-blue palm--hide")
    all_links_dict = {}
    for item in name_link:
        value_links = "https://www.sothebysrealty.com" + item.get("href")
    all_links_dict[number + 1] = value_links
    with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
        json.dump(all_links_dict, file, indent=4, ensure_ascii=False)
```
3 Answers
This is because `all_links_dict[number + 1] = value_links` is not inside your `for item in name_link` loop, so you only add to the dict once. You must also increment `number` inside the loop.
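As a minimal sketch of that fix, with a plain list of hypothetical hrefs standing in for the scraped tags:

```python
# Hypothetical hrefs standing in for the scraped <a> tags
hrefs = ["/agent/a", "/agent/b", "/agent/c"]

all_links_dict = {}
number = 0
for href in hrefs:
    value_links = "https://www.sothebysrealty.com" + href
    number += 1                           # increment inside the loop
    all_links_dict[number] = value_links  # assign inside the loop as well

print(all_links_dict)  # one entry per href, keyed 1..3, not just the last link
```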
There are a few things I notice here.

Firstly, your page range: `range(1, 2)`. In Python the stop value is not included in the range, so the for loop will only run once, with a page number of 1.

Secondly, your `all_links_dict = {}` line resets the dictionary to an empty dict on each pass through the outer loop.

Lastly, you are opening the file in `'w'` mode on each iteration of the loop and then JSON-dumping, which overwrites any previous contents.

I would advise adjusting your range, moving the dictionary initialisation out of the for loop, and dumping the dictionary to your file once at the end, outside of the for loop.
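Putting those three changes together, here is a sketch of the corrected structure; the actual scraping is replaced by hypothetical per-page data so the flow of the dictionary and the single dump is clear:

```python
import json

# Hypothetical per-page results, standing in for the requests/BeautifulSoup calls
pages = {
    1: ["/agent/a", "/agent/b"],
    2: ["/agent/c"],
}

all_links_dict = {}                  # initialise ONCE, before the loop
number = 0
for page_number in range(1, 3):      # stop is exclusive, so this covers pages 1 and 2
    for href in pages[page_number]:
        number += 1
        all_links_dict[number] = "https://www.sothebysrealty.com" + href

# dump ONCE, after the loop, so earlier pages are not overwritten
with open("all_links_dict.json", "w", encoding="utf-8-sig") as file:
    json.dump(all_links_dict, file, indent=4, ensure_ascii=False)
```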
There are several issues:

You're not updating `number` at any point, so only a value keyed to `1` gets saved on every loop. Either use `all_links_dict[page_number] = value_links`, since `page_number` updates itself on each iteration, or add a line to increment `number`.

You could use `mode="a"` instead of `"w"` to append instead of overwriting on each iteration. However, you should be aware that the file will no longer be valid JSON (i.e. you can't decode it any more) after a second iteration. It might be better to build a list that you append to every time and then write the list to JSON after (or at the end of) the loop.

There's also the fact that `for page_number in range(1, 2):` will only lead to one iteration (where `page_number` is 1), so even with all this, only one page's info will be saved unless the range is expanded to include more pages.
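The list-based alternative suggested above could look like this (again with hypothetical hrefs in place of the scraped tags):

```python
import json

# Hypothetical hrefs in place of the scraped tags
scraped = ["/agent/a", "/agent/b", "/agent/c"]

all_links = []                       # a list needs no manual counter at all
for href in scraped:
    all_links.append("https://www.sothebysrealty.com" + href)

# one write at the end keeps the file valid JSON
with open("all_links.json", "w", encoding="utf-8") as file:
    json.dump(all_links, file, indent=4, ensure_ascii=False)
```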