I have code that parses information about competitions from the RSCF website. Yes, parsing again, but don’t worry: I already know what to do and how, and I have written the code. It works like clockwork for me and doesn’t raise any errors.
import requests
from bs4 import BeautifulSoup
import re
import os
from urllib.request import urlopen
import json
from urllib.parse import unquote
import warnings

# Silence warnings (e.g. the one triggered by verify=False below).
warnings.filterwarnings("ignore")

BASE_URL = 'https://www.rscf.ru/contests'

session = requests.Session()
session.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0'

items = []
max_page = 10
for page in range(1, max_page + 1):
    # The first page has no pagination parameter.
    url = f'{BASE_URL}/?PAGEN_2={page}/' if page > 1 else BASE_URL
    print(url)

    rs = session.get(url, verify=False)
    rs.raise_for_status()

    soup = BeautifulSoup(rs.content, 'html.parser')
    for item in soup.select('.classification-table-row.contest-table-row'):
        number = item.select_one('.contest-num').text
        title = item.select_one('.contest-name').text
        # Drop line breaks and the "Подать заявку" ("Submit an application") button label.
        date = item.select_one('.contest-date').text.replace("\n", "").replace("Подать заявку", "")
        # Collapse line breaks and repeated whitespace into single spaces.
        documents = ' '.join(item.select_one('.contest-docs').text.split())
        synopsis = item.select_one('.contest-status').text.replace("\n", " ")
        items.append({
            'Номер': number,                 # number
            'Наименование конкурса': title,  # contest title
            'Приём заявок': date,            # application period
            'Статус': synopsis,              # status
            'Документы': documents,          # documents
        })

# Write all collected rows once, after every page has been parsed.
with open('out.json', 'w', encoding='utf-8') as f:
    json.dump(items, f, indent=4, ensure_ascii=False)
Everything works, everything is in order. There is one nuance.
The site uses text color to show the state of a competition: the status is colored depending on whether the competition is active or completed. If applications are being accepted, the status is highlighted in green. If an examination is under way, it is orange. And if the contest is over, it is red. Here are the contests:
https://www.rscf.ru/contests/
I need the code to write to the JSON the text that is marked in red, orange, or green in the HTML. Unfortunately, I couldn’t find anything similar on the Internet: there are only examples that color text, not ones that extract text that is already colored.
I tried to write this code:
redword = item.select_one('.contest-danger').text
orangeword = item.select_one('.contest-danger').text
greenword = item.select_one('.contest-success').text
for synopsis in item.select_one('.contest-status').text:
    try:
        syn = re.sub(orangeword, str(synopsis))
    except:
        syn = re.sub(orangeword, str(greenword))
items.append({
    'Номер': number,
    'Наименование конкурса': title,
    'Приём заявок': date,
    'Статус': syn,
    'Документы': documents,
})
but it only gave me an error:
redword = item.select_one('.contest-danger').text
AttributeError: 'NoneType' object has no attribute 'text'
Can you help me please?
2 Answers
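On the error itself: `select_one` returns `None` when nothing in the row matches the selector, so calling `.text` on the result raises the `AttributeError` shown in the question. Each row presumably carries only one of the color classes, so at least one of the three lookups fails for every row. A minimal sketch of a guard follows; the sample markup and the helper name `text_or_none` are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure may differ.
SAMPLE_ROW = """
<div class="contest-table-row">
  <div class="contest-status">
    <span class="contest-success">Приём заявок</span>
  </div>
</div>
"""


def text_or_none(item, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = item.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


row = BeautifulSoup(SAMPLE_ROW, "html.parser")
green = text_or_none(row, ".contest-success")  # element exists in the sample
red = text_or_none(row, ".contest-danger")     # no such element: None, no exception
print(green, red)
```

With a guard like this, a missing class yields `None` instead of stopping the whole run.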
So, I decided to write the following code.
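A minimal sketch of what such code could look like, as an illustration rather than a verified solution for the live site. It checks each row for the known color classes and records both the colored text and the color. The classes `contest-success` and `contest-danger` come from the question; `contest-warning` for orange, the sample markup, the extra `'Цвет'` key, and the function name `colored_status` are assumptions to be checked against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors used in the question.
SAMPLE_HTML = """
<div class="classification-table-row contest-table-row">
  <div class="contest-num">1</div>
  <div class="contest-status"><span class="contest-success">Приём заявок</span></div>
</div>
<div class="classification-table-row contest-table-row">
  <div class="contest-num">2</div>
  <div class="contest-status"><span class="contest-warning">Экспертиза</span></div>
</div>
<div class="classification-table-row contest-table-row">
  <div class="contest-num">3</div>
  <div class="contest-status"><span class="contest-danger">Конкурс завершён</span></div>
</div>
<div class="classification-table-row contest-table-row">
  <div class="contest-num">4</div>
  <div class="contest-status">Нет данных</div>
</div>
"""

COLOR_CLASSES = {
    "contest-success": "green",   # class name taken from the question
    "contest-warning": "orange",  # ASSUMED class name, verify in the page source
    "contest-danger": "red",      # class name taken from the question
}


def colored_status(item, color_classes=COLOR_CLASSES):
    """Return (text, color) of the first colored element in the row, or (None, None)."""
    for css_class, color in color_classes.items():
        node = item.select_one(f".{css_class}")
        if node is not None:  # select_one gives None when the class is absent
            return node.get_text(" ", strip=True), color
    return None, None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = []
for item in soup.select(".classification-table-row.contest-table-row"):
    text, color = colored_status(item)
    results.append({
        "Номер": item.select_one(".contest-num").get_text(strip=True),
        "Статус": text,  # colored status text, None if the row has no colored element
        "Цвет": color,   # hypothetical extra key holding the color name
    })

print(results)
```

In the original loop, the same function could be called on each `item` and its result stored under `'Статус'` before `json.dump`.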
Result is:
You can try it yourself.
You can get the color here.
Here is the long explanation.
Short version