I am a Python newbie, currently facing issues with my scraping code.
The script successfully opens the website and gets past the cookie banner.
However, it does not copy the entire HTML code.
This is the complete HTML for the relevant part of the page:
<div class="index__factor__Mo6xW p-base-regular">
<h4 class="index__title__Rq0Po">Arbeitsatmosphäre</h4>
<div class="index__block__7hodp index__scoreBlock__KZCPC">
<span class="index__stars__nfK6S index__medium__CyRQn index__stars__bpFJl" data-fillcolor="butterscotch" data-score="5"></span>
</div>
<p class="index__plainText__JgbHE">Dynamisch</p>
</div>
And this is the HTML that is actually extracted (the block with the star rating is missing):
<div class="index__factor__Mo6xW p-base-regular">
<h4 class="index__title__Rq0Po">Work Atmosphere</h4>
<p class="index__plainText__JgbHE">Dynamic</p>
</div>
This is the code I have tried so far:
url = "https://www.kununu.com/de/adidas/kommentare"
driver = webdriver.Chrome()
driver.get(url)
[...]
show_more_reviews(driver, 5)  # clicks on "Read more Reviews"
make_mini_scores_visible(driver)  # shows all "Mini Scores" like "Arbeitsatmosphäre"
# Store the rendered page under the same name that is passed to BeautifulSoup
html = driver.execute_script("return document.documentElement.innerHTML;")
soup = BeautifulSoup(html, 'html.parser')
It is important that the whole HTML is extracted, since I need every piece of information, including the star scores.
Thank you in advance!
2 Answers
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
The rest of your code remains the same.
You can try this; it might help.
The data you see on the page is stored inside a <script> element in JSON form, so you can read it from there instead of from the rendered HTML.
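A minimal sketch of that idea, not the original answer's code: it collects the contents of every <script type="application/json"> element and parses them with json. The sample HTML, the "reviews" key, and the field names are hypothetical stand-ins, because the real layout of the page's JSON is not shown in this thread. It uses only the standard library so that it runs without a browser.

```python
import json
from html.parser import HTMLParser


class JsonScriptParser(HTMLParser):
    """Collect and decode the text of every <script type="application/json">."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are inside a matching <script>
        self._chunks = []      # text pieces of the current <script>
        self.payloads = []     # decoded JSON objects, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.payloads.append(json.loads("".join(self._chunks)))


def extract_json_payloads(html):
    """Return the decoded JSON of all application/json script elements."""
    parser = JsonScriptParser()
    parser.feed(html)
    parser.close()
    return parser.payloads


# Hypothetical sample; the real page's JSON structure is an assumption.
SAMPLE_HTML = """
<html><body>
<script type="application/json">
{"reviews": [{"factor": "Arbeitsatmosphäre", "score": 5, "text": "Dynamisch"}]}
</script>
</body></html>
"""

payloads = extract_json_payloads(SAMPLE_HTML)
for review in payloads[0]["reviews"]:
    print(review["factor"], review["score"], review["text"])
```

With Selenium, the same function could be called as extract_json_payloads(driver.page_source); which keys hold the scores has to be checked by inspecting the decoded data from the live page.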