I am trying to use BeautifulSoup to scrape thesaurus.com so I can quickly look up synonyms for certain words. However, the synonyms are in a list whose ids and classes differ from word to word, so the best I can do is target a grandparent element that is the same for every word. Here is a simplified example:
<div data-testid="same_between_words">
<ul class="different_between_words">
<li>
<a data-linkid="same_between_words_2">Word 1</a>
</li>
<li>
<a data-linkid="same_between_words_2">Word 2</a>
</li>
</ul>
</div>
There are also similar words, which are fine to include if necessary, and antonyms, which are obviously not fine to include. In case it matters, the synonyms all share the same data-linkid, both with each other and across different words, but the antonyms share it too, so I haven’t gotten that approach to work. My current code is:
from bs4 import BeautifulSoup
import requests
url = "https://www.thesaurus.com/browse/EXAMPLE WORD"
page = requests.get(url)
html = page.text
soup = BeautifulSoup(html,"html.parser")
ele = soup.find('div', attrs={'data-testid': 'word-grid-container'})
syn = ele.findChildren('ul', recursive=False)
print(syn)
which prints all of the HTML inside that data-testid element in one big mess. Adding .text doesn’t seem to work: the error says I’m treating a list of results like a single one (which I don’t think I am, since I’m not using find_all). Besides, I think adding it would just give me the first synonym, which isn’t ideal.
I’d like to get a list of synonyms for a word. I’ve managed to get one big string containing all the words, but I would love to have a list I can work with, since some synonyms contain spaces (like ‘fine and dandy’ for ‘good’), so I can’t just split the string on spaces.
2 Answers
Each word is in an <a> tag with a font-weight="inherit" property; you could even just select all <a> tags.
You are close to your goal. To give you an idea: try to select by static things, such as an id or the HTML structure, and consider using css selectors for convenience.
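As a minimal sketch of selecting by structure with a CSS selector, the snippet below parses an inline copy of the simplified markup from the question rather than fetching the live page. The sample words and the assumption that the real page follows the same div > ul > li > a shape are illustrative only; the live markup may differ and may also need extra filtering to exclude antonyms.

```python
from bs4 import BeautifulSoup

# Simplified markup modelled on the question's example (assumed shape,
# not the real page source). The sample words are placeholders.
html = """
<div data-testid="word-grid-container">
  <ul class="different_between_words">
    <li><a data-linkid="same_between_words_2">fine and dandy</a></li>
    <li><a data-linkid="same_between_words_2">great</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchor on the stable data-testid attribute, then follow the structure
# ul > li > a instead of relying on class names that change per word.
links = soup.select('div[data-testid="word-grid-container"] ul li a')

# One list entry per <a>, so multi-word synonyms stay intact.
synonyms = [a.get_text(strip=True) for a in links]
print(synonyms)  # ['fine and dandy', 'great']
```

Note that findChildren, like find_all, returns a ResultSet (a list of tags) even when only one element matches, which is why calling .text directly on its result fails in the question's code; the text has to be read from each tag individually, as the list comprehension does here.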