I have this HTML:
<html lang="en" class="no-js">
<div>
<p class="price ">
3.75
</p>
<p>21</p>
</div>
</html>
I want to get the class of this <p> element.
The problem is that whatever I try, the value always comes back without the trailing space.
current_element.get('class')…
Even str(current_element) comes out like this:
'<p class="price">3.75</p>'
How can I get the raw text of the class attribute, or something like that?
Running a regex over the whole HTML is not an option because I can have HTML files with 11k lines or more.
Thanks!
2 Answers
Class names in HTML cannot contain spaces. Spaces are used to separate class names when more than one class is assigned to an element. In this case, the trailing space with no further class after it is treated as a single class assignment.
Any HTML parser, whether in a browser or in a library, must interpret it that way. Since the space isn't part of the name, it won't be returned by libraries or by the DOM JS functions. This is expected behavior.
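As a rough illustration of that splitting rule, here is a minimal Python sketch. It uses str.split() as a stand-in for the whitespace tokenization a parser applies to the class attribute. The helper name class_tokens is made up for this example, and str.split() accepts a slightly wider set of whitespace characters than HTML does, which makes no difference for these inputs.

```python
def class_tokens(raw_value):
    """Split a raw class attribute value into class names (hypothetical helper)."""
    # split() with no arguments ignores leading/trailing whitespace and
    # collapses runs of whitespace, so no empty names are produced.
    return raw_value.split()


print(class_tokens("price "))      # ['price']
print(class_tokens("price sale"))  # ['price', 'sale']
```

The trailing space never makes it into a token, which is why the parsed class comes back as just "price".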
If you really want to get that space, you need to parse the HTML by other means, such as a library that does not understand HTML and therefore doesn't interpret the attribute.
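One possible sketch of this in Python, under the assumption that a lower-level parser is acceptable: the standard library's html.parser still tokenizes HTML, but it passes attribute values to handle_starttag as plain strings and does not split class into a list. The class name RawClassCollector and the shortened markup are made up for illustration.

```python
from html.parser import HTMLParser


class RawClassCollector(HTMLParser):
    """Collect (tag, raw class value) pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values are not split
        # on whitespace, so a trailing space is kept.
        for name, value in attrs:
            if name == "class":
                self.found.append((tag, value))


markup = '<div><p class="price ">3.75</p><p>21</p></div>'
collector = RawClassCollector()
collector.feed(markup)
print(collector.found)  # [('p', 'price ')]
```

This goes through the markup in a single pass and avoids running a regex over the whole document.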
If you use the keyword argument multi_valued_attributes=None in your BeautifulSoup constructor, you will get the class string with the space. (Source: https://beautiful-soup-4.readthedocs.io/en/latest/#multi-valued-attributes )
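A minimal sketch of that constructor argument, assuming Beautiful Soup 4 (the bs4 package, version 4.8.0 or later) is installed and using the built-in html.parser backend. The markup is a shortened copy of the HTML from the question.

```python
from bs4 import BeautifulSoup

markup = '<div><p class="price ">3.75</p><p>21</p></div>'

# Default behavior: class is split on whitespace into a list.
default_soup = BeautifulSoup(markup, "html.parser")
print(default_soup.p.get("class"))  # ['price']

# With multi_valued_attributes=None the attribute stays a single string.
raw_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
print(repr(raw_soup.p.get("class")))  # 'price '
```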
You will, however, lose the ability to access multi-valued attributes (such as class) as lists.
Result: