So I have been getting mixed answers here on whether to use regex or not.
What I am trying to do is grab a specific value (the JSON of spConfig) from the HTML, which is:
<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color","options":[{"id":"8243","label":"Helloworld","products":["97460","97459"]}],"position":"0"},"148":{"id":"148","code":"codish","label":"Codish","options":[{"id":"4707","label":"12.5","products":[]},{"id":"2724","label":"13","products":[]},{"id":"4708","label":"13.5","products":[]}],"position":"1"}},"template":"EUR <%- data.price %>","optionPrices":{"97459":{"oldPrice":{"amount":121},"basePrice":{"amount":121},"finalPrice":{"amount":121},"tierPrices":[]}},"prices":{"oldPrice":{"amount":"121"},"basePrice":{"amount":"121"},"finalPrice":{"amount":"121"}},"productId":"97468","chooseText":"Choose an Option...","images":[],"index":[]},
"gallerySwitchStrategy": "replace"
}
}
}
</script>
And here is the problem: when scraping the HTML, there are multiple <script type="text/x-magento-init"> tags,
but only one spConfig.
I have two questions here.
- Should I grab the spConfig value using regex and then call json.loads(spConfigValue), or not? If not, what method should I use to scrape the JSON value?
- If I am supposed to use regex: I have been trying to grab it using
"spConfig": (.*?)
however, it is not capturing the JSON value for me. What am I doing wrong?
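Why the pattern comes back empty: .*? is lazy, and with nothing after it in the pattern the engine stops at the shortest possible match, which is the empty string. A regex also cannot count nested braces, so no simple pattern reliably captures a whole JSON object. The sketch below (Python 3, standard library only) shows the empty match on a trimmed copy of the question's markup, then uses a hypothetical extract_json_value helper that finds the key and lets json.JSONDecoder.raw_decode read exactly one JSON value from that offset. The decoy first script and the trimmed spConfig are assumptions for illustration.

```python
import json
import re

# Trimmed copy of the question's markup; the first script is a hypothetical decoy.
HTML = """
<script type="text/x-magento-init">
{"*": {"Magento_Ui/js/core/app": {}}}
</script>
<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color"}},"template":"EUR <%- data.price %>","productId":"97468","chooseText":"Choose an Option..."},
"gallerySwitchStrategy": "replace"
}
}
}
</script>
"""

# The question's pattern: a lazy group with nothing after it is satisfied
# by the empty string, so group(1) is always "".
match = re.search(r'"spConfig": (.*?)', HTML)
print(repr(match.group(1)))  # ''


def extract_json_value(text, key):
    """Hypothetical helper: return the JSON value that follows "key": in text."""
    marker = f'"{key}":'
    start = text.index(marker) + len(marker)  # raises ValueError if key is absent
    # raw_decode() does not skip leading whitespace, so skip it here.
    while text[start].isspace():
        start += 1
    # Decode exactly one JSON value starting at `start`; nested braces,
    # brackets and strings are handled by the JSON decoder, not by a regex.
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


sp_config = extract_json_value(HTML, "spConfig")
print(sp_config["productId"])  # 97468
```

Anchoring the lazy group to whatever follows it (for example the "gallerySwitchStrategy" key, with re.S) would also work on this exact markup, but it breaks as soon as the key order or whitespace changes.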
3 Answers
No, don’t ever use regex for HTML. Use an HTML parser like
BeautifulSoup
instead! So basically: for JSON you use a JSON parser, for YAML a YAML parser, so for HTML use an HTML parser. 🤔
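A minimal BeautifulSoup sketch, assuming beautifulsoup4 is installed and using a trimmed copy of the question's markup (the first script tag is a hypothetical decoy). Because the body of each text/x-magento-init script is itself valid JSON, there is no need to cut spConfig out with a regex: parse each block with json.loads and keep the one that has the key.

```python
import json

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Trimmed copy of the question's markup; the first script is a hypothetical decoy.
HTML = """
<script type="text/x-magento-init">
{"*": {"Magento_Ui/js/core/app": {}}}
</script>
<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color"}},"template":"EUR <%- data.price %>","productId":"97468","chooseText":"Choose an Option..."},
"gallerySwitchStrategy": "replace"
}
}
}
</script>
"""

soup = BeautifulSoup(HTML, "html.parser")

sp_config = None
# Several text/x-magento-init scripts exist; the body of each one is JSON.
for script in soup.find_all("script", type="text/x-magento-init"):
    data = json.loads(script.string)
    configurable = data.get("#product_addtocart_form", {}).get("configurable", {})
    if "spConfig" in configurable:
        sp_config = configurable["spConfig"]
        break

print(sp_config["productId"])  # 97468
```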
See some examples; that will make your life shine:
from html.parser import HTMLParser
https://docs.python.org/3/library/html.parser.html
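The same idea with only the standard library's html.parser.HTMLParser, as a sketch: the hypothetical MagentoInitParser subclass buffers the text of every text/x-magento-init script (handle_data may be called more than once per tag), and the hypothetical find_sp_config helper returns the first spConfig it finds, or None. The markup is again a trimmed copy of the question's, with an assumed decoy script.

```python
import json
from html.parser import HTMLParser

# Trimmed copy of the question's markup; the first script is a hypothetical decoy.
HTML = """
<script type="text/x-magento-init">
{"*": {"Magento_Ui/js/core/app": {}}}
</script>
<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color"}},"template":"EUR <%- data.price %>","productId":"97468","chooseText":"Choose an Option..."},
"gallerySwitchStrategy": "replace"
}
}
}
</script>
"""


class MagentoInitParser(HTMLParser):
    """Collects the text of every <script type="text/x-magento-init"> tag."""

    def __init__(self):
        super().__init__()
        self._in_init_script = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/x-magento-init":
            self._in_init_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_init_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_init_script:
            self.blocks.append("".join(self._buffer))
            self._in_init_script = False


def find_sp_config(html_text):
    """Return the first spConfig object found in the init scripts, else None."""
    parser = MagentoInitParser()
    parser.feed(html_text)
    parser.close()
    for block in parser.blocks:
        data = json.loads(block)
        configurable = data.get("#product_addtocart_form", {}).get("configurable", {})
        if "spConfig" in configurable:
            return configurable["spConfig"]
    return None


sp_config = find_sp_config(HTML)
print(sp_config["chooseText"])  # Choose an Option...
```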
In this case, with bs4 4.7.1+, :contains is your friend. You say there is only a single match for that, so you can do the following:
Config is then:
with keys:
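A sketch of the :contains approach, assuming beautifulsoup4 4.7.1 or later (which hands CSS selectors to the soupsieve package). Recent soupsieve versions deprecate :contains in favour of :-soup-contains, so the selector below uses the newer name; on an older install, swap back to :contains. The markup is a trimmed, hypothetical copy of the question's, so the printed keys are only the ones kept in it.

```python
import json

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Trimmed copy of the question's markup; the first script is a hypothetical decoy.
HTML = """
<script type="text/x-magento-init">
{"*": {"Magento_Ui/js/core/app": {}}}
</script>
<script type="text/x-magento-init">
{
"#product_addtocart_form": {
"configurable": {
"spConfig": {"attributes":{"93":{"id":"93","code":"color","label":"Color"}},"template":"EUR <%- data.price %>","productId":"97468","chooseText":"Choose an Option..."},
"gallerySwitchStrategy": "replace"
}
}
}
</script>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The attribute selector narrows to Magento init scripts; :-soup-contains()
# keeps only the one whose text mentions spConfig.
script = soup.select_one(
    'script[type="text/x-magento-init"]:-soup-contains("spConfig")'
)
if script is None:
    raise ValueError("no text/x-magento-init script mentions spConfig")

config = json.loads(script.string)["#product_addtocart_form"]["configurable"]["spConfig"]
print(list(config))  # ['attributes', 'template', 'productId', 'chooseText']
```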