So I have this text that I extracted out of a <script> tag.
    function fbq_w123456as() {
        fbq('track', 'AddToCart', {
            contents: [
                {
                    'id': '123456',
                    'quantity': '',
                    'item_price': 69.99
                }
            ],
            content_name: 'Stackoverflow',
            content_category: '',
            content_ids: ['w123456as'],
            content_type: 'product',
            value: 420.69,
            currency: 'USD'
        });
    }
I’m trying to extract this information using regex and then convert it into JSON using Python.
I’ve tried re.search(r"'AddToCart', (.*?);" and a few other attempts, but no luck. I am very new to regex and I am struggling with it. This is the output I’m after:
    {
        "contents": [
            {
                "id": "123456",
                "quantity": "",
                "item_price": 69.99
            }
        ],
        "content_name": "Stackoverflow",
        "content_category": "",
        "content_ids": [
            "w123456as"
        ],
        "content_type": "product",
        "value": 420.69,
        "currency": "USD"
    }
How would I create the regex to extract the JSON data?
2 Answers
Extracting JSON data from a string using regular expressions can be a bit tricky, especially when the JSON structure is embedded within other content. However, in the given example, we can try to extract the JSON data using regex. Here’s an example of how you could do it in Python:
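A minimal sketch of what such code might look like, assuming the script text is held in a variable named html_script. The extra whitespace tolerance (\s*) in the pattern and the two-step normalization before json.loads are assumptions added here: the captured text is a JavaScript object literal with unquoted keys and single-quoted strings, which json.loads would reject as-is.

```python
import json
import re

# Script text as shown in the question (the variable name is an assumption).
html_script = """function fbq_w123456as() {
fbq('track', 'AddToCart', {
contents: [
{
'id': '123456',
'quantity': '',
'item_price':69.99 }
],
content_name: 'Stackoverflow',
content_category: '',
content_ids: ['w123456as'],
content_type: 'product',
value: 420.69,
currency: 'USD'
});
}"""

# Capture the object literal passed as the third argument to fbq(...).
# [\s\S] matches any character including newlines, so no re.DOTALL is needed.
pattern = r"fbq\('track',\s*'AddToCart',\s*(\{[\s\S]*?\})\);"
match = re.search(pattern, html_script)

data_dict = None
if match:
    js_object = match.group(1)
    # Assumption: no string value contains a quote character or a "word:"
    # sequence, so two plain substitutions are enough to reach strict JSON.
    json_text = js_object.replace("'", '"')
    # Wrap bare identifiers that act as keys in double quotes.
    json_text = re.sub(r'(?<=[{,\s])([A-Za-z_]\w*)\s*:', r'"\1":', json_text)
    data_dict = json.loads(json_text)
    print(json.dumps(data_dict, indent=2))
```

The normalization is deliberately naive. It works for the snippet in the question but would corrupt values that themselves contain apostrophes or colons after a word.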
This code uses the re.search function to find the JSON data within the html_script string. The regular expression pattern fbq\('track', 'AddToCart', ({[\s\S]*?})\); matches the desired data by looking for the literal text fbq('track', 'AddToCart', and then capturing everything inside the outer curly braces {}.

If a match is found, the captured text is extracted using match.group(1). Then json.loads is used to convert it into a Python dictionary (data_dict). Note that json.loads accepts only strict JSON, so the captured JavaScript object literal, which has unquoted keys and single-quoted strings, must be normalized first or the call raises json.JSONDecodeError.

Keep in mind that using regular expressions to parse complex data structures like JSON is not recommended in general, as it can be error-prone. It’s usually better to use a dedicated JSON parser or HTML parser library for more reliable and maintainable code.
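As one illustration of why a lazy regex is fragile here, the extraction step can instead be done by counting brace depth, which keeps working if the object is later followed by a different closing sequence or contains deeper nesting. This is a swapped-in technique, not the approach described above; the function name extract_object_literal and its marker parameter are hypothetical.

```python
def extract_object_literal(text, marker):
    """Return the balanced {...} block that follows marker, or None if absent."""
    start = text.find(marker)
    if start == -1:
        return None
    open_pos = text.find("{", start + len(marker))
    if open_pos == -1:
        return None

    depth = 0
    quote = None  # quote character of the string we are currently inside, if any
    i = open_pos
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[open_pos:i + 1]
        i += 1
    return None  # braces never balanced


# Script text as shown in the question.
script = """function fbq_w123456as() {
fbq('track', 'AddToCart', {
contents: [
{
'id': '123456',
'quantity': '',
'item_price':69.99 }
],
content_name: 'Stackoverflow',
content_category: '',
content_ids: ['w123456as'],
content_type: 'product',
value: 420.69,
currency: 'USD'
});
}"""

literal = extract_object_literal(script, "'AddToCart'")
print(literal)
```

The returned text is still a JavaScript object literal, so it needs the same conversion to strict JSON before json.loads can read it.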
You can try:
Prints a Python dict: