I am trying to extract data from a complex data structure (example will follow). I am not sure what is the best method (e.g. Beautifullsoup) to parse the test and get the desired field in a table. I need for example the number, date and e-mail of each block:
var gIg35809469970000987890data = {
"values": [
[
"33765",
"33765",
"06-03-2023",
"[email protected]",
"indoor 1",
"",
"1",
"10",
"16",
"",
"DELETE",
"33765",
{
"salt": "abSaocf8wyyJVMYVCEyAlg",
"protected": "/54hAxJ90PKjrfjGC3Y_a6vaKuq6wF2a3LCPRBN-RlVRZxzepbuNLBRmI2MPaiYoOPPI0miY-MTodCl2rrwBwAg",
"rowVersion": "vCwg8r2ZJr9wjvHwoZVrvxkfaCrWiTnUcosn89iFOO2yV-UvFxc1oo9AWsJomlw1IpKd-IZTUHLJjefknOMc5g",
"fields": {
"DEL": {
"url": "javascript:void(null);"
}
}
}
],
[
"33623",
"33623",
"03-03-2023",
"[email protected]",
"indoor 1",
"",
"1",
"10",
"16",
"",
"DELETE",
"33623",
{
"salt": "KHpHaz4-fwN4l3fLmPX6AQ",
"protected": "/B-ZmlAlvRzPee4kU-QvteJQUy0aP89g08BkpdE5CE-i8_JcsN2sKLELqYh2ZZ9vWZTbp4DtWFYjfO5NDAoKsmA",
"rowVersion": "3JRQAE4fTETSgERkg3kRCuW2nZiUL_jOcSvLGXNkV6-lpfLOLPhduXAlmgcqEI6gSWX-yI-Fd5uMBbU5iqFXZA",
"fields": {
"DEL": {
"url": "javascript:void(null);"
}
}
}
]
]
}
(Shortened the file)
I was thinking BeautifullSoup + Regex, but not sure.
Also I am not sure what is the data type: dictionary + array?
Thanks in advance.
3
Answers
I got it working, in an ugly way, but will try Pandas as well. Makes more sense as well, as I want to do some data manipulation.
I will include my code below. Basically it takes the source html page, cuts the pieces I don't need and convert to JSON object.
Suggestions are always welcome. Thanks
Based on the provided information there are many options. I’ll demonstrate python native package
json
:Explanation of json notation
Braces surround dictionaries; square brackets surround lists (just like in python.
The data looks like a json text return from an api
given:
You can probably load it directly into a dataframe using:
You might also work with it to get the filtered columns more manumatically via:
and then to a dataframe if you wished