I have a JSON file of the form:
{
  "42": {"name": "MeowBark", "id": 42, "category": "pet store"},
  "67": {"name": "Chef's Kiss", "id": 67, "category": "restaurant"}
}
I have to parse this in PySpark and am using the following code:
stores = spark.read.json(stores, multiLine=True).cache()
This does not return the desired dataframe; instead, it returns:
| 42                         | 67                              |
|----------------------------|---------------------------------|
| {MeowBark, 42, pet store}  | {Chef's Kiss, 67, restaurant}   |
I have tried pd.read_json, and it parses the file correctly once I transpose the dataframe, but I can't use pd.read_json; I need to stick to Spark's own transformations.
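Roughly, the pandas version I mean is this (the path is just a placeholder):

```python
import pandas as pd

# pandas puts each top-level key in its own column, so transposing
# turns every inner object into a row.
stores_pd = pd.read_json("stores.json").T
```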
I also tried defining a StructType, but the challenge is that the top-level keys aren't consistent: each one is just the row number, so I can't list them in a schema up front.
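For what it's worth, the inner objects are easy to describe; it's the outer field names that can't be pinned down. A hypothetical sketch of the kind of schema I mean:

```python
from pyspark.sql.types import LongType, StringType, StructField, StructType

# Schema for a single inner object. The problem is the *outer* level:
# its field names ("42", "67", ...) change from row to row, so they
# can't be enumerated in a fixed StructType like this one.
store_schema = StructType([
    StructField("name", StringType()),
    StructField("id", LongType()),
    StructField("category", StringType()),
])
```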
Does anyone have an idea of what I’m doing wrong and what I should be doing differently? I’m totally at a loss here. Any help would be appreciated!
2 Answers
You can also try the code below. I used a parallelized collection; see if it works for you.
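A minimal sketch of that approach, assuming the whole file is small enough to collect to the driver and that it lives at `stores.json` (both of those are assumptions; adjust to your setup):

```python
import json

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Assumed path; replace with the real location of the file.
path = "stores.json"

# Read the whole file as one string and parse it with the standard json
# module, keeping only the inner objects.
raw = "".join(sc.wholeTextFiles(path).values().collect())
records = [Row(**v) for v in json.loads(raw).values()]

# Parallelize the collection of Rows and build the dataframe from it.
stores = spark.createDataFrame(sc.parallelize(records))
stores.show()
```

This drops the numeric top-level keys entirely, which should be fine here since each inner object already carries its own `id`; the result is one row per store with `name`, `id`, and `category` columns.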