I want to read a JSON file in PySpark, but the file is in this format (one object per line, without commas or square brackets):
{"id": 1, "name": "jhon"}
{"id": 2, "name": "bryan"}
{"id": 3, "name": "jane"}
Is there an easy way to read this JSON in PySpark?
I have already tried this code:
df = spark.read.option("multiline", "true").json("data.json")
df.write.parquet("data.parquet")
But it doesn’t work: only the first line appears in the Parquet file.
I just want to read this JSON file and save it as Parquet…
2 Answers
Try reading it as a text file first, then parse each line as a JSON object.
Only the first line appears because the multiline parameter is set to True, whereas in your file each line is a separate JSON object. If you set the multiline parameter to False, it will work as expected. If your JSON file had instead contained a JSON array, setting the multiline parameter to True would work.