
I have two JSON files. One looks like:

{
  "a":{
    "a1":"xxx"
  },
  "b":"xxx"
}

The other looks like:

{
  "a":{
    "a1":"xxx",
    "a2":"xxx"
  },
  "b":"xxx"
}

I want to read these two JSON files into one DataFrame in Spark. I tried union and unionByName, but they didn’t work. How can I achieve this?

2 Answers


  1. If you’ve got a couple of JSON files with different columns and wanna smoosh them into one DataFrame in Spark, you’re in luck. When Spark infers the schema for JSON, it takes the union of the fields across every file it reads, so the merge happens on the fly (the mergeSchema option you may have seen is for Parquet/ORC sources; the JSON reader does this by default). One thing your sample files do need is the multiLine option, since each one is a single pretty-printed object spanning several lines. Here’s a quick way to do it in PySpark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Cool JSON Merge").getOrCreate()

    # multiLine is needed because each file holds one pretty-printed
    # JSON object spanning several lines
    df = spark.read.option("multiLine", True).json("path/to/your/json/files/*")

    df.show()
    

    Just swap in your actual file path. This tells Spark to combine those columns together, even if some JSONs have extra fields; rows from files that lack a field simply get null there.

    A heads-up: schema inference means Spark reads through the files an extra time to figure out the combined schema, which can slow things down on large datasets. But hey, if you need everything in one place, it’s totally worth it.

    Hope that helps you out!

  2. Spark can take care of merging the schema. See the following code:

    >>> spark.read.option("multiLine", True).json("test-jsons/*").printSchema()
    root
     |-- a: struct (nullable = true)
     |    |-- a1: string (nullable = true)
     |    |-- a2: string (nullable = true)
     |-- b: string (nullable = true)
    
    >>> spark.read.option("multiLine", True).json("test-jsons/*").show()
    +-----------+---+
    |          a|  b|
    +-----------+---+
    | {xxx, xxx}|xxx|
    |{xxx, NULL}|xxx|
    +-----------+---+
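    Under the hood, the inference step is essentially taking the union of the fields it sees across files. A tiny plain-Python sketch of that idea, using the two sample documents from the question (the merge_keys helper is just illustrative, not a Spark API):

```python
import json

doc1 = json.loads('{"a": {"a1": "xxx"}, "b": "xxx"}')
doc2 = json.loads('{"a": {"a1": "xxx", "a2": "xxx"}, "b": "xxx"}')

def merge_keys(d1, d2):
    """Recursively union the key sets of two JSON objects,
    roughly mirroring how Spark widens the inferred schema."""
    merged = {}
    for key in sorted(d1.keys() | d2.keys()):
        v1, v2 = d1.get(key), d2.get(key)
        if isinstance(v1, dict) or isinstance(v2, dict):
            # Nested objects merge field-by-field, like the `a` struct above.
            merged[key] = merge_keys(v1 or {}, v2 or {})
        else:
            merged[key] = "string"
    return merged

print(merge_keys(doc1, doc2))
# {'a': {'a1': 'string', 'a2': 'string'}, 'b': 'string'}
```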
    