I have JSON data that I want to represent in tabular form and later write out to a different format (Parquet).
Schema
root
|-- : string (nullable = true)
sample data
+----------------------------------------------+
|                                              |
+----------------------------------------------+
|{"deviceTypeId":"A2A","deviceId":"123","geo...|
|{"deviceTypeId":"A2B","deviceId":"456","geo...|
+----------------------------------------------+
Expected Output
+------------+--------+---+
|deviceTypeId|deviceId|...|
+------------+--------+---+
|         A2A|     123|   |
|         A2B|     456|   |
+------------+--------+---+
I tried splitting the string, but this doesn't seem like an efficient approach:
split_col = split(df_explode[''], ',')
And then extracting the columns, but the result still carries pieces of the original string (keys, braces, quotes):
df_1 = df_explode.withColumn('deviceId', split_col.getItem(1))
# df_1 = df_explode.withColumn('deviceTypeId', split_col.getItem(0))
printOutput(df_1)
I'm looking for a better way to solve this problem.
2 Answers
I was able to do it using the from_json function.
We need to create a JSON schema that from_json will use to parse the data.
Note that the column name is the empty string in this data, so the column has to be referenced as ''.
The explode function only works on arrays.
In your case the column holds JSON, so you should use the from_json function.
Please refer to from_json in pyspark.sql.functions.