Converting pyspark.sql.Row data to a JSON string drops the field names in an Azure Databricks notebook

SaswatRay
October 9, 2023
265 views
1 vote
2 Answers

I have the following PySpark Row data:

indv_msg = [Row(cbm_json_output=Row(country_code='USA', date='06-10-2023', date_epoch='1696550400', id='USA-001535-1696550400', interfaceVersion='1.0.0', opmode_car_door=Row(health_category='GREEN', msg_id='1', num_yellow_preds_in_last_14_days=0, reason=None, reasonDetail=None), opmode_landing_door=Row(health_category='GREEN', msg_id='1', reason=None, reasonDetail=None), sensor=Row(component_type=None, health_category=None, landing_priority=None, msg_id='1', num_yellow_preds_in_last_14_days=None, reason=None, reasonDetail=None), unit_id='001535'))]

When I convert it to a JSON string, the field names such as "country_code" and "date" are dropped and only the values remain:

user_encode_data = json.dumps(indv_msg, indent=2)

Result:
indv_msg

 [
  [
    "USA",
    "06-10-2023",
    "1696550400",
    "USA-001535-1696550400",
    "1.0.0",
    [
      "GREEN",
      "1",
      0,
      null,
      null
    ],
    [
      "GREEN",
      "1",
      null,
      null
    ]
]

Expected result:
indv_msg

[
  [
    "country_code" : "USA",
    "date" : "06-10-2023",
    "date_epoch": "1696550400",
     ....
     ....
   ]
 ]

Answers

Define a function (to_dict) that recursively converts a PySpark Row into the corresponding dictionary, map it over each row in indv_msg, and then use json.dumps to serialize the result:

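import json
from pyspark.sql import Row

def to_dict(row):
    # Convert a Row to a dict, recursing into any nested Row values.
    return {
        k: to_dict(v) if isinstance(v, Row) else v
        for k, v in row.asDict().items()
    }

# Convert every row first, then serialize the list of plain dicts.
data = json.dumps([to_dict(msg) for msg in indv_msg], indent=4)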

print(data)
[
    {
        "cbm_json_output": {
            "country_code": "USA",
            "date": "06-10-2023",
            "date_epoch": "1696550400",
            "id": "USA-001535-1696550400",
            "interfaceVersion": "1.0.0",
            "opmode_car_door": {
                "health_category": "GREEN",
                "msg_id": "1",
                "num_yellow_preds_in_last_14_days": 0,
                "reason": null,
                "reasonDetail": null
            },
            "opmode_landing_door": {
                "health_category": "GREEN",
                "msg_id": "1",
                "reason": null,
                "reasonDetail": null
            },
            "sensor": {
                "component_type": null,
                "health_category": null,
                "landing_priority": null,
...
            "unit_id": "001535"
        }
    }
]

- JayashankarGS
- October 9, 2023 at 11:20 am
- 0 votes
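In addition to @Shubham Sharma's answer: you can simply call row.asDict(True), which sets recursive to True so that nested Rows are converted to dictionaries as well.

When I tried the original code in my environment, I got the same results as in the question.

You can use the code block below to get the JSON output.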
```
indv_msg_dict = [row.asDict(True) for row in indv_msg]

user_encode_data = json.dumps(indv_msg_dict, indent=2)

print(user_encode_data)
```
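
Both answers work for the same reason: `Row` is a subclass of `tuple`, and `json.dumps` encodes any tuple as a JSON array, so the field names are lost unless each row is first turned into a dict. Below is a minimal sketch of that behaviour. It uses the standard library's `namedtuple` as a stand-in for `Row` so that it runs without Spark; that substitution is an assumption, `Door`, `Msg` and `to_plain` are hypothetical names, and `_asdict()` plays the role that `asDict()` plays on a real `Row`.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for pyspark Rows: tuple subclasses with named fields.
Door = namedtuple("Door", ["health_category", "msg_id", "reason"])
Msg = namedtuple("Msg", ["country_code", "unit_id", "opmode_car_door"])

msg = Msg(
    country_code="USA",
    unit_id="001535",
    opmode_car_door=Door(health_category="GREEN", msg_id="1", reason=None),
)


def to_plain(value):
    """Recursively turn named tuples into dicts so the field names survive."""
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        return {k: to_plain(v) for k, v in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


# Tuples are encoded as JSON arrays, so the field names are dropped.
as_array = json.dumps([msg])
# Dicts are encoded as JSON objects, so the field names are kept.
as_object = json.dumps(to_plain([msg]))

print(as_array)
print(as_object)
```

The first line printed contains only nested arrays of values, while the second contains objects keyed by field name, which mirrors the difference between the question's output and the answers' output.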