Read JSON in PySpark

I want to read a JSON file in PySpark, but the file is in this format (no commas between the objects and no enclosing square brackets): {"id": 1, "name": "jhon"} {"id": 2, "name": "bryan"} {"id": 3, "name": "jane"} Is there an easy way to…
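
If each object sits on its own line, the file is already in JSON Lines form, which `spark.read.json` reads by default. When the objects are packed onto a single line, as in the excerpt, one possible workaround is to split them in plain Python first. The following is only a minimal sketch under that assumption; the helper name `split_concatenated_json` and the `sample` string are hypothetical and are not taken from the question or from any answer to it.

```python
import json


def split_concatenated_json(text):
    """Yield each top-level JSON value from a string of back-to-back objects.

    Hypothetical helper: assumes the values are separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Sample input copied from the excerpt above (one line, no commas, no brackets).
sample = '{"id": 1, "name": "jhon"} {"id": 2, "name": "bryan"} {"id": 3, "name": "jane"}'
records = list(split_concatenated_json(sample))
```

The resulting list of dictionaries could then be handed to Spark, for example with `spark.createDataFrame(records)`. This approach parses everything on the driver, so it is only reasonable for files that fit comfortably in memory.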

PySpark – Flatten nested JSON

I have a JSON document that looks like this: [ { "event_date": "20221207", "user_properties": [ { "key": "user_id", "value": { "set_timestamp_micros": "1670450329209558" } }, { "key": "doc_id", "value": { "set_timestamp_micros": "1670450329209558" } } ] }, { "event_date": "20221208", "user_properties": [ {…

Extract multiple columns from a JSON string

I have JSON data that I want to represent in tabular form and later write to a different format (Parquet). Schema: root |-- : string (nullable = true) Sample data: +----------------------------------------------+ +----------------------------------------------+ |{"deviceTypeId":"A2A","deviceId":"123","geo...| |{"deviceTypeId":"A2B","deviceId":"456","geo...| +----------------------------------------------+ Expected Output…

Azure – Querying and Inserting records from SQL Server using Python

We are porting some code from SSIS to Python. As part of this project, I'm recreating some packages but I'm having issues with the database access. I've managed to query the DB like this: employees_table = (spark.read .format("jdbc") .option("url", "jdbc:sqlserver://dev.database.windows.net:1433;database=Employees;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;")…
