pyspark Questions

Explode a json string present in pyspark dataframe

April 25, 2023
Ben
2 Answers

I have a JSON string column, substitutions, in a dataframe. It holds multiple array elements, and I want to explode it to create a new row for each element in that array. There are other columns present in the…

Flattening nested JSON file into a PySpark DF

April 22, 2023
lucija
2 Answers

I am new to PySpark and I am struggling with flattening a nested json file into a PySpark data frame. I need to define the schema for the JSON data. I know how to define schema for regular json files…

How to use filter condition on multiple columns with not condition – Amazon Web Services

April 14, 2023
Pandey
2 Answers

My data set looks like this: I am using this filter: df = df.filter(trim(col("AGE"))!="" & trim(col("PHONE"))!="") I am getting an empty dataframe. I want the data without the record having name = G3. Any help is appreciated.

Azure – How to extract Sheet names from Excel file using "com.crealytics.spark.excel" in Databricks (PySpark)

April 5, 2023
AzSurya Teja
2 Answers

I have an Excel file in Azure Data Lake, which I read as follows: ddff=spark.read.format("com.crealytics.spark.excel") .option("header", "true") .option("sheetName","__all__") .option("inferSchema","true") .load("abfss://[email protected]/file.xlsx") Now I am confused about how to get just the sheet names from that Excel file. Is there any direct…

PySpark Separate into columns nested json from 'Kafka value' for Spark structured streaming

March 28, 2023
kafayat.a.adeoye
2 Answers

I have been able to write the JSON file I want to work on to the console. Please, how do I separate the 'value' column into columns of data as in the JSON and write to Delta Lake for…

How to split a string to an array of filtered integers? – Amazon Web Services

March 24, 2023
1131
2 Answers

I have a DF column which is a long string of comma-separated values, like: 2000,2001,2002:a,2003=b,2004,100,101,500,20 101,102,20 What I want to do is to create a new Array<Int> column out of it where: only values starting with 2 are included…

Split .csv file column in 2 in Azure Synapse Analytics using PySpark

March 20, 2023
Eli
2 Answers

I have a .csv file (in Azure Data Lake Storage), which looks approximately like this: I want to create a PySpark (Python) notebook that could be used in one of the Synapse Analytics pipelines (Integrate -> Pipeline).…

Read a nested json string and explode into multiple columns in pyspark

March 16, 2023
Gingerbread
2 Answers

I want to parse a JSON request and create multiple columns out of it in pyspark as follows: { "ID": "abc123", "device": "mobile", "Ads": [ { "placement": "topright", "Adlist": [ { "name": "ad1", "subtype": "placeholder1", "category": "socialmedia", }, { "name":…

Selectively explode lists from Dataframe – Amazon Web Services

March 16, 2023
1131
2 Answers

What is the best way to explode a comma-separated column based on specific conditions? I have some data in the following format: ID col1 1 a100,a101,b100,c100 2 a105,b100 3 b101, c104 What I want to achieve is to: grab…

Parse additional fields Struct from JSON into separate columns in PySpark

March 3, 2023
ank1801
2 Answers

I have a JSON file with a field named "additionalFields", as below: "additionalFields": [ { "fieldName":"customer_name", "fieldValue":"ABC" }, { "fieldName":"deviceid", "fieldValue":"1234" }, { "fieldName":"txn_id", "fieldValue":"2" }, { "fieldName":"txn_date", "fieldValue":"2017-08-14T18:17:37" }, { "fieldName":"orderid", "fieldValue":"I126101" } ] How to parse this as…
