pyspark Questions

Azure – How can we encrypt specific column data of parquet using pyspark

August 14, 2023
Prayag15
2 Answers

My project requires encrypting the data in some PII columns while writing to a Parquet file. An Azure Synapse PySpark notebook is used to write the Parquet file. I cannot find any references on…

Trying to flatten JSON file but getting error using PySpark

August 11, 2023
Priya
2 Answers

I have a JSON file and I need to convert it into tabular form using only PySpark. My JSON file: { "records": [ { "name": "Priya", "last_name": "Munjal", "special_values": [ { "name": "adress", "value": "some adress" }, {…

Need to flatten nested JSON file using PySpark

August 9, 2023
priyanka
2 Answers

I am new to PySpark and am trying to flatten a JSON file with it, but I am not getting the desired output. Here is my JSON file: { "events": [ { "event_name": "start", "event_properties": ["property1", "property2", "property3"], "entities": ["entityI", "entityII", "entityIII"], "event_timestamp": "2022-05-01…

How to Flatten JSON file using pyspark

August 8, 2023
Priya
2 Answers

I need to flatten a JSON file so that I can get the output in table format. I have tried, but I am not getting the output that I want. This is my JSON file: { "records": [ { "name": "A", "last_name": "B", "special_values": [ {…

how to import anaconda pandas module in visual studio code environment?

August 7, 2023
Joseph Hwang
2 Answers

I configured Apache Spark in the Visual Studio Code environment. The configuration in settings.json is as follows: "python.defaultInterpreterPath": "C:\Anaconda3\python.exe", "terminal.integrated.env.windows": { "PYTHONPATH": "C:/spark-3.4.1-bin-hadoop3/python;C:/spark-3.4.1-bin-hadoop3/python/pyspark;C:/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip;C:/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip" }, "python.autoComplete.extraPaths": [ "C:\spark-3.4.1-bin-hadoop3\python", "C:\spark-3.4.1-bin-hadoop3\python\pyspark", "C:\spark-3.4.1-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip", "C:\spark-3.4.1-bin-hadoop3\python\lib\pyspark.zip" ], "python.analysis.extraPaths": [ "C:\spark-3.4.1-bin-hadoop3\python", "C:\spark-3.4.1-bin-hadoop3\python\pyspark", "C:\spark-3.4.1-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip", "C:\spark-3.4.1-bin-hadoop3\python\lib\pyspark.zip" ] But I…

Amazon web services – How to preserve the key letter case in AWS Glue Transform node?

August 2, 2023
Hongbo Miao
2 Answers

I am trying to add a new column called timestamp in AWS Glue. My upstream data keys have capital letters. However, after adding the column timestamp, the keys of the remaining columns got lowercased. Experiment 1: Transform - SQL Query…

In Azure databricks, How to create a table based on json config dynamically?

July 26, 2023
Surender Raja
2 Answers

I have this tableConfig.json in an ADLS container location. It holds table-specific details: { "tableName":"employee", "databaseName": "dbo", "location" : "/mnt/clean/demo", "colsList" : ["emp_id","emp_name","emp_city"] } Now I want to read that tableConfig.json in an Azure Databricks Python notebook and…
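As a rough illustration of what the title asks for, the config above could be turned into a `CREATE TABLE` statement along these lines. This is only a sketch under stated assumptions: the helper name `build_create_table_sql` is hypothetical, the config lists no column types so every column is given `STRING`, and `USING DELTA` is assumed rather than taken from the question. The JSON is inlined so the snippet runs without ADLS access.

```python
import json

# The config shown in the question, inlined as a string so the sketch runs
# without ADLS access; in a notebook it would be read from the mounted path.
CONFIG_TEXT = """
{
  "tableName": "employee",
  "databaseName": "dbo",
  "location": "/mnt/clean/demo",
  "colsList": ["emp_id", "emp_name", "emp_city"]
}
"""


def build_create_table_sql(config: dict, col_type: str = "STRING") -> str:
    """Build a CREATE TABLE statement from a table config dict.

    Assumption: the config carries no column types, so every column gets
    col_type. USING DELTA is also an assumption, not from the question.
    """
    columns = ", ".join(f"{name} {col_type}" for name in config["colsList"])
    return (
        f"CREATE TABLE IF NOT EXISTS {config['databaseName']}.{config['tableName']} "
        f"({columns}) USING DELTA LOCATION '{config['location']}'"
    )


config = json.loads(CONFIG_TEXT)
create_sql = build_create_table_sql(config)
print(create_sql)
```

In a Databricks notebook the resulting string could then be passed to `spark.sql(create_sql)`; real column types would have to come from somewhere other than this config.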

Reading a multiple line JSON with pyspark

July 20, 2023
Steuv
2 Answers

I can't manage to read a JSON file in Python with PySpark because it contains multiple records, each with every variable on a separate line. Example: { "id" : "id001", "name" : "NAME001", "firstname" : "FIRSTNAME001" } { "id" :…
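One possible way to handle a file shaped like this, sketched in plain Python rather than with a Spark reader option, is to decode the concatenated objects one at a time with `json.JSONDecoder.raw_decode`. The helper name `iter_json_objects` is hypothetical, and the second record in the sample is made up for illustration because the question's excerpt is cut off after the first one.

```python
import json


def iter_json_objects(text: str):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not accept leading whitespace, so skip it manually.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# First record is from the question; the second is a hypothetical stand-in.
SAMPLE = """
{
  "id" : "id001",
  "name" : "NAME001",
  "firstname" : "FIRSTNAME001"
}
{
  "id" : "id002",
  "name" : "NAME002",
  "firstname" : "FIRSTNAME002"
}
"""

records = list(iter_json_objects(SAMPLE))
print(len(records), records[0]["id"])
```

The resulting list of dicts could then be handed to `spark.createDataFrame(records)`. This reads the whole file on the driver, so it only suits files that fit in memory.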

Amazon web services – botocore.exceptions.NoRegionError: You must specify a region for EmrServerlessCreateApplicationOperator

June 27, 2023
Hasham
2 Answers

I am trying to create an EMR Serverless application through the EmrServerlessCreateApplicationOperator, but I keep getting the error botocore.exceptions.NoRegionError: You must specify a region. I am passing the region as below: create_app = EmrServerlessCreateApplicationOperator( task_id="create_spark_app", job_type="SPARK", release_label="emr-6.6.0", config={"aws_access_key_id":args["aws_access_key_id"], "aws_secret_access_key": args["aws_secret_access_key"], "aws_session_token":…

Amazon web services – AWS emr unable to install python library in bootstrap shell script

June 17, 2023
haneulkim
2 Answers

I am using emr-5.33.1 and Python 3.7.16. The goal is to add petastorm==0.12.1 to EMR. These are the steps to install it in EMR (they worked until now): add petastorm and all of its required dependencies to an S3 folder; copy all libraries from S3…