I have a DataFrame that looks like this:
+---------------------------------------------+
|                                       output|
+---------------------------------------------+
|{"COLUMN1": "123", "COLUMN2": {"A":1, "B":2}}|
+---------------------------------------------+
I just want to read the JSON into a variable, as a string or a dictionary, so that I can manipulate it further.
The problems are:
- Apparently, when you use Unity Catalog on Databricks, you are not allowed to use RDDs or methods like .iterrows, .collect, etc. (ref: https://community.databricks.com/s/question/0D58Y00009yKdeHSAS/cannot-use-rdd-and-cannot-set-sparkdatabrickspysparkenablepy4jsecurity-false-for-cluster)
- Using something like .asDict or .first() converts the result into the Row datatype, and I am not able to convert it back into JSON.
E.g.:
Row(output=Row(COLUMN1='123', ...
How was the df created?
nextdf = df.select(struct(col("COLUMN1"),col("COLUMN2"),col("COLUMN3")).alias("output"))
The output should be:
{"COLUMN1": "123", "COLUMN2": {"A":1, "B":2}}
Please let me know what I can try.
2 Answers
You can use toJSON() for this case. Example:
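A minimal sketch of the toJSON() route, assuming the DataFrame is the nextdf from the question and that pulling one element to the driver with first() is allowed on the cluster. The helper name first_row_as_dict is hypothetical. Note that DataFrame.toJSON() returns an RDD of JSON strings, so on a cluster that blocks RDD access it may run into the same restriction described in the question.

```python
import json


def first_row_as_dict(df):
    """Return the first row of a PySpark DataFrame as a plain Python dict.

    Hypothetical helper: `df` is assumed to be a DataFrame such as `nextdf`.
    """
    # toJSON() serialises each row to one JSON string (it returns an RDD of str).
    json_str = df.toJSON().first()
    # json.loads turns the JSON text into nested dicts and lists.
    return json.loads(json_str)
```

With the DataFrame from the question, first_row_as_dict(nextdf)["output"] would give the nested struct as a dictionary, because toJSON() keeps the top-level column name as a key.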
To get the result without collecting, use:
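One way to stay within DataFrame methods is the to_json column function, which serialises a struct column to a JSON string inside the DataFrame. The sketch below is an assumption about what fits here, not necessarily the code this answer had in mind: output_json_string is a hypothetical helper, the column name output comes from the question, and first() is assumed to be permitted, as the question suggests. The sample string only stands in for the value such a call would return.

```python
import json


def output_json_string(df, column="output"):
    """Return the struct column of the first row as a JSON string.

    Hypothetical helper; `df` is assumed to be a PySpark DataFrame.
    """
    # Imported here so the snippet also loads where pyspark is not installed.
    from pyspark.sql import functions as F

    # to_json() converts the struct column to a JSON string column; this step
    # is a DataFrame transformation and does not collect anything.
    json_df = df.select(F.to_json(F.col(column)).alias("output_json"))
    # first() brings back a single Row whose field is already a str.
    return json_df.first()["output_json"]


# On Databricks: json_str = output_json_string(nextdf)
# Stand-in value with the shape shown in the question:
json_str = '{"COLUMN1": "123", "COLUMN2": {"A": 1, "B": 2}}'

parsed = json.loads(json_str)      # nested dict for further manipulation
parsed["COLUMN2"]["A"] += 10       # example manipulation
updated = json.dumps(parsed)       # back to a JSON string
```

Because the value in the returned Row is already a string, there is no Row-to-JSON conversion left to do on the Python side.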
Save as Text:
Use save as a text file with the header flag set to false to keep the column name out of the output file.

RDD is the old interface, DataFrame is the new/replacement interface. Use DataFrame methods.