Suppose I have two columns (StringType):
JSON 1 | JSON 2 |
---|---|
{"key1":"value1"} | {"key2":"value2"} |
{"key3":"value3"} | {"key4":"value4"} |
I need to concat these columns to have a LIST of JSON in the third column (also string), something like this:
JSON 1 | JSON 2 | JSON 3 |
---|---|---|
{"key1":"value1"} | {"key2":"value2"} | [{"key1":"value1"},{"key2":"value2"}] |
{"key3":"value3"} | {"key4":"value4"} | [{"key3":"value3"},{"key4":"value4"}] |
Actually I solved the problem, but I don’t think it’s the best approach:
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, concat, concat_ws, lit
def concat_jsons(df: DataFrame, columns: list):
df = df.withColumn(
'JSON 3', concat_ws(',', *columns)
)
return df.withColumn(
'JSON 3',
concat(lit('['), col('JSON 3'), lit(']'))
)
Anyone has a better idea?
2
Answers
Your solution looks fine, though I would combine the two statements in your function:
if you want "JSON 3" to be for StringType, cast it to string