
I have a JSON file like the one below, and I need to read it and generate a table with the person's attributes.

{
  "person":[
      [
      "name",
      "Guy"
      ],
      [
      "age",
      "25"
      ],
      [
       "height",
       "2.00"
      ]
  ]
}
The output I need is a table like this:

name  age  height
Guy   25   2.00

What's the easiest and most performant way to read this JSON and output a table?

I'm thinking about converting the list to key-value pairs, but since I'm working with a lot of data, that would perform poorly.

And I’m having trouble exploding it because of other data in the dataframe.
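
The explode-based approach I have in mind is roughly this (just a sketch to show what I mean; the file path is made up, and it ignores the other columns that are giving me trouble):

import pyspark.sql.functions as f

df = spark.read.option("multiLine", "true").json("./test_json.json")

# explode the outer array so each [key, value] pair becomes its own row,
# then pivot the keys back into columns
pairs = (
    df.select(f.explode("person").alias("pair"))
      .select(f.col("pair")[0].alias("key"), f.col("pair")[1].alias("value"))
)
pairs.groupBy().pivot("key").agg(f.first("value")).show()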

2 Answers


  1. You can do this with the command below; specify multiLine=True:

    your_df = spark.read.option("multiLine", "true").json(
        "yourjsonpath.json"
    )
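
    For the sample JSON in the question, that read should produce a schema roughly like this (a sketch of the expected output; the person array still has to be reshaped into columns afterwards):

    your_df.printSchema()
    # root
    #  |-- person: array (nullable = true)
    #  |    |-- element: array (containsNull = true)
    #  |    |    |-- element: string (containsNull = true)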
    

    This question has also been answered before:
    How to create a spark DataFrame from Nested JSON structure

  2. Try this:

    import pyspark.sql.functions as f
    from pyspark.sql.types import StructType, StructField, StringType
    
    # get the field names (the first element of each [key, value] pair) for person
    # './test_json.json' is the path for the json file.
    fields = (
        spark.read.option('multiLine', True).json('./test_json.json')
        .select(f.expr('transform(person, element -> element[0])').alias('fields'))
        .take(1)[0]['fields']
    )
    print(fields)
    
    df = (
        spark.read.option('multiLine', True).json('./test_json.json')
        # rebuild each person array as a JSON object string,
        # e.g. {'name':'Guy','age':'25','height':'2.00'}
        .withColumn('json_string', f.concat(
                f.lit('{'),
                f.concat_ws(',', f.expr("""transform(person, element -> concat_ws(":", concat("'", element[0], "'"), concat("'", element[1], "'")))""")),
                f.lit('}')
            )
        )
        # parse that string with a schema built from the field names collected above
        .withColumn('json_content', f.from_json(f.col('json_string'), StructType([StructField(element, StringType(), True) for element in fields])))
        .select('json_content.*')
    )
    df.show(truncate=False)
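
    With the sample JSON from the question, df.show(truncate=False) should print something like:

    +----+---+------+
    |name|age|height|
    +----+---+------+
    |Guy |25 |2.00  |
    +----+---+------+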
    