
I have a DataFrame df_movies with a genres column whose values look like JSON.

|genres |
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]

How can I extract the value of 'name' from the first element?

Way #1

df_movies.withColumn("genres_extract",
    regexp_extract(col("genres"), """ 'name': (\w+)""", 1)).show(false)

Way #2

df_movies.withColumn("genres_extract",
    regexp_extract(col("genres"), """[{'id':\s\d,\s 'name':\s(\w+)""", 1))

Expected: Action

2 Answers


  1. You can use the get_json_object function:

      Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
        .toDF("genres")
        .withColumn("genres_extract", get_json_object(col("genres"), "$[0].name" ))
        .show()
    
    
    +--------------------+--------------+
    |              genres|genres_extract|
    +--------------------+--------------+
    |[{"id": 28, "name...|        Action|
    +--------------------+--------------+
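
    If the column really holds single-quoted text as printed in the question, the regexp_extract approach from the question can also be made to work. The following is a minimal sketch, not a definitive solution: it assumes a local SparkSession (the names spark, df and firstName are only for illustration) and that the quote characters must be matched explicitly, capturing everything up to the closing quote so that names containing spaces are kept whole.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, regexp_extract}

    // Assumption: a local session created only for this example.
    val spark = SparkSession.builder().master("local[*]").appName("genres-regex").getOrCreate()
    import spark.implicits._

    // Sample row using single quotes, as shown in the question.
    val df = Seq("""[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]""")
      .toDF("genres")

    // regexp_extract returns group 1 of the first match, i.e. the first 'name' value.
    val withFirstName = df.withColumn(
      "genres_extract",
      regexp_extract(col("genres"), """'name':\s*'([^']+)'""", 1))

    val firstName = withFirstName.select("genres_extract").as[String].head()
    println(firstName)
    ```

    A regex like this depends on the exact text layout of the column, so parsing the value as JSON is usually the more robust choice when that is possible.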
    
  2. Another possibility is to use the from_json function together with a user-defined schema. This lets you "unwrap" the JSON structure into a DataFrame that contains all of the data, so you can use it however you want.

    Something like the following:

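    import org.apache.spark.sql.functions.{col, explode, from_json}
    import org.apache.spark.sql.types._
    
    // Sample input; assigned to df so it can be used below
    val df = Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
      .toDF("genres")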
    
    
    // Creating the necessary schema for the from_json function
    val moviesSchema = ArrayType(
      new StructType()
        .add("id", StringType)
        .add("name", StringType)
      )
    
    // Parsing the json string into our schema, exploding the column to make one row
    // per json object in the array and then selecting the wanted columns,
    // unwrapping the parsedMovies column into separate columns
    val parsedDf = df
      .withColumn("parsedMovies", explode(from_json(col("genres"), moviesSchema)))
      .select("parsedMovies.*")
    
    parsedDf.show(false)
    +---+---------+
    | id|     name|
    +---+---------+
    | 28|   Action|
    | 12|Adventure|
    | 37|  Western|
    +---+---------+
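
    The question only asks for the name of the first element, so exploding every entry may be more than is needed. As a rough sketch under the same assumptions (double-quoted JSON, a local SparkSession; the names spark, genreSchema and firstGenre are illustrative), the parsed array can be indexed directly instead of exploded:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    // Assumption: a local session created only for this example.
    val spark = SparkSession.builder().master("local[*]").appName("first-genre").getOrCreate()
    import spark.implicits._

    val df = Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
      .toDF("genres")

    // Schema for an array of {id, name} objects.
    val genreSchema = ArrayType(
      new StructType()
        .add("id", IntegerType)
        .add("name", StringType))

    // Parse the string, take element 0 of the array, then read its name field.
    val firstGenreDf = df.withColumn(
      "genres_extract",
      from_json(col("genres"), genreSchema).getItem(0).getField("name"))

    val firstGenre = firstGenreDf.select("genres_extract").as[String].head()
    println(firstGenre)
    ```

    This keeps one row per movie, which matches the expected output in the question, while the exploded version above is the better fit when every genre is needed.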
    