I have a DataFrame df_movies with a genres column whose values look like JSON.
|genres |
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]
How can I extract the value of the first 'name' field?
Way #1
df_movies.withColumn("genres_extract",
  regexp_extract(col("genres"), """ 'name': (\w+)""", 1)).show(false)
Way #2
df_movies.withColumn("genres_extract",
  regexp_extract(col("genres"), """[{'id':\s\d,\s 'name':\s(\w+)""", 1))
Expected: Action
2 Answers
You can use the get_json_object function:
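A minimal sketch of that idea, assuming Spark is on the classpath and that the genres column is a plain string like the sample row in the question. The local SparkSession and the one-row df_movies built here are stand-ins for illustration, not the asker's real data. The single-quoted keys are not strict JSON, but as far as I know Spark's JSON parsing tolerates single quotes by default, so the path lookup should still resolve.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, get_json_object}

// Local session for illustration only (assumption: no existing cluster setup).
val spark = SparkSession.builder().master("local[*]").appName("genres-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df_movies, holding the sample row from the question.
val df_movies = Seq(
  "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]"
).toDF("genres")

// "$[0].name" is a JSON path: take the first array element, then its "name" field.
val extracted = df_movies.withColumn(
  "genres_extract",
  get_json_object(col("genres"), "$[0].name")
)

extracted.show(false)
```

For the sample row, genres_extract should hold the value the question expects (Action). Rows whose genres string cannot be parsed, or whose array is empty, get null instead of raising an error.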
Another possibility is to use the from_json function together with a user-defined schema. This lets you "unwrap" the JSON string into typed columns (here, an array of structs with id and name), so that you can use the data however you want.
Something like the following:
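A rough sketch of the from_json approach, under the same assumptions as the question's sample row: genres is a string column holding an array of objects with an integer id and a string name. The names genreSchema and genres_parsed, the local SparkSession, and the one-row df_movies are made up here for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructField, StructType}

// Local session for illustration only (assumption: no existing cluster setup).
val spark = SparkSession.builder().master("local[*]").appName("genres-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df_movies, holding the sample row from the question.
val df_movies = Seq(
  "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]"
).toDF("genres")

// Schema for the column: an array of {id, name} structs (field types are assumed).
val genreSchema = ArrayType(StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
)))

// Parse the string into a typed array-of-structs column.
val parsed = df_movies.withColumn("genres_parsed", from_json(col("genres"), genreSchema))

// Index the first array element, then read its "name" field.
val firstName = parsed.withColumn(
  "genres_extract",
  col("genres_parsed").getItem(0).getField("name")
)

firstName.select("genres_extract").show(false)
```

Compared with get_json_object, this costs a schema definition up front, but the parsed column keeps every genre. Selecting col("genres_parsed.name") should, for example, give an array of all the names rather than only the first.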