I have a csv file imported from a sql database, one of the column is made of JSON data. I used to be able to read_csv the file and then parse the JSON data using fromJSON.
However, now when I read the file the double quotations marks around the JSON fields are automatically removed and I can’t find a way to prevent that from happening.
Here is an example. CSV file looks like this (3 columns, 4 rows):
A,B,C
1,2,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
3,4,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
5,6,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
7,8,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
json_data<-read_csv(file="test.csv")
produces
> json_data
# A tibble: 4 × 3
A B C
<dbl> <dbl> <chr>
1 2 [{C1:C6.bmp,C2: C5.bmp}]
3 4 [{C1:C6.bmp,C2: C5.bmp}]
5 6 [{C1:C6.bmp,C2: C5.bmp}]
7 8 [{C1:C6.bmp,C2: C5.bmp}]
instead of the expected:
> json_data
# A tibble: 4 × 3
A B C
<dbl> <dbl> <chr>
1 2 [{"C1":"C6.bmp","C2": "C5.bmp"}]
3 4 [{"C1":"C6.bmp","C2": "C5.bmp"}]
5 6 [{"C1":"C6.bmp","C2": "C5.bmp"}]
7 8 [{"C1":"C6.bmp","C2": "C5.bmp"}]
Without quotation marks column C cannot be passed on to fromJSON…
2
Answers
Using
readLines()
, thengsub
commas between word characters to something e.g. an asterisk andstrsplit
at it.If you can control how that file is created, you could get it to use single quotes around the JSON strings, i.e.
and then
read.csv
orread_csv
will read it properly as long as you specifyquote = "'"
:If you can’t control that, you might be able to edit the file to look like that; in your sample, the only place
"[
appears is at the start of the JSON, and the only place]"
appears is at the end. If this is also true for your real data, then you could make that edit pretty easily.If that’s not feasible, then you could read it with
quote = ""
, which will leave all the quotes in place. The trouble with this is that the comma in the JSON will be taken to be a column separator, so you’ll have a lot of fixups to do afterwards.