skip to Main Content

I have a csv file imported from a sql database, one of the column is made of JSON data. I used to be able to read_csv the file and then parse the JSON data using fromJSON.
However, now when I read the file the double quotations marks around the JSON fields are automatically removed and I can’t find a way to prevent that from happening.

Here is an example. CSV file looks like this (3 columns, 4 rows):

A,B,C
1,2,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
3,4,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
5,6,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
7,8,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"

json_data<-read_csv(file="test.csv") produces

> json_data
# A tibble: 4 × 3
  A     B C                       
 <dbl> <dbl> <chr>                   
  1     2 [{C1:C6.bmp,C2: C5.bmp}]
  3     4 [{C1:C6.bmp,C2: C5.bmp}]
  5     6 [{C1:C6.bmp,C2: C5.bmp}]
  7     8 [{C1:C6.bmp,C2: C5.bmp}]

instead of the expected:

> json_data
# A tibble: 4 × 3
  A     B C                       
 <dbl> <dbl> <chr>                   
  1     2 [{"C1":"C6.bmp","C2": "C5.bmp"}]
  3     4 [{"C1":"C6.bmp","C2": "C5.bmp"}]
  5     6 [{"C1":"C6.bmp","C2": "C5.bmp"}]
  7     8 [{"C1":"C6.bmp","C2": "C5.bmp"}]

Without quotation marks column C cannot be passed on to fromJSON…

2

Answers


  1. Using readLines(), then gsub commas between word characters to something e.g. an asterisk and strsplit at it.

    > s <- gsub(readLines("foo.csv"), pat="(?<=\w),", rep="*", perl=TRUE) |> 
    +   strsplit('\*')
    > do.call('rbind.data.frame', s[-1]) |> setNames(s[[1]])
      A B                                  C
    1 1 2 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
    2 3 4 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
    3 5 6 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
    4 7 8 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
    
    Login or Signup to reply.
  2. If you can control how that file is created, you could get it to use single quotes around the JSON strings, i.e.

    A,B,C
    1,2,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
    3,4,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
    5,6,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
    7,8,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
    

    and then read.csv or read_csv will read it properly as long as you specify quote = "'":

    > read.csv("~/temp/test.csv", quote="'")
      A B                                C
    1 1 2 [{"C1":"C6.bmp","C2": "C5.bmp"}]
    2 3 4 [{"C1":"C6.bmp","C2": "C5.bmp"}]
    3 5 6 [{"C1":"C6.bmp","C2": "C5.bmp"}]
    4 7 8 [{"C1":"C6.bmp","C2": "C5.bmp"}]
    

    If you can’t control that, you might be able to edit the file to look like that; in your sample, the only place "[ appears is at the start of the JSON, and the only place ]" appears is at the end. If this is also true for your real data, then you could make that edit pretty easily.

    If that’s not feasible, then you could read it with quote = "", which will leave all the quotes in place. The trouble with this is that the comma in the JSON will be taken to be a column separator, so you’ll have a lot of fixups to do afterwards.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search