skip to Main Content

I have one stream output stored in csv file, I need help converting csv to json:

my csv looks like:

cat output.csv
"k","a1",1,"b1","c1","d1",1
"l","a2",2,"b2","c2","d2",2
"m","a3",3,"b3","c3","d3",3
"n","a4",4,"b4","c4","d4",4
"o","a5",5,"b5","c5","d5",5

Required output:

note: I need key configuration to be added to json.

{
    "configuration": {
      "k": {
        "a": "a1",
        "number1": "1",
        "c": "b1",
        "d": "c1",
        "e": "d1",
        "number2": "1"
      },
      "l": {
        "a": "a2",
        "number1": "2",
        "c": "b2",
        "d": "c2",
        "e": "d2",
        "number2": "2"
      },
      .
      .
      .
    }
}

So far tried with jq:

my function is:

cat api.jq

[
  inputs |
     split(",") |
     map(ltrimstr(""")) |
     map(rtrimstr(""")) |
      {
        a: .[1],
        number1: .[2],
        c: .[3],
        d: .[4],
        e: .[5],
        number2: .[6]
      }
] | {configuration: .}

Output:


jq -nRf api.jq output.csv

{
  "cluster_configuration": [
    {
      "a": "a1",
      "number1": "1",
      "c": "b1",
      "d": "c1",
      "e": "d1",
      "number2": "1"
    },
    {
      "a": "a2",
      "number1": "2",
      "c": "b2",
      "d": "c2",
      "e": "d2",
      "number2": "2"
    },
    {
      "a": "a3",
      "number1": "3",
      "c": "b3",
      "d": "c3",
      "e": "d3",
      "number2": "3"
    },
    {
      "a": "a4",
      "number1": "4",
      "c": "b4",
      "d": "c4",
      "e": "d4",
      "number2": "4"
    },
    {
      "a": "a5",
      "number1": "5",
      "c": "b5",
      "d": "c5",
      "e": "d5",
      "number2": "5"
    }
  ]
}

5

Answers


  1. jq’s from_entries can be used to generate objects with chosen keys.

    In the below example, we first use Miller to convert CSV to JSON more robustly (in a manner that supports values with commas or quotes) before proceeding with jq

    1. Add a header line at the top so Miller knows what key name to associate with each value:

      cat <(echo k,a,number1,c,d,e,number2) output.csv > output_with_header.csv
      
    2. Convert the csv to json with miller:

      mlr --icsv --ojson cat output_with_header.csv > output.json
      
    3. transform with jq, generating lists of maps with key and value elements and then combining them with from_entries:

      jq '{configuration: ([.[]|{key: .k,value: (.|del(.k))}]|from_entries)}' output.json
      

    This results in:

    {
      "configuration": {
        "k": {
          "a": "a1",
          "number1": 1,
          "c": "b1",
          "d": "c1",
          "e": "d1",
          "number2": 1
        },
        "l": {
          "a": "a2",
          "number1": 2,
          "c": "b2",
          "d": "c2",
          "e": "d2",
          "number2": 2
        },
        "m": {
          "a": "a3",
          "number1": 3,
          "c": "b3",
          "d": "c3",
          "e": "d3",
          "number2": 3
        },
        "n": {
          "a": "a4",
          "number1": 4,
          "c": "b4",
          "d": "c4",
          "e": "d4",
          "number2": 4
        },
        "o": {
          "a": "a5",
          "number1": 5,
          "c": "b5",
          "d": "c5",
          "e": "d5",
          "number2": 5
        }
      }
    }
    

    All together as a oneliner:

    cat <(echo k,a,number1,c,d,e,number2) output.csv | mlr --icsv --ojson cat | jq '{configuration: ([.[]|{key: .k,value: (.|del(.k))}]|from_entries)}'
    
    Login or Signup to reply.
  2. Insofar as your goal is to make a be a key, using from_entries is suitable for that:

    [
        inputs |
        split(",") |
        map(ltrimstr(""")) |
        map(rtrimstr(""")) |
        {
            "key": .[1],
            "value": {
                number: .[2],
                c: .[3],
                d: .[4],
                e: .[5],
                number: .[6]
            }
        }
    ] |
    from_entries |
    { configuration: . }
    

    When run with

    jq -R -f api.jq <output.csv
    

    …the output is:

    {
      "configuration": {
        "a2": {
          "number": "2",
          "c": "b2",
          "d": "c2",
          "e": "d2"
        },
        "a3": {
          "number": "3",
          "c": "b3",
          "d": "c3",
          "e": "d3"
        },
        "a4": {
          "number": "4",
          "c": "b4",
          "d": "c4",
          "e": "d4"
        },
        "a5": {
          "number": "5",
          "c": "b5",
          "d": "c5",
          "e": "d5"
        }
      }
    }
    
    Login or Signup to reply.
  3. If robustness of CSV parsing is a concern, you could easily adapt
    the parser at rosettacode.org. The following converts the CSV rows to JSON arrays; since the "main" program below uses inputs, you’d use the -R and -n command-line options.

    ## The PEG * operator:
    def star(E): (E | star(E)) // . ;
    
    ## Helper functions:
    
    # Consume a regular expression rooted at the start of .remainder, or emit empty;
    # on success, update .remainder and set .match but do NOT update .result
    def consume($re):
      # on failure, match yields empty
      (.remainder | match("^" + $re)) as $match
      | .remainder |= .[$match.length :]
      | .match = $match.string;
    
    def parse($re):
      consume($re)
      | .result = .result + [.match] ;
    
    def ws: consume(" *");
    
    ### Parse a string into comma-separated values
    
    def quoted_field_content:
      parse("(("")|([^"]))*")
      | .result[-1] |= gsub(""""; """);
    
    def unquoted_field: parse("[^,"]*");
    
    def quoted_field: consume(""") | quoted_field_content | consume(""");
    
    def field: (ws | quoted_field | ws) // unquoted_field;
    
    def record: field | star(consume(",") | field);
    
    def csv2array:
      {remainder: .} | record | .result;
    
    inputs | csv2array
    
    Login or Signup to reply.
  4. I know you raise this question as a bash+jq question, but, if it was a bash+python question, the solution would be trivial:

    # csv2json.py
    import sys, csv, json
    data = { "configuration": { } }
    for [k,a,n1,c,d,e,n2] in csv.reader(sys.stdin.readlines()):
        data["configuration"][k] = { "a": a, "number1": n1, "c": c, "d": d, "e": e, "number2": n2 }
    print(json.dumps(data, indent=2))
    

    Then, in bash (I’m assuming Ubuntu here), we could go:

    python3 csv2json.py < output.csv
    
    Login or Signup to reply.
  5. Here’s a possible solution with Miller (available here for several OSs), an interesting tool that supports multiple input/output formats:

    mlr --icsv -N put -q '
        @map[$1] = {"a": $2, "number1": $3, "c": $4, "d": $5, "e": $6, "number2": $7};
        end { dump { "configuration": @map } }
    ' file.csv
    
    {
      "configuration": {
        "k": {
          "a": "a1",
          "number1": 1,
          "c": "b1",
          "d": "c1",
          "e": "d1",
          "number2": 1
        },
        "l": {
    ...
    

    note: for forcing the numbers to be treated as strings you can use the --infer-none option.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search