Json – How to separate an array from a command using quotes instead of spaces in bash

JonahAlexander
March 26, 2024
214 views
0 votes
2 Answers

I am currently writing a Bash script to search through a JSONL file of Mario Maker level data (found in this Reddit post https://www.reddit.com/r/MarioMaker/comments/wlkwp9/easily_searchable_database_of_super_mario_maker_1/) and print the level ID, the name of the course, and the name of the creator. However, I am running into a problem when printing the results: the values I get back are split on the spaces inside the course names rather than on newlines or quotes. Below are my current code, the Python script used to convert the IDs to in-game IDs, and a single entry from the JSONL file (the full file is over 16 GB of raw text).

Bash script:

#!/bin/bash
ids=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $1}' | awk -F ':' '{print $2}')
cns=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $5}' | awk -F ':' '{print $2}')
crs=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $7}' | awk -F ':' '{print $2}')
for id in $ids; do
        idn=$(python3 /path/to/file/mmid.py $id)
        ida+=($idn)
done

for i in "${!ida[@]}"; do
        printf "ID: %s\n" "${ida[i]}"
        printf "Course Name: %s\n" "${cns[i]}"
        printf "Creator Name: %s\n" "${crs[i]}"
        printf "\n"
done

Python Script (code mostly taken from the above Reddit thread)

import struct
import hashlib
import hmac
import sys
idno = int(sys.argv[1])
key = hashlib.md5(b"9f2b4678").digest()
data = struct.pack("<Q", idno)
checksum = hmac.HMAC(key, data, 'md5').digest()
checksum = checksum[3:1:-1].hex().upper()
idstring = str.upper(hex(idno))[2:]
for y in range(8 - len(idstring)):
    idstring = '0' + str(idstring)
code = str(checksum) + '0000' + str(idstring)
print(code)

Example Entry from courses.jsonl

{"id":66782542,"retrieval_date":"2021-06-02T02:40:10.837452Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00066782542-00001?Expires=1625193610&Signature=vh50sqvoN2u-Xu~uT2pYJMNzj1kV11NDt77BDC2UM5o9VtTz-3HNxheiWc~PxDJhCqDNL-M7u9qrFnQ6FkOxDTyzo3QrK1VcNDqoAFRKA2RL03au-FdN9daY4~CDeKS3TvEkpzqBGe9fZfwlz6S-z7~VwRLsjPbw26QSkeszZOdGNT75RWyx2jeqKYpcjzi4tagbiWwq0DzbHzaXjlIpYpTsHIPaemS0fpad0d-Hgv56R-c3BCt2rzCoxco~jpVO2FW2HZoQRJIPU0mwbZE0wLlpKEHMwEcOoUbRIbYP~5U4XFwc6eahHc19GRjttzNDHBm68u9yhi3BVjlaMnRm2g__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"go","creator":{"pid":1753458969,"nnid":"Khanna1974","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQLAAeGTApABA1vjUsH+qXdiADAAAcxVNAGEAcgBpAAAATQBFAAAAAAAAAEBAAACDAUJoRBggNEYUgRIBaA0AACkAUiVBTQBhAHIAaQAAAAAAAAAAAAAAAAAAAI1HAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"upload_time":"2019-08-19T19:53:26Z","user_plays":17,"clears":14,"total_attempts":27,"failures":13,"world_record":{"best_time_player":{"pid":1744579894,"nnid":"HifysWU","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAAAAQMlEqo2ExqEA13Ajt2LjAh1dDQAAACBSAGEAdgBnAHoAAABFAAAAAAAAAH8oJgAiAIhGgxgTEoYUaxATZG0AACWCUUhQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCXAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1744078322,"nnid":"hellodarknessyee","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAAAAQPB5NCID5WIC2Pb9Wi0zXBWf6gAAARRSAG8AcwBlAAAATQBFAAAAAAAAAEBAMgCBBaBoYxKzMoUOoQwTZgwAKCG4OUhQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGK9AAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":5129,"created_time":"2019-08-19T20:02:41Z","updated_time":"2019-08-20T18:41:32Z"}}

So far I have tried using a loop and appending the data to a new array, and using eval to do basically the same thing.

Answers

- GillesQuénot
- March 23, 2024 at 11:52 pm
- 0 votes
What I would do using jq:
```
jq -r '"ID: \(.id)
Course name: \(.course_name)
Creator name: \(.creator.nnid)"' file.json
```
Yields:
```
ID: 66782542
Course name: go
Creator name: Khanna1974
```
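If each record still needs per-field handling in bash (for example, passing the ID on to the mmid.py script from the question), one option is to have jq emit tab-separated fields and split them with `read`. The following is only a minimal sketch and not part of the original answer: the function name `format_records` is made up for illustration, and it assumes no field is empty, because tab counts as IFS whitespace and `read` collapses consecutive tabs.

```shell
#!/usr/bin/env bash

# format_records (hypothetical helper): reads "id<TAB>course<TAB>creator"
# lines on stdin and prints one labelled block per record.
format_records() {
    local id cn cr
    while IFS=$'\t' read -r id cn cr; do
        printf 'ID: %s\n' "$id"
        printf 'Course name: %s\n' "$cn"
        printf 'Creator name: %s\n\n' "$cr"
    done
}

# Intended use (requires jq; file name as in the answer above):
#   jq -r '[.id, .course_name, .creator.nnid] | @tsv' file.json | format_records

# Self-contained demonstration with hand-written TSV input:
printf '66782542\tgo\tKhanna1974\n' | format_records
```

Note that jq's `@tsv` writes tabs, newlines and backslashes inside values as `\t`, `\n` and `\\`, so a course name containing those characters would arrive in escaped form.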

Let’s take the JSON and Python out of this. Your question is about bash arrays, and those two only muddy things, since you wouldn’t involve bash arrays when processing JSON with Python.

So your current script basically looks like:

$ cat tst.sh
#!/usr/bin/env bash

ids=$(printf '%s\n' 123 97 512)
cns=$(printf '%s\n' "This Is First" "Second Here" "And Third")
crs=$(printf '%s\n' "Joe Shmo" "Sue Me" "Perry The Platypus")

for id in $ids; do
        ida+=($id)
done

for i in "${!ida[@]}"; do
        printf 'i: <%s>\n' "$i"
        printf 'ID: <%s>\n' "${ida[i]}"
        printf 'Course Name: <%s>\n' "${cns[i]}"
        printf 'Creator Name: <%s>\n' "${crs[i]}"
        printf "\n"
done

and when executed it’ll output:

$ ./tst.sh
i: <0>
ID: <123>
Course Name: <This Is First
Second Here
And Third>
Creator Name: <Joe Shmo
Sue Me
Perry The Platypus>

i: <1>
ID: <97>
Course Name: <>
Creator Name: <>

i: <2>
ID: <512>
Course Name: <>
Creator Name: <>

when what you want output is:

i: <0>
ID: <123>
Course Name: <This Is First>
Creator Name: <Joe Shmo>

i: <1>
ID: <97>
Course Name: <Second Here>
Creator Name: <Sue Me>

i: <2>
ID: <512>
Course Name: <And Third>
Creator Name: <Perry The Platypus>

That’s because you’re trying to access the contents of cns and crs by indexing them with the index from ida[], e.g. "${cns[i]}", but neither cns nor crs is an array. They’re strings, so cns[0] is treated as just cns and prints the whole string, while cns[any other index] expands to an empty string. If you want them to be arrays then you could do:

$ cat tst.sh
#!/usr/bin/env bash

readarray -t ids < <(printf '%s\n' 123 97 512)
readarray -t cns < <(printf '%s\n' "This Is First" "Second Here" "And Third")
readarray -t crs < <(printf '%s\n' "Joe Shmo" "Sue Me" "Perry The Platypus")

for i in "${!ids[@]}"; do
        printf 'i: <%s>\n' "$i"
        printf 'ID: <%s>\n' "${ids[i]}"
        printf 'Course Name: <%s>\n' "${cns[i]}"
        printf 'Creator Name: <%s>\n' "${crs[i]}"
        printf "\n"
done

$ ./tst.sh
i: <0>
ID: <123>
Course Name: <This Is First>
Creator Name: <Joe Shmo>

i: <1>
ID: <97>
Course Name: <Second Here>
Creator Name: <Sue Me>

i: <2>
ID: <512>
Course Name: <And Third>
Creator Name: <Perry The Platypus>
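
The string-versus-array distinction can be checked directly. Here is a small sketch; the variable names `as_string`, `as_array` and `words` are illustrative and do not come from the question:

```shell
#!/usr/bin/env bash

# Command substitution yields ONE string containing embedded newlines.
as_string=$(printf '%s\n' "This Is First" "Second Here" "And Third")

# readarray -t yields one array element per input line.
readarray -t as_array < <(printf '%s\n' "This Is First" "Second Here" "And Third")

# Indexing a scalar: [0] is the whole string, any other index is empty.
printf 'string[0]: <%s>\n' "${as_string[0]}"
printf 'string[1]: <%s>\n' "${as_string[1]}"

# Indexing an array: each index holds one complete line, spaces included.
printf 'array[1]:  <%s>\n' "${as_array[1]}"
printf 'array size: %d\n' "${#as_array[@]}"

# Unquoted expansion of the string word-splits on spaces AND newlines,
# which is why a "for x in $var" loop sees one word at a time.
words=( $as_string )
printf 'words after splitting: %d\n' "${#words[@]}"
```

The array holds 3 elements, whereas word-splitting the string produces 7 separate words. That is the "separated using spaces" symptom described in the question.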

Now let’s apply that to your real starting point: you have some input text and want to select parts of it for further processing. For example, let’s say you have CSV like:

$ cat input.csv
123,foo,This Is First,bar,Joe Shmo,other stuff
50,foo,To be Ignored,bar,Griffin,more stuff
97,foo,Second Here,bar,Sue Me,bad stuff
17,foo,Also Ignored,bar,The Stranger,etc.
512,foo,And Third,bar,Perry The Platypus,Blah blah

and want to process 3 fields from every line that doesn’t contain the string Ignore. Again ignoring the fact you wouldn’t really use a bash script for this and we’re just demonstrating how to use bash arrays, that’d look like:

$ cat tst.sh
#!/usr/bin/env bash

infile='input.csv'

readarray -t ids < <(awk -F',' '!/Ignore/{print $1}' "$infile")
readarray -t cns < <(awk -F',' '!/Ignore/{print $3}' "$infile")
readarray -t crs < <(awk -F',' '!/Ignore/{print $5}' "$infile")

for i in "${!ids[@]}"; do
        printf 'i: <%s>\n' "$i"
        printf 'ID: <%s>\n' "${ids[i]}"
        printf 'Course Name: <%s>\n' "${cns[i]}"
        printf 'Creator Name: <%s>\n' "${crs[i]}"
        printf "\n"
done

$ ./tst.sh
i: <0>
ID: <123>
Course Name: <This Is First>
Creator Name: <Joe Shmo>

i: <1>
ID: <97>
Course Name: <Second Here>
Creator Name: <Sue Me>

i: <2>
ID: <512>
Course Name: <And Third>
Creator Name: <Perry The Platypus>

But calling awk 3 times to run the same search over your input file, printing 1 field each time, is wasteful, and it makes the script harder to maintain if your search criteria change. You’d really want to do the search once to get all the data and then use it as you like, e.g.:

$ cat tst.sh
#!/usr/bin/env bash

infile='input.csv'

readarray -t lines < <(
    awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
)

readarray -t ids < <(printf '%s\n' "${lines[@]}" | cut -d, -f1)
readarray -t cns < <(printf '%s\n' "${lines[@]}" | cut -d, -f2)
readarray -t crs < <(printf '%s\n' "${lines[@]}" | cut -d, -f3)

for i in "${!ids[@]}"; do
        printf 'i: <%s>\n' "$i"
        printf 'ID: <%s>\n' "${ids[i]}"
        printf 'Course Name: <%s>\n' "${cns[i]}"
        printf 'Creator Name: <%s>\n' "${crs[i]}"
        printf "\n"
done

or less efficiently:

$ cat tst.sh
#!/usr/bin/env bash

infile='input.csv'

readarray -t lines < <(
    awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
)

while IFS=',' read -r id cn cr; do
    ids+=( "$id" )
    cns+=( "$cn" )
    crs+=( "$cr" )
done < <(printf '%s\n' "${lines[@]}")

for i in "${!ids[@]}"; do
        printf 'i: <%s>\n' "$i"
        printf 'ID: <%s>\n' "${ids[i]}"
        printf 'Course Name: <%s>\n' "${cns[i]}"
        printf 'Creator Name: <%s>\n' "${crs[i]}"
        printf "\n"
done

Of course, if you just need to do something in bash with the id, cn, and cr values, such as calling some other tool with them as arguments or input (represented by the function prtVals() below), you could skip the arrays and do any of these instead:

$ cat tst.sh
#!/usr/bin/env bash

infile='input.csv'

prtVals() {
        printf 'ID: <%s>\n' "$1"
        printf 'Course Name: <%s>\n' "$2"
        printf 'Creator Name: <%s>\n' "$3"
        printf "\n"
}

while IFS=',' read -r id cn cr; do
        prtVals "$id" "$cn" "$cr"
done < <(
    awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
)

or:

$ cat tst.sh
#!/usr/bin/env bash

infile='input.csv'

prtVals() {
        printf 'ID: <%s>\n' "$1"
        printf 'Course Name: <%s>\n' "$2"
        printf 'Creator Name: <%s>\n' "$3"
        printf "\n"
}

while IFS=',' read -r -a line; do
        prtVals "${line[@]}"
done < <(
    awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
)
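
In that variant, `read -r -a line` splits each record into an indexed array, so "${line[@]}" expands to one argument per field. A quick check, as a sketch that uses one record from the sample data fed in through a here-string:

```shell
#!/usr/bin/env bash

# Split one comma-separated record into an indexed array.
# IFS=',' applies only to this read; the here-string supplies the input line.
IFS=',' read -r -a line <<< '512,And Third,Perry The Platypus'

printf 'fields: %d\n' "${#line[@]}"
printf '<%s>\n' "${line[@]}"    # one <...> per field; inner spaces survive
```

Because the split happens only on commas, the spaces inside "And Third" and "Perry The Platypus" stay part of their fields.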

or:

$ cat tst.sh
#!/usr/bin/env bash

infile='input.csv'

prtVals() {
        printf 'ID: <%s>\n' "$1"
        printf 'Course Name: <%s>\n' "$2"
        printf 'Creator Name: <%s>\n' "$3"
        printf "\n"
}

export -f prtVals

awk 'BEGIN{FS=OFS=ORS=","} !/Ignore/{print $1, $3, $5}' "$infile" |
xargs -d ',' -n 3 bash -c 'prtVals "$@"' _

That last one requires GNU xargs for -d.
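
Where `-d` is unavailable, a possible workaround is to convert the commas to NUL bytes and use `xargs -0`, which both GNU and BSD xargs accept. This is a sketch under assumptions rather than part of the original answer: the `sample` variable stands in for input.csv, and it only works if no field contains a comma.

```shell
#!/usr/bin/env bash

# Same printing function as in the answer, exported so child bash sees it.
prtVals() {
    printf 'ID: <%s>\n' "$1"
    printf 'Course Name: <%s>\n' "$2"
    printf 'Creator Name: <%s>\n\n' "$3"
}
export -f prtVals

# Stand-in for input.csv: two kept rows and one row to be ignored.
sample=$'123,foo,This Is First,bar,Joe Shmo,other stuff
50,foo,To be Ignored,bar,Griffin,more stuff
97,foo,Second Here,bar,Sue Me,bad stuff'

# awk emits the chosen fields, each followed by a comma; tr turns every
# comma into a NUL; xargs -0 passes three items to each prtVals call.
result=$(
    printf '%s\n' "$sample" |
    awk 'BEGIN{FS=OFS=ORS=","} !/Ignore/{print $1, $3, $5}' |
    tr ',' '\0' |
    xargs -0 -n 3 bash -c 'prtVals "$@"' _
)
printf '%s\n' "$result"
```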
