
I am currently writing a Bash script to search through a JSONL file of Mario Maker level data (found in this Reddit post: https://www.reddit.com/r/MarioMaker/comments/wlkwp9/easily_searchable_database_of_super_mario_maker_1/) and print the level ID, the course name, and the creator's name for each match. However, I am running into a problem when printing the results: the values I get back are split on the spaces inside course names rather than on newlines or quotes. Below are my current code, the Python script used to convert the IDs to in-game IDs, and a single entry from the JSONL file (the full file is over 16 GB of raw text).

Bash script:

#!/bin/bash
ids=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $1}' | awk -F ':' '{print $2}')
cns=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $5}' | awk -F ':' '{print $2}')
crs=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $7}' | awk -F ':' '{print $2}')
for id in $ids; do
        idn=$(python3 /path/to/file/mmid.py $id)
        ida+=($idn)
done

for i in "${!ida[@]}"; do
        printf "ID: %s\n" "${ida[i]}"
        printf "Course Name: %s\n" "${cns[i]}"
        printf "Creator Name: %s\n" "${crs[i]}"
        printf "\n"
done

Python Script (code mostly taken from the above Reddit thread)

import struct
import hashlib
import hmac
import sys
idno = int(sys.argv[1])
key = hashlib.md5(b"9f2b4678").digest()
data = struct.pack("<Q", idno)
checksum = hmac.HMAC(key, data, 'md5').digest()
checksum = checksum[3:1:-1].hex().upper()
idstring = str.upper(hex(idno))[2:]
for y in range(8 - len(idstring)):
    idstring = '0' + str(idstring)
code = str(checksum) + '0000' + str(idstring)
print(code)

Example Entry from courses.jsonl

{"id":66782542,"retrieval_date":"2021-06-02T02:40:10.837452Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00066782542-00001?Expires=1625193610&Signature=vh50sqvoN2u-Xu~uT2pYJMNzj1kV11NDt77BDC2UM5o9VtTz-3HNxheiWc~PxDJhCqDNL-M7u9qrFnQ6FkOxDTyzo3QrK1VcNDqoAFRKA2RL03au-FdN9daY4~CDeKS3TvEkpzqBGe9fZfwlz6S-z7~VwRLsjPbw26QSkeszZOdGNT75RWyx2jeqKYpcjzi4tagbiWwq0DzbHzaXjlIpYpTsHIPaemS0fpad0d-Hgv56R-c3BCt2rzCoxco~jpVO2FW2HZoQRJIPU0mwbZE0wLlpKEHMwEcOoUbRIbYP~5U4XFwc6eahHc19GRjttzNDHBm68u9yhi3BVjlaMnRm2g__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"go","creator":{"pid":1753458969,"nnid":"Khanna1974","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQLAAeGTApABA1vjUsH+qXdiADAAAcxVNAGEAcgBpAAAATQBFAAAAAAAAAEBAAACDAUJoRBggNEYUgRIBaA0AACkAUiVBTQBhAHIAaQAAAAAAAAAAAAAAAAAAAI1HAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"upload_time":"2019-08-19T19:53:26Z","user_plays":17,"clears":14,"total_attempts":27,"failures":13,"world_record":{"best_time_player":{"pid":1744579894,"nnid":"HifysWU","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAAAAQMlEqo2ExqEA13Ajt2LjAh1dDQAAACBSAGEAdgBnAHoAAABFAAAAAAAAAH8oJgAiAIhGgxgTEoYUaxATZG0AACWCUUhQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCXAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1744078322,"nnid":"hellodarknessyee","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAAAAQPB5NCID5WIC2Pb9Wi0zXBWf6gAAARRSAG8AcwBlAAAATQBFAAAAAAAAAEBAMgCBBaBoYxKzMoUOoQwTZgwAKCG4OUhQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGK9AAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":5129,"created_time":"2019-08-19T20:02:41Z","updated_time":"2019-08-20T18:41:32Z"}}

So far I have tried using a loop and appending the data to a new array, and using eval to do basically the same thing.

2 Answers


  1. What I would do using jq:

    jq -r '"ID: \(.id)
    Course name: \(.course_name)
    Creator name: \(.creator.nnid)"' file.json
    

    Yields:

    ID: 66782542
    Course name: go
    Creator name: Khanna1974
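
    If you also need the search step from the question, a sketch along these lines may work. It assumes the search term is a regex that should match against course_name only (the question's grep matches anywhere in the line), and search_courses is a made-up helper name, not part of jq:

    ```bash
    #!/usr/bin/env bash

    # search_courses PATTERN FILE  (hypothetical helper for illustration)
    # Prints id, course name, and creator for each record whose course_name
    # matches PATTERN. Fields are joined with the ASCII unit separator (0x1f)
    # so that spaces inside names survive the read below.
    search_courses() {
        local pattern=$1 file=$2
        jq -r --arg q "$pattern" '
            select((.course_name // "") | test($q))
            | [.id, .course_name, .creator.nnid]
            | map(tostring)
            | join("\u001f")
        ' "$file" |
        while IFS=$'\x1f' read -r id name creator; do
            # The question's mmid.py could be called on "$id" at this point.
            printf 'ID: %s\nCourse name: %s\nCreator name: %s\n\n' "$id" "$name" "$creator"
        done
    }
    ```

    It would be called as: search_courses 'castle' /path/to/file/courses.jsonl. A name containing a literal newline would still break the line-based read, so treat this as a starting point rather than a complete solution.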
    
  2. Let’s take the JSON and Python out of this: your question is about bash arrays, and those details just muddy things, since you wouldn’t have bash arrays involved if you were processing the JSON with Python.

    So your current script basically looks like:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    ids=$(printf '%s\n' 123 97 512)
    cns=$(printf '%s\n' "This Is First" "Second Here" "And Third")
    crs=$(printf '%s\n' "Joe Shmo" "Sue Me" "Perry The Platypus")
    
    for id in $ids; do
            ida+=($id)
    done
    
    for i in "${!ida[@]}"; do
            printf 'i: <%s>\n' "$i"
            printf 'ID: <%s>\n' "${ida[i]}"
            printf 'Course Name: <%s>\n' "${cns[i]}"
            printf 'Creator Name: <%s>\n' "${crs[i]}"
            printf "\n"
    done
    

    and when executed it’ll output:

    $ ./tst.sh
    i: <0>
    ID: <123>
    Course Name: <This Is First
    Second Here
    And Third>
    Creator Name: <Joe Shmo
    Sue Me
    Perry The Platypus>
    
    i: <1>
    ID: <97>
    Course Name: <>
    Creator Name: <>
    
    i: <2>
    ID: <512>
    Course Name: <>
    Creator Name: <>
    

    when what you want output is:

    i: <0>
    ID: <123>
    Course Name: <This Is First>
    Creator Name: <Joe Shmo>
    
    i: <1>
    ID: <97>
    Course Name: <Second Here>
    Creator Name: <Sue Me>
    
    i: <2>
    ID: <512>
    Course Name: <And Third>
    Creator Name: <Perry The Platypus>
    

    That’s because you’re trying to access the contents of cns and crs by indexing them with the index from ida[], e.g. "${cns[i]}", but neither cns nor crs is an array; they’re strings. So cns[0] is treated as just cns and prints the whole cns string, while cns[any other index] expands to an empty string. If you want them to be arrays then you could do:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    readarray -t ids < <(printf '%s\n' 123 97 512)
    readarray -t cns < <(printf '%s\n' "This Is First" "Second Here" "And Third")
    readarray -t crs < <(printf '%s\n' "Joe Shmo" "Sue Me" "Perry The Platypus")
    
    for i in "${!ids[@]}"; do
            printf 'i: <%s>\n' "$i"
            printf 'ID: <%s>\n' "${ids[i]}"
            printf 'Course Name: <%s>\n' "${cns[i]}"
            printf 'Creator Name: <%s>\n' "${crs[i]}"
            printf "\n"
    done
    

    $ ./tst.sh
    i: <0>
    ID: <123>
    Course Name: <This Is First>
    Creator Name: <Joe Shmo>
    
    i: <1>
    ID: <97>
    Course Name: <Second Here>
    Creator Name: <Sue Me>
    
    i: <2>
    ID: <512>
    Course Name: <And Third>
    Creator Name: <Perry The Platypus>
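
    The difference between the two versions comes down to how bash treats a scalar versus an array. A minimal sketch of that behavior (the variable names str and arr are made up for illustration; readarray needs bash 4 or later):

    ```bash
    #!/usr/bin/env bash

    # Command substitution yields ONE string, even when it contains newlines.
    str=$(printf '%s\n' "This Is First" "Second Here" "And Third")

    # readarray -t yields one array element per input line.
    readarray -t arr < <(printf '%s\n' "This Is First" "Second Here" "And Third")

    # A scalar behaves like a 1-element array: index 0 is the whole string.
    printf 'str element count: %s\n' "${#str[@]}"
    printf 'str[1]: <%s>\n' "${str[1]}"    # there is no index 1, so this is empty

    printf 'arr element count: %s\n' "${#arr[@]}"
    printf 'arr[1]: <%s>\n' "${arr[1]}"
    ```

    Here str reports 1 element and an empty str[1], while arr reports 3 elements with arr[1] holding "Second Here".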
    

    Now let’s apply that to your real starting point, which is that you have some input text from which you want to select some parts for further processing. E.g. let’s say you have CSV like:

    $ cat input.csv
    123,foo,This Is First,bar,Joe Shmo,other stuff
    50,foo,To be Ignored,bar,Griffin,more stuff
    97,foo,Second Here,bar,Sue Me,bad stuff
    17,foo,Also Ignored,bar,The Stranger,etc.
    512,foo,And Third,bar,Perry The Platypus,Blah blah
    

    and want to process 3 fields from every line that doesn’t contain the string Ignore. Again ignoring the fact that you wouldn’t really use a bash script for this, and that we’re just demonstrating how to use bash arrays, that’d look like:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    infile='input.csv'
    
    readarray -t ids < <(awk -F',' '!/Ignore/{print $1}' "$infile")
    readarray -t cns < <(awk -F',' '!/Ignore/{print $3}' "$infile")
    readarray -t crs < <(awk -F',' '!/Ignore/{print $5}' "$infile")
    
    for i in "${!ids[@]}"; do
            printf 'i: <%s>\n' "$i"
            printf 'ID: <%s>\n' "${ids[i]}"
            printf 'Course Name: <%s>\n' "${cns[i]}"
            printf 'Creator Name: <%s>\n' "${crs[i]}"
            printf "\n"
    done
    

    $ ./tst.sh
    i: <0>
    ID: <123>
    Course Name: <This Is First>
    Creator Name: <Joe Shmo>
    
    i: <1>
    ID: <97>
    Course Name: <Second Here>
    Creator Name: <Sue Me>
    
    i: <2>
    ID: <512>
    Course Name: <And Third>
    Creator Name: <Perry The Platypus>
    

    But calling awk 3 times to run the same search of your input file, printing 1 field each time, is wasteful and makes your script hard to maintain if your search criteria change. You’d really want to do the search once to get all the data and then use it as you like, e.g.:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    infile='input.csv'
    
    readarray -t lines < <(
        awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
    )
    
    readarray -t ids < <(printf '%s\n' "${lines[@]}" | cut -d, -f1)
    readarray -t cns < <(printf '%s\n' "${lines[@]}" | cut -d, -f2)
    readarray -t crs < <(printf '%s\n' "${lines[@]}" | cut -d, -f3)
    
    for i in "${!ids[@]}"; do
            printf 'i: <%s>\n' "$i"
            printf 'ID: <%s>\n' "${ids[i]}"
            printf 'Course Name: <%s>\n' "${cns[i]}"
            printf 'Creator Name: <%s>\n' "${crs[i]}"
            printf "\n"
    done
    

    or less efficiently:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    infile='input.csv'
    
    readarray -t lines < <(
        awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
    )
    
    while IFS=',' read -r id cn cr; do
        ids+=( "$id" )
        cns+=( "$cn" )
        crs+=( "$cr" )
    done < <(printf '%s\n' "${lines[@]}")
    
    for i in "${!ids[@]}"; do
            printf 'i: <%s>\n' "$i"
            printf 'ID: <%s>\n' "${ids[i]}"
            printf 'Course Name: <%s>\n' "${cns[i]}"
            printf 'Creator Name: <%s>\n' "${crs[i]}"
            printf "\n"
    done
    

    But of course, if you just needed to do something in bash with the id, cn, and cr values, such as calling some other tool with them as arguments/input (represented by a function prtVals() below), you could actually just do any of these instead:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    infile='input.csv'
    
    prtVals() {
            printf 'ID: <%s>\n' "$1"
            printf 'Course Name: <%s>\n' "$2"
            printf 'Creator Name: <%s>\n' "$3"
            printf "\n"
    }
    
    while IFS=',' read -r id cn cr; do
            prtVals "$id" "$cn" "$cr"
    done < <(
        awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
    )
    

    or:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    infile='input.csv'
    
    prtVals() {
            printf 'ID: <%s>\n' "$1"
            printf 'Course Name: <%s>\n' "$2"
            printf 'Creator Name: <%s>\n' "$3"
            printf "\n"
    }
    
    while IFS=',' read -r -a line; do
            prtVals "${line[@]}"
    done < <(
        awk 'BEGIN{FS=OFS=","} !/Ignore/{print $1, $3, $5}' "$infile"
    )
    

    or:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    infile='input.csv'
    
    prtVals() {
            printf 'ID: <%s>\n' "$1"
            printf 'Course Name: <%s>\n' "$2"
            printf 'Creator Name: <%s>\n' "$3"
            printf "\n"
    }
    
    export -f prtVals
    
    awk 'BEGIN{FS=OFS=ORS=","} !/Ignore/{print $1, $3, $5}' "$infile" |
    xargs -d ',' -n 3 bash -c 'prtVals "$@"' _
    

    That last one requires GNU xargs for -d.
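
    Where GNU xargs isn't available, one possible alternative is to have awk print each selected field on its own line and let bash consume three lines per record. This is only a sketch: print_records is a made-up helper name, and the sample data is written to a temporary file so the script runs on its own:

    ```bash
    #!/usr/bin/env bash

    prtVals() {
        printf 'ID: <%s>\n' "$1"
        printf 'Course Name: <%s>\n' "$2"
        printf 'Creator Name: <%s>\n' "$3"
        printf '\n'
    }

    # Sample data: the first three lines of input.csv from above.
    infile=$(mktemp)
    trap 'rm -f "$infile"' EXIT
    printf '%s\n' \
        '123,foo,This Is First,bar,Joe Shmo,other stuff' \
        '50,foo,To be Ignored,bar,Griffin,more stuff' \
        '97,foo,Second Here,bar,Sue Me,bad stuff' > "$infile"

    # print_records FILE  (hypothetical helper)
    # awk emits one field per line; the loop reads three lines per record.
    print_records() {
        while IFS= read -r id && IFS= read -r cn && IFS= read -r cr; do
            prtVals "$id" "$cn" "$cr"
        done < <(awk -F',' '!/Ignore/{print $1; print $3; print $5}' "$1")
    }

    print_records "$infile"
    ```

    Because each field travels on its own line, no field delimiter has to be chosen for the hand-off from awk to bash, though a field containing a newline would still break it.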
