I am writing a Bash script to search a JSONL file of Mario Maker level data (from this Reddit post: https://www.reddit.com/r/MarioMaker/comments/wlkwp9/easily_searchable_database_of_super_mario_maker_1/) and print each matching level’s ID, course name, and creator name. The problem comes when I try to print the results: the arrays I’m getting are split on the spaces inside the course names instead of on newlines or quotes. Below are my current code, the Python script used to convert the IDs to in-game IDs, and a single entry from the JSONL file (the whole file is over 16 GB of raw text).
Bash script:
#!/bin/bash
ids=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $1}' | awk -F ':' '{print $2}')
cns=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $5}' | awk -F ':' '{print $2}')
crs=$(grep $1 /path/to/file/courses.jsonl | awk -F ',' '{print $7}' | awk -F ':' '{print $2}')
for id in $ids; do
    idn=$(python3 /path/to/file/mmid.py $id)
    ida+=($idn)
done
for i in "${!ida[@]}"; do
    printf "ID: %s\n" "${ida[i]}"
    printf "Course Name: %s\n" "${cns[i]}"
    printf "Creator Name: %s\n" "${crs[i]}"
    printf "\n"
done
Python Script (code mostly taken from the above Reddit thread)
import struct
import hashlib
import hmac
import sys
idno = int(sys.argv[1])
key = hashlib.md5(b"9f2b4678").digest()
data = struct.pack("<Q", idno)
checksum = hmac.HMAC(key, data, 'md5').digest()
checksum = checksum[3:1:-1].hex().upper()
idstring = str.upper(hex(idno))[2:]
for y in range(8 - len(idstring)):
    idstring = '0' + str(idstring)
code = str(checksum) + '0000' + str(idstring)
print(code)
Example Entry from courses.jsonl
{"id":66782542,"retrieval_date":"2021-06-02T02:40:10.837452Z","url":"https://d2sno3mhmk1ekx.cloudfront.net/10.WUP_AMAJ_datastore/ds/1/data/00066782542-00001?Expires=1625193610&Signature=vh50sqvoN2u-Xu~uT2pYJMNzj1kV11NDt77BDC2UM5o9VtTz-3HNxheiWc~PxDJhCqDNL-M7u9qrFnQ6FkOxDTyzo3QrK1VcNDqoAFRKA2RL03au-FdN9daY4~CDeKS3TvEkpzqBGe9fZfwlz6S-z7~VwRLsjPbw26QSkeszZOdGNT75RWyx2jeqKYpcjzi4tagbiWwq0DzbHzaXjlIpYpTsHIPaemS0fpad0d-Hgv56R-c3BCt2rzCoxco~jpVO2FW2HZoQRJIPU0mwbZE0wLlpKEHMwEcOoUbRIbYP~5U4XFwc6eahHc19GRjttzNDHBm68u9yhi3BVjlaMnRm2g__&Key-Pair-Id=APKAJUYKVK3BE6ZPNZBQ","stars":1,"course_name":"go","creator":{"pid":1753458969,"nnid":"Khanna1974","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAwAAQLAAeGTApABA1vjUsH+qXdiADAAAcxVNAGEAcgBpAAAATQBFAAAAAAAAAEBAAACDAUJoRBggNEYUgRIBaA0AACkAUiVBTQBhAHIAaQAAAAAAAAAAAAAAAAAAAI1HAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"upload_time":"2019-08-19T19:53:26Z","user_plays":17,"clears":14,"total_attempts":27,"failures":13,"world_record":{"best_time_player":{"pid":1744579894,"nnid":"HifysWU","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAAAAQMlEqo2ExqEA13Ajt2LjAh1dDQAAACBSAGEAdgBnAHoAAABFAAAAAAAAAH8oJgAiAIhGgxgTEoYUaxATZG0AACWCUUhQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCXAAAAAAAAAAAAAAAAAAAAAAAAAAE="},"first_complete_player":{"pid":1744078322,"nnid":"hellodarknessyee","mii_data":"QlBGQwAAAAEAAAAAAAAAAAAAAAAAAQAAAAAAQPB5NCID5WIC2Pb9Wi0zXBWf6gAAARRSAG8AcwBlAAAATQBFAAAAAAAAAEBAMgCBBaBoYxKzMoUOoQwTZgwAKCG4OUhQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGK9AAAAAAAAAAAAAAAAAAAAAAAAAAE="},"best_time_ms":5129,"created_time":"2019-08-19T20:02:41Z","updated_time":"2019-08-20T18:41:32Z"}}
So far I have tried using a loop and appending the data to a new array, and using eval to do basically the same thing.
2 Answers
What I would do using jq:
Yields:
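A minimal sketch of what a jq-based approach could look like, not necessarily what this answer originally showed. It assumes the field names from the example entry (id, course_name, creator.nnid), matches the search term as a regular expression against the course name only (the grep in the question matches anywhere in the line), and uses a hypothetical function name, search_courses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every JSONL record whose course_name matches the
# regex in $1, print id, course name and creator NNID separated by tabs.
search_courses() {
    local pattern=$1 file=$2
    jq -r --arg pat "$pattern" '
        select((.course_name // "") | test($pat))
        | [.id, .course_name, .creator.nnid]
        | @tsv
    ' "$file"
}

# Possible use (path is the placeholder from the question):
# while IFS=$'\t' read -r id cn cr; do
#     printf 'ID: %s\nCourse Name: %s\nCreator Name: %s\n\n' "$id" "$cn" "$cr"
# done < <(search_courses "$1" /path/to/file/courses.jsonl)
```

Because each record comes back as one tab-separated line, course names containing spaces stay intact. For the example entry above, a search for go would print 66782542, go and Khanna1974 separated by tabs; the numeric id would still need to go through the Python script to become an in-game ID.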
Let’s take the JSON and Python out of this: your question is about bash arrays, and those two just muddy things, since you wouldn’t have bash arrays involved in processing JSON with Python.
So your current script basically looks like:
and when executed it’ll output:
when what you want output is:
That’s because you’re trying to access the contents of cns and crs by indexing them with the index from ida[], e.g. "${cns[i]}", but neither cns nor crs is an array. They’re strings, so cns[0] is interpreted as just cns and prints the whole cns string, while cns[any other index] prints null (an empty string).
prints null. If you want them to be arrays then you could do:Now let’s apply that to your real starting point which is you have some input text that you want to select some parts of for further processing, e.g. let’s say you have CSV like:
and want to process 3 fields from every line that doesn’t contain the string Ignore. Again ignoring the fact you wouldn’t really use a bash script for this and we’re just demonstrating how to use bash arrays, that’d look like:
but calling awk 3 times to do the same search of your input file 3 times, printing 1 field at a time, is wasteful and makes it hard to maintain your script if your search criteria change, so you’d really want to do the search once to get all the data and then use it as you like, e.g.:
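One possible shape for that single pass, as a sketch rather than a definitive version: the CSV sample and the field positions are assumptions made up for illustration, and the array names follow the question.

```shell
#!/usr/bin/env bash
# Hypothetical CSV input; using fields 1, 2 and 3 is an assumption of this sketch.
input=$'1,first course,alice\n2,Ignore me,bob\n3,third course,carol'

ids=() cns=() crs=()

# A single awk call selects the lines and prints the three fields tab-separated;
# the loop then fills three parallel arrays, one element per selected line.
while IFS=$'\t' read -r id cn cr; do
    ids+=("$id")
    cns+=("$cn")
    crs+=("$cr")
done < <(awk -F',' -v OFS='\t' '!/Ignore/ { print $1, $2, $3 }' <<<"$input")

for i in "${!ids[@]}"; do
    printf 'ID: %s\nCourse Name: %s\nCreator Name: %s\n\n' "${ids[i]}" "${cns[i]}" "${crs[i]}"
done
```

Quoting each value when appending is what keeps a name like "third course" as one array element. One caveat: tab counts as whitespace for read, so an empty field in the middle of a line would collapse and shift the later values.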
or less efficiently:
but of course if you just needed to do something in bash with the id, cn, and cr values, such as call some other tool with them as arguments/input (represented by a function prtVals() below), you could actually just do any of these instead:
or:
or:
That last one requires GNU xargs for -d.
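For the case where the values are only passed straight on to something else, a plain read loop with no arrays at all might look like the sketch below. The body of prtVals and the CSV sample are hypothetical stand-ins, and the loop assumes exactly three comma-separated fields per line.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for whatever tool would consume the three values.
prtVals() {
    printf 'ID: %s\nCourse Name: %s\nCreator Name: %s\n\n' "$1" "$2" "$3"
}

# Made-up CSV sample for illustration.
input=$'1,first course,alice\n2,Ignore me,bob\n3,third course,carol'

# Each selected line is split into three variables and used immediately.
while IFS=',' read -r id cn cr; do
    prtVals "$id" "$cn" "$cr"
done < <(grep -v 'Ignore' <<<"$input")
```

Since nothing is stored between iterations, this form also avoids holding every match in memory, which may matter with an input file of the size described in the question.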