I have some files, each containing multiple key-value pairs, that I'd like to transform into a JSON array with one object per file.
Let me illustrate what I mean with a made-up example. First, the contents of the files:
# cat content/1.yaml
time: "2020-09-14T22:33:40Z"
id: ed1d4321
name: One
description: 'Here is number "one"
 this is good'
# cat content/2yaml
time: "2021-09-14T22:33:40Z"
id: eg134841
name: Two
description: 'Here is number "two"
 best of all'
newkey: value
Next, I concatenate the files into one stream, prefixing each with a filename: line, because I want to keep the filenames:
# for file in content/*yaml; do echo "filename: $file"; cat "$file"; done
filename: content/1.yaml
time: "2020-09-14T22:33:40Z"
id: ed1d4321
name: One
description: 'Here is number "one"
 this is good'
filename: content/2yaml
time: "2021-09-14T22:33:40Z"
id: eg134841
name: Two
description: 'Here is number "two"
 best of all'
newkey: value
Now the problem: how do I bring this together into a JSON array?
This is what I have come up with so far:
# for file in content/*yaml; do echo "filename: $file"; cat "$file"; done | jq -Rn '[inputs|split(": ")] | map({(.[0]): .[1]})'
[
  {
    "filename": "content/1.yaml"
  },
  {
    "time": "\"2020-09-14T22:33:40Z\""
  },
  {
    "id": "ed1d4321"
  },
  {
    "name": "One"
  },
  {
    "description": "'Here is number \"one\""
  },
  {
    " this is good'": null
  },
  {
    "filename": "content/2yaml"
  },
  {
    "time": "\"2021-09-14T22:33:40Z\""
  },
  {
    "id": "eg134841"
  },
  {
    "name": "Two"
  },
  {
    "description": "'Here is number \"two\""
  },
  {
    " best of all'": null
  },
  {
    "newkey": "value"
  }
]
That is already close, but there are some issues I haven't found a solution for:
- The fields are not grouped per file: each filename should start a new object, but instead every key-value pair ends up in its own object.
- The time field should not contain the escaped quotes. I'd like a solution that iterates over all fields and strips such quotes from the values, e.g. "time": "2021-09-14T22:33:40Z".
- The description value is spread over multiple lines, and I'd like the lines merged into one value, e.g. "description": "Here is number \"two\" best of all". The single quotes should not be kept.
So in the end, the output should look like this:
[
  {
    "filename": "content/1.yaml",
    "time": "2020-09-14T22:33:40Z",
    "id": "ed1d4321",
    "name": "One",
    "description": "Here is number \"one\" this is good"
  },
  {
    "filename": "content/2yaml",
    "time": "2021-09-14T22:33:40Z",
    "id": "eg134841",
    "name": "Two",
    "description": "Here is number \"two\" best of all",
    "newkey": "value"
  }
]
Answers
OK, I found another solution that doesn't use yq but Python, which is probably installed on most machines:
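The answer's own Python code is not preserved here. As an illustration only, here is a minimal stdlib-only sketch (Python 3.9+) of the same idea; fold_scalar, parse_flat and files_to_json are hypothetical names. It assumes the flat subset of YAML shown in the question (unindented key: value lines, plain or quoted values, single-quoted values spanning several lines) and is not a general YAML parser. If PyYAML is installed, calling yaml.safe_load on each file is the more robust choice.

```python
import json
from pathlib import Path


def fold_scalar(raw: str) -> str:
    """Strip surrounding quotes and fold a multi-line value into one line.

    Assumption: only the simple cases from the question occur, i.e. no
    backslash escapes inside double quotes and no blank lines inside a value.
    """
    # YAML folds a single line break inside a flow scalar into one space.
    value = " ".join(part.strip() for part in raw.splitlines())
    if len(value) >= 2 and value[0] == value[-1] == "'":
        # In single-quoted YAML, '' stands for one literal single quote.
        return value[1:-1].replace("''", "'")
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_flat(text: str) -> dict[str, str]:
    """Parse unindented 'key: value' lines; any other line continues the previous value."""
    raw: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and not line[:1].isspace():
            raw[key] = value
            last_key = key
        elif last_key is not None:
            raw[last_key] += "\n" + line
    return {key: fold_scalar(value) for key, value in raw.items()}


def files_to_json(directory: str, pattern: str = "*yaml") -> str:
    """Return a JSON array with one object per matching file, 'filename' first."""
    records = []
    for path in sorted(Path(directory).glob(pattern)):
        record = {"filename": str(path)}
        record.update(parse_flat(path.read_text()))
        records.append(record)
    return json.dumps(records, indent=2)
```

Run from the directory that contains content/, print(files_to_json("content")) should print one object per file, with the quotes stripped and the multi-line description folded into one line.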
This is probably better suited for yq than trying to re-implement a YAML parser. Something like this would work:
resulting in
This is a partial solution: the values are not yet "cleaned". That is left as an exercise for the reader 🙂
Start with jq --slurp --raw-input:
The output is this:
The following handles one file at a time and presupposes invoking jq with the -R and -s command-line options (jq -Rs). Combining the results for more than one file is left as an exercise.
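One way to handle the combining step is to skip per-file invocations and group the concatenated stream produced by the question's for loop instead, starting a new object at every filename: line. Below is a minimal Python sketch of that grouping (Python 3.9+; group_stream is a hypothetical name, not a jq feature). It assumes a new key always starts at column 0 and contains ': ', treats any other line as a continuation of the previous value, and keeps the values raw.

```python
def group_stream(stream: str) -> list[dict[str, str]]:
    """Split the concatenated 'filename: ...' stream into one dict per file.

    Values are kept raw: quotes are not stripped and continuation lines are
    joined with newlines. A line starts a new key only if it is not indented
    and contains ': '; any other line continues the previous value.
    """
    records: list[dict[str, str]] = []
    last_key = None
    for line in stream.splitlines():
        key, sep, value = line.partition(": ")
        if sep and not line[:1].isspace():
            # A 'filename' line opens a new record (as does a key seen before any filename).
            if key == "filename" or not records:
                records.append({})
            records[-1][key] = value
            last_key = key
        elif last_key is not None:
            # Continuation line of a multi-line value.
            records[-1][last_key] += "\n" + line
    return records
```

Feeding it the loop's output (e.g. via sys.stdin.read()) and passing the result to json.dumps(..., indent=2) would give one object per file; the values would still need the quote stripping and line folding described in the question.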