I have a bunch of directories full of files, and each directory contains a file with important data such as the author and title.
/data/unorganised_texts/a-long-story
Each directory holds many files, but most importantly it includes Data.yaml, with contents like this:
Category:
Name: Space
Author: Jôëlle Frankschiff
References:
Title: Historical
Title: Future
Title: A “long” story!
I need to extract these values into the variables $category, $author and $title, create the matching directory structure, and copy the directory into it, like so:
/data/organised_texts/$category/$author/$title
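If you stick with matching lines (as the bash attempt below does) rather than using a real YAML parser, the extraction step might look like this minimal Python sketch. read_fields is a hypothetical helper, and it assumes that the category is the Name: line right after Category: and that the title you want is the last Title: line (the earlier ones being references). Like any line-based approach, it breaks on values that span several lines.

```python
from __future__ import annotations


def read_fields(text: str) -> tuple[str, str, str]:
    """Hypothetical helper: return (category, author, title) from Data.yaml text.

    Naive line matching, not a YAML parser. Assumes the category is the
    "Name:" line directly after "Category:" and that the wanted title is
    the LAST "Title:" line in the file.
    """
    category = author = title = None
    after_category = False
    for raw in text.splitlines():
        # Split on the first colon only, so titles may themselves contain colons.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "Key: value" line
        value = value.strip()
        if key == "Name" and after_category:
            category = value
        elif key == "Author":
            author = value
        elif key == "Title":
            title = value  # later Title: lines overwrite earlier ones
        after_category = key == "Category"
    if category is None or author is None or title is None:
        raise ValueError("missing Category/Name, Author or Title")
    return category, author, title
```

With the three values in hand, the target directory is simply Path("/data/organised_texts", category, author, title).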
Here is my attempt in bash. It probably goes wrong in several places, and as has been suggested, this might be better done in Python.
#!/bin/bash
for dir in /data/unorganised_texts/*/
while IFS= read -r line || [[ $category ]]; do
[[ $category =~ “Category:” ]] && echo "$category" && mkdir /data/organised_texts/$category
[[ $author ]]; do
[[ $author =~ “Author:” ]] && echo "$Author"
[[ $title ]]; do
[[ $title =~ “Title:” ]] && echo "$title" && mkdir /data/organised_texts/$category/$title && cp $dir/* /data/organised_texts/$category/$title/
done <"$dir/Data.yaml"
I was experimenting with readarray and eval, so the bash version is relevant. Here it is:
ubuntu:~# bash --version
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Thanks!
3 Answers
One bash idea:
NOTE: assumes none of the values include linefeeds
Results:
Lines such as [[ $author ]]; do will cause a syntax error, because do is only valid as part of a for, while, until or select loop.
mkdir -p can create the whole directory path in one go.
Then would you please try the following:
Since the OP was interested in a Python solution…
First, let's make some test dirs:
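A minimal way to build such a test tree in Python might look like this. make_test_dirs and the sample contents are made up for illustration, and the nesting in SAMPLE (Name indented under Category, the reference titles as a YAML list) is an assumption, because the question doesn't show the file's indentation.

```python
from pathlib import Path

# Hypothetical sample data; the indentation and list markers are assumed.
SAMPLE = """\
Category:
  Name: Space
Author: Jôëlle Frankschiff
References:
  - Title: Historical
  - Title: Future
Title: A “long” story!
"""


def make_test_dirs(root: Path) -> list[Path]:
    """Create root/a-long-story with a Data.yaml and a dummy text file."""
    story = root / "a-long-story"
    story.mkdir(parents=True, exist_ok=True)
    (story / "Data.yaml").write_text(SAMPLE, encoding="utf-8")
    (story / "chapter1.txt").write_text("Once upon a time...\n", encoding="utf-8")
    # A stray top-level file, so a "directories only" filter has something to skip.
    (root / "README.txt").write_text("not a story\n", encoding="utf-8")
    return [story]
```

For example, make_test_dirs(Path("/tmp/unorganised_texts")) gives you a throwaway tree to run the script against instead of the real data.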
Then a simple Python script. Python doesn't (yet) have a built-in YAML parser, so pip install pyyaml is needed before running this:

This code probably doesn't need any explanation, even for someone new to Python, but for completeness:
ROOT.iterdir() yields every entry one level inside ROOT, files as well as directories. We filter these with a generator expression to strip out bare files.

There is nothing remotely wrong with doing this in bash. These days I'd write it in Python instead, because a) I know Python much better than my very rusty bash, and b) it solves the problem 'properly' (e.g. we parse the YAML with a YAML parser), which often makes things more robust.
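A script along the lines described above might be sketched like this; it is one possible version, not necessarily the exact script meant here. read_meta and organise are hypothetical names, and read_meta assumes Category holds a Name key while Author and Title are top-level keys. PyYAML is imported inside read_meta, so organise can also be driven by a different reader function (for instance a line-based one).

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable

ROOT = Path("/data/unorganised_texts")
DEST = Path("/data/organised_texts")


def read_meta(yaml_file: Path) -> tuple[str, str, str]:
    """Return (category, author, title) from a Data.yaml file using PyYAML.

    Assumed layout: Category is a mapping with a Name key,
    while Author and Title are top-level keys.
    """
    import yaml  # PyYAML (pip install pyyaml); imported here so organise() works without it

    data = yaml.safe_load(yaml_file.read_text(encoding="utf-8"))
    # str() guards against YAML turning e.g. a title like 1984 into an int.
    return str(data["Category"]["Name"]), str(data["Author"]), str(data["Title"])


def organise(
    root: Path,
    dest_root: Path,
    reader: Callable[[Path], tuple[str, str, str]] = read_meta,
) -> list[Path]:
    """Copy each directory under root to dest_root/category/author/title."""
    copied = []
    # iterdir() yields files and directories; the generator keeps directories only.
    for src in (p for p in root.iterdir() if p.is_dir()):
        meta = src / "Data.yaml"
        if not meta.is_file():
            continue  # no metadata, so nowhere to put it
        category, author, title = reader(meta)
        target = dest_root / category / author / title
        # copytree creates target and any missing parents (like mkdir -p);
        # dirs_exist_ok=True (Python 3.8+) lets a re-run copy into it again.
        shutil.copytree(src, target, dirs_exist_ok=True)
        copied.append(target)
    return copied
```

With PyYAML installed, organise(ROOT, DEST) does the whole job. A title containing a / would create an extra directory level, so you may want to sanitise titles first.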
Note btw that the type hints are optional and ignored at runtime.