S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.zip
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.html
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.zip
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_1_fastqc.html
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_1_fastqc.zip
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_2_fastqc.html
Above are the names of the files in a folder.
I want to remove everything between the second _ from the left and the second _ from the right, so that the output looks like:
S_1004__1_fastqc.html
S_1004__1_fastqc.zip
S_1004__2_fastqc.html
S_1004__2_fastqc.zip
S_1006__1_fastqc.html
S_1006__1_fastqc.zip
S_1006__2_fastqc.html
How do I do this using bash?
I tried the following code:
for file in *.html *.zip; do
    new_name=$(echo "$file" | sed 's/_[^_]*_/_/')
    mv "$file" "$new_name"
done
but it did not work the way I want.
3 Answers
In case you're looking for an awk script to achieve this: I would set the field separator to _ and then print the desired fields (the first, second, second-to-last, and last), with the first and second separated by _, the second and second-to-last separated by __, and the second-to-last and last separated by _. NF is the predefined variable holding the number of fields. Here is the sample:
Script demo
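A minimal sketch of the awk approach described above, not necessarily the exact script from the demo. The function name new_name_awk is hypothetical, and the sketch assumes a name is only rewritten when it has at least five _-separated fields (that is, at least four underscores); shorter names are printed unchanged.

```bash
# Hypothetical helper: print the new name for one file name.
new_name_awk() {
    printf '%s\n' "$1" | awk -F_ '
        # Keep fields 1, 2, NF-1 and NF; join the two halves with "__".
        NF >= 5 { print $1 "_" $2 "__" $(NF-1) "_" $NF; next }
        # Too few fields: nothing lies between the two underscores, so leave the name alone.
        { print }
    '
}

new_name_awk "S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html"
# prints S_1004__1_fastqc.html
```

In the loop from the question, new_name=$(new_name_awk "$file") could take the place of the sed call.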
Alternative solution (using perl):
Script demo
With a recent enough bash (at least 3.0, for [[ string =~ regexp ]] and BASH_REMATCH):
With an older bash:
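Both variants can be sketched roughly as follows. This is an illustration under stated assumptions rather than the exact code of the answer: the function names new_name_regex and new_name_split are hypothetical, and names with fewer than four underscores are assumed to be left unchanged. The first function uses [[ string =~ regexp ]] and BASH_REMATCH; the second splits the name on _ with unquoted word splitting, which is why it wraps the split in set -f and set +f.

```bash
# Variant 1 (bash >= 3.0): capture the first two and the last two fields with a regex.
new_name_regex() {
    # Keeping the regex in a variable avoids quoting differences between bash versions.
    local re='^([^_]*_[^_]*_).*(_[^_]*_[^_]*)$'
    if [[ $1 =~ $re ]]; then
        printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$1"      # fewer than four underscores: unchanged
    fi
}

# Variant 2 (older bash): split on "_" and reassemble the wanted fields.
new_name_split() {
    local IFS=_ parts n
    set -f                      # no pathname expansion while $1 is split unquoted
    parts=($1)
    set +f
    n=${#parts[@]}
    if [ "$n" -ge 5 ]; then
        printf '%s_%s__%s_%s\n' "${parts[0]}" "${parts[1]}" "${parts[n-2]}" "${parts[n-1]}"
    else
        printf '%s\n' "$1"      # too few fields: unchanged
    fi
}

new_name_regex "S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html"
# prints S_1004__1_fastqc.html
```

Either function could replace the sed call in the loop from the question, for example new_name=$(new_name_regex "$file").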
Note: set -f; ...; set +f is used to temporarily suppress pathname expansion, because your file names could contain glob operators (*, ?, [...]).
Try using awk instead of sed