S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.zip
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.html
S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_2_fastqc.zip
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_1_fastqc.html
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_1_fastqc.zip
S_1006_DKDL220006298-1A_HKFTLDSX3_L1_cleaned_2_fastqc.html

Above are the names of the files in a folder.
I want to remove everything between the second _ from the left and the second _ from the right, so that the output looks like:

S_1004__1_fastqc.html
S_1004__1_fastqc.zip
S_1004__2_fastqc.html
S_1004__2_fastqc.zip
S_1006__1_fastqc.html
S_1006__1_fastqc.zip
S_1006__2_fastqc.html

How do I do this using bash?

I tried the following code:

for file in *.html *.zip; do
  new_name=$(echo "$file" | sed 's/_[^_]*_/_/')
  mv "$file" "$new_name"
done

but it did not work the way I wanted.

3 Answers


  1. In case you're looking for an awk script to achieve this: set the field separator to _ and print the first, second, second-to-last, and last fields. The first and second fields are joined by _, the second and second-to-last by __, and the second-to-last and last by _. NF is awk's predefined variable holding the number of fields.

    Here is the sample:

    files=("All_files")  # placeholder: list the real file names here, e.g. files=(*.html *.zip)
    
    for file in "${files[@]}"; do
      new_name=$(echo "$file" | awk -F '_' '{print $1 "_" $2 "__" $(NF-1) "_" $NF}')
      echo "$file -> $new_name"
      # mv "$file" "$new_name" # uncomment to rename them actually
    done
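
    A slightly more defensive sketch of the same awk idea, in case it helps. The helper name short_name_awk is hypothetical (it is not part of the answer above): it leaves names with fewer than five _-separated fields unchanged, and the loop only prints what it would rename, skipping targets that already exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): keep fields 1, 2, NF-1 and NF.
# Names with fewer than 5 fields are printed unchanged.
short_name_awk() {
  printf '%s\n' "$1" |
    awk -F '_' 'NF >= 5 { print $1 "_" $2 "__" $(NF-1) "_" $NF; next } { print }'
}

# Dry run: print the planned renames without touching any file.
for file in *.html *.zip; do
  [ -e "$file" ] || continue                 # glob matched nothing
  new_name=$(short_name_awk "$file")
  [ "$new_name" = "$file" ] && continue      # nothing to shorten
  if [ -e "$new_name" ]; then
    echo "skip (target exists): $file -> $new_name" >&2
    continue
  fi
  echo "$file -> $new_name"
  # mv -- "$file" "$new_name"                # uncomment to actually rename
done
```

    The existence check matters because two long names can shorten to the same target, and a plain mv would then silently overwrite one of them.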
    



    Alternative solution (using perl):

    files=("All_files")
    for file in "${files[@]}"; do
      new_name=$(echo "$file" | perl -pe 's/(S_\d+)_.*_.*_(.*_.*)/${1}__${2}/')
      # As written, each .* in the regex accepts any character.
      # To allow only word characters (i.e. [A-Za-z0-9_]), replace .* with \w*;
      # note that \w does not match - or ., both of which occur in these file names.
      echo "Renaming: $file -> $new_name"
      # mv "$file" "$new_name"
    done
    

  2. With a recent enough bash (at least 3.0 for [[ string =~ regexp ]] and BASH_REMATCH):

    for f in *.html *.zip; do
      [[ "$f" =~ ^(([^_]*_){2}).+((_[^_]*){2})$ ]] && mv "$f" "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
    done
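
    To make the capture groups concrete, here is a small sketch that runs the same regular expression on one of the file names from the question and prints the two pieces that are kept. The variable names re and new_name are only illustrative; nothing is renamed.

```shell
#!/usr/bin/env bash
f='S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html'

# Group 1: the first two fields, each followed by "_".
# Group 3: the last two fields, each preceded by "_".
# Everything matched by .+ in between is dropped.
re='^(([^_]*_){2}).+((_[^_]*){2})$'

if [[ $f =~ $re ]]; then
  printf 'prefix: %s\n' "${BASH_REMATCH[1]}"   # S_1004_
  printf 'suffix: %s\n' "${BASH_REMATCH[3]}"   # _1_fastqc.html
  new_name="${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
  printf 'new:    %s\n' "$new_name"            # S_1004__1_fastqc.html
fi
```

    Keeping the pattern in a variable and expanding it unquoted on the right of =~ avoids quoting surprises, since quoted parts of the pattern are matched literally.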
    

    With an older bash:

    for f in *.html *.zip; do
      set -f; IFS=_ a=( $f ); set +f; n="${#a[@]}"
      (( n > 4 )) && mv "$f" "${a[0]}_${a[1]}__${a[n-2]}_${a[n-1]}"
    done
    

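    Note: set -f; ...; set +f temporarily suppresses pathname expansion. This is needed because $f is expanded unquoted, and your file names could contain glob operators (*, ?, [...]). Also be aware that IFS stays set to _ after this assignment unless you restore it.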
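
    One way to keep those side effects contained is to wrap the split in a function, sketched below under the assumption that a helper is acceptable here. The name short_name is hypothetical. Because IFS is declared local, the caller's IFS is untouched; the example name with a literal * shows that no pathname expansion takes place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the shortened name, or return 1 when the
# name has fewer than 5 "_"-separated fields.
short_name() {
  local f="$1"
  local IFS=_          # only changed inside this function
  local a n
  set -f               # no pathname expansion while $f is split unquoted
  a=( $f )
  set +f               # note: this re-enables globbing unconditionally
  n=${#a[@]}
  (( n > 4 )) || return 1
  printf '%s\n' "${a[0]}_${a[1]}__${a[n-2]}_${a[n-1]}"
}

short_name 'S_1004_DKDL220006264-1A_HLGLFDSX3_L4_cleaned_1_fastqc.html'
short_name 'S_1004_*_L4_cleaned_1_fastqc.html'   # the * is kept literal
```

    In the loop this could be used as new=$(short_name "$f") && mv "$f" "$new".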

  3. Try using awk instead of sed

    for file in *.html *.zip; do
      new_name=$(echo "$file" | awk -F"_" '{print $1"_"$2"__"$(NF-1)"_"$NF}')
      mv "$file" "$new_name"
    done
    