This question pertains to the situation where

  1. An image was uploaded, say mypicture.jpg
  2. WordPress created multiple copies of it with different resolutions like mypicture-300x500.jpg and mypicture-600x1000.jpg
  3. You delete the original image only

In this scenario, the remaining photos on the filesystem are mypicture-300x500.jpg and mypicture-600x1000.jpg.

How can you script finding these "dangling" images, whose original is missing, and deleting them?

2 Answers


  1. Chosen as BEST ANSWER

    I've written a Bash script that derives the original filename (e.g. mypicture.jpg) by stripping the WordPress resolution suffix from a resized copy (e.g. mypicture-300x500.jpg). If the original is not found, it deletes the "dangling image" (i.e. rm -f mypicture-300x500.jpg).

    #!/bin/bash
    
    # Read directories NUL-delimited so unusual names survive intact.
    while IFS= read -r -d '' directory
    do
            for path in "$directory"/*
            do
                    [[ -f "$path" ]] || continue
                    image=${path##*/}
                    echo "The current filename is $image"
                    # Text after the last "-", e.g. 300x500.jpg
                    resolution=${image##*-}
                    [[ $resolution == *.* ]] || continue
                    echo "The resolution is $resolution"
                    # Text after the last ".", e.g. jpg
                    extension=${resolution##*.}
                    echo "The extension is $extension"
                    # Resolution without the extension, e.g. 300x500
                    resolutiononly=${resolution%".$extension"}
                    echo "The resolution only is $resolutiononly"
                    pattern='^[0-9]+x[0-9]+$'
                    if [[ $resolutiononly =~ $pattern ]]; then
                            echo "The pattern matches"
                            originalfilename=${image%"-$resolution"}.$extension
                            echo "The original filename is $originalfilename"
                            # Look for the original next to the resized copy.
                            if [[ -f "$directory/$originalfilename" ]]; then
                                    echo "The file exists $originalfilename"
                            else
                                    rm -f -- "$directory/$image"
                            fi
                    fi
            done
    done < <(find . -type d -print0)
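
    The suffix-stripping step on its own can be sketched with a single regular expression. This is a minimal sketch, not the script above: original_for is a hypothetical helper name, and it assumes resized copies always end in -<width>x<height>.<extension>.

```shell
#!/usr/bin/env bash
# original_for PATH (hypothetical helper): print the presumed original for a
# resized copy, or return 1 if PATH does not end in -<width>x<height>.<ext>.
original_for() {
  # Group 1 is everything before the resolution suffix, group 2 the extension.
  local re='^(.*)-[0-9]+x[0-9]+(\.[^./]+)$'
  [[ $1 =~ $re ]] || return 1
  printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}
```

    For example, original_for ./uploads/mypicture-300x500.jpg prints ./uploads/mypicture.jpg, while original_for ./mypicture.jpg prints nothing and returns a non-zero status.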
    

  2. You could use find to find all lower resolution pictures with the -regex test:

    find . -type f -regex '.*-[0-9]+x[0-9]+\.jpg'
    

    This is much better than parsing the output of ls, which is meant for humans, not for automation. A safer (and simpler) bash script could thus be:

    #!/usr/bin/env bash
    
    while IFS= read -r -d '' f; do
      [[ "$f" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] &&
      ! [ -f "${BASH_REMATCH[1]}".jpg ] &&
      echo rm -f "$f"
    done < <(find . -type f -regex '.*-[0-9]+x[0-9]+\.jpg' -print0)
    

    (Delete the echo once you are convinced that it works as expected.)

    Note: we use the -print0 action and the empty read delimiter (-d '') to separate the file names with the NUL character instead of the newline character. This is preferable because it works as expected even if you have unusual file names (e.g., with spaces).
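
    To see the NUL-delimited reading in isolation, here is a small sketch. count_found is a hypothetical helper, not part of the script above; it only counts the paths that find emits.

```shell
#!/usr/bin/env bash
# count_found DIR (hypothetical helper): count regular files under DIR by
# reading NUL-delimited paths, so odd characters in names cannot split a path.
count_found() {
  local n=0 f
  while IFS= read -r -d '' f; do
    n=$((n + 1))
  done < <(find "$1" -type f -print0)
  printf '%s\n' "$n"
}
```

    With a directory holding one file whose name contains a space and one whose name contains a newline, count_found reports 2, whereas a newline-based reader would typically see the second name as two separate lines.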

    Note: as we test the file name inside the loop we could simply search for files (find . -type f -print0). But I suspect that if you have a large number of files the performance would be negatively impacted. So keeping the -regex test is probably better.

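    Bash loops are OK but they tend to become slow as the number of iterations increases. So, let’s incorporate our simple bash script in a single find command with the -exec action (note that -exec ... \; starts one bash process per file, so it is worth timing both variants on a large tree):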

    find . -type f -exec bash -c '[[ "$1" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] &&
      ! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -print
    

    Note: bash -c takes a script to execute as first argument, then the positional parameters to pass to the script, starting with $0. This is why we pass _ (my favourite for don’t care), followed by {} (the current file path).
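
    The way bash -c assigns its trailing arguments can be checked with a tiny sketch. show_params is a hypothetical helper that just reports what the inline script sees as $0 and $1.

```shell
#!/usr/bin/env bash
# show_params ARGS... (hypothetical helper): run a tiny bash -c script and
# report what it sees as $0 and $1 (empty if $1 is unset).
show_params() {
  bash -c 'printf "0=%s 1=%s\n" "$0" "${1-}"' "$@"
}
```

    show_params _ ./mypicture-300x500.jpg prints 0=_ 1=./mypicture-300x500.jpg. Without the placeholder, show_params ./mypicture-300x500.jpg prints 0=./mypicture-300x500.jpg 1=, which is why the file path would never reach $1.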

    Note: -print is normally the default find action but here it is needed because -exec is one of the find actions that inhibit the default behaviour.

    This will print a list of files. Check that it is correct and, once you are satisfied, add the -delete action:

    find . -type f -exec bash -c '[[ "$1" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] &&
      ! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -delete -print
    

    See man find and man bash for more explanations.
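
    If starting one bash per file turns out to be slow, the -exec ... {} + form of find passes many paths to a single bash invocation. What follows is a minimal sketch under assumptions: list_orphans is a hypothetical function name, a cheap -name prefilter stands in for the -regex test, and the function only prints candidates rather than deleting them.

```shell
#!/usr/bin/env bash
# list_orphans [DIR] (hypothetical helper): print resized .jpg copies under
# DIR (default: current directory) whose original .jpg is missing.
list_orphans() {
  # -name is only a rough prefilter; the regex inside the loop is the real test.
  find "${1:-.}" -type f -name '*-*x*.jpg' -exec bash -c '
    for f in "$@"; do
      if [[ "$f" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] && ! [ -f "${BASH_REMATCH[1]}.jpg" ]; then
        printf "%s\n" "$f"
      fi
    done' _ {} +
}
```

    Because the paths are only printed, the list can be reviewed before anything is removed.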

    Demo:

    $ touch mypicture.jpg mypicture-300x500.jpg mypicture-600x1000.jpg
    $ find . -type f -exec bash -c '[[ "$1" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] &&
      ! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -print
    $ rm -f mypicture.jpg
    $ find . -type f -exec bash -c '[[ "$1" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] &&
      ! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -print
    ./mypicture-300x500.jpg
    ./mypicture-600x1000.jpg
    $ find . -type f -exec bash -c '[[ "$1" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] &&
      ! [ -f "${BASH_REMATCH[1]}".jpg ]' _ {} \; -delete -print
    ./mypicture-300x500.jpg
    ./mypicture-600x1000.jpg
    $ ls *.jpg
    ls: cannot access '*.jpg': No such file or directory
    

    One last note: if, by accident, one of your full resolution pictures matches the regular expression for lower resolution pictures (e.g., if you have a balloon-1x1.jpg full resolution picture), it will be deleted. This is unfortunate, but according to your specifications there is no easy way to distinguish it from an orphan lower resolution picture. Be careful…
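
    One way to limit the damage from such false positives is to move the candidates aside instead of deleting them. This is only a sketch under assumptions: quarantine_orphans is a hypothetical helper, and DEST is assumed to lie outside SRC so that find does not walk into it.

```shell
#!/usr/bin/env bash
# quarantine_orphans SRC DEST (hypothetical helper): move orphaned resized
# .jpg copies under SRC into DEST, keeping their relative paths, instead of
# deleting them. Assumes DEST is outside SRC.
quarantine_orphans() {
  local src=${1%/} dest=$2 f rel
  while IFS= read -r -d '' f; do
    if [[ "$f" =~ ^(.*)-[0-9]+x[0-9]+\.jpg$ ]] && ! [ -f "${BASH_REMATCH[1]}.jpg" ]; then
      rel=${f#"$src"/}                          # path relative to SRC
      mkdir -p -- "$dest/$(dirname -- "$rel")"  # recreate the subdirectory
      mv -- "$f" "$dest/$rel"
    fi
  done < <(find "$src" -type f -name '*-*x*.jpg' -print0)
}
```

    After reviewing DEST, files that were moved by mistake (such as a full resolution balloon-1x1.jpg) can be moved back, and the rest removed.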
