
I’m trying to write a bash script that will download the contents of a URL (not recursive) and then analyze the file that was downloaded.

If the downloaded file is a text file (e.g., index.html), I want to know the size of the file and count the number of characters in it.

If the file is an image file I just want to know the file size.

Right now I’m working with wget and downloading the contents of the input URL, but the problem is that when I do this inside my script I don’t know the file name of the file that was downloaded.

So, the two main questions are:

  1. After downloading with wget, how can I get the filename in my script so that I can analyze the file?
  2. How can I determine the file type of the downloaded file?

2 Answers


  1. Chosen as BEST ANSWER

    I did finally manage to solve it.

    #!/usr/bin/env bash
    URL="$1"
    FILENAME=$(date +%y-%m-%d-%T) #Set the current date and time as the filename
    wget -O "$FILENAME" "$URL"    #Download the content from the URL and set the filename
    FILE_INFO=$(file "$FILENAME") #Store the output from the 'file' command
    
    if [[ "$FILE_INFO" == *"text"* ]]
    then
        echo "It's a text file"
    elif [[ "$FILE_INFO" == *"image"* ]]
    then
        echo "It's an image"
    fi
    

    Special thanks to Ben Scott for the help!
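
    The script above identifies the file type, but the question also asked for the file size and, for text files, a character count. A minimal sketch of that step follows. The helper name analyze_file is made up for illustration, and it assumes the file and wc utilities are available.

    ```shell
    #!/usr/bin/env bash
    # Sketch: print the size of any file, plus a character count for text files.
    # analyze_file is a hypothetical helper name, not part of the original script.
    analyze_file() {
        local file="$1"
        local info size chars
        info=$(file -b "$file")      # -b: leave the file name out of the output
        size=$(wc -c < "$file")      # size in bytes
        # $((...)) strips the padding some wc implementations add
        echo "Size: $((size)) bytes"
        if [[ "$info" == *text* ]]; then
            chars=$(wc -m < "$file") # characters, according to the current locale
            echo "Characters: $((chars))"
        fi
    }
    ```

    With the script above, this could be called as analyze_file "$FILENAME" right after the wget line.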


  2. I would suggest setting the file name wget writes to with the -O switch. You can then generate a file name, tell wget to download the URL to it, and run whatever analysis tools you want on the name you picked.

    The idea here is that you do not have to figure out what name the web site, the URL, or wget will pick: you control the parameters. That is a useful programming technique in general. The less input the user or some external program or website can provide, the simpler and more robust your program code will be.

    As for picking a file name, you could use a timestamp. The date utility can generate a timestamp for you, if you give it a +FORMAT parameter. Alternatively, since you mention this is part of an analysis tool, maybe you don’t want to save the file at all. In that case, try a tool like mktemp to generate a guaranteed unique file name, and then remove it before exiting.
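
    The mktemp approach could look roughly like the sketch below. The function name download_and_measure is made up for illustration, and the byte count stands in for whatever analysis you actually want to run.

    ```shell
    #!/usr/bin/env bash
    # Sketch: download into a unique temporary file, analyze it, then clean up.
    # download_and_measure is a hypothetical helper name used for illustration.
    download_and_measure() (
        # The parentheses run the body in a subshell, so the EXIT trap below
        # fires when the function finishes, not when the calling script exits.
        tmpfile=$(mktemp) || exit 1
        trap 'rm -f "$tmpfile"' EXIT
        wget -q -O "$tmpfile" "$1" || exit 1
        size=$(wc -c < "$tmpfile")
        echo "Downloaded $((size)) bytes"
    )
    ```

    Calling download_and_measure "$1" from a script would then leave no file behind, whether the download succeeds or fails.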

    For more information, see the manual pages wget(1), date(1), and mktemp(1).

    I’m not giving complete working code, in case anyone ever gets this as a school assignment and stumbles across this question. I wouldn’t want to make it too easy for that hypothetical person. 😉 Of course, if someone asked more specific questions, I’d likely clarify my answer for them.
