I have a set of files and a SHA256SUMS digest file that contains a sha256() hash for each of the files. What’s the best way to verify the integrity of my files with python?

For example, here’s how I would download the Debian 10 net installer SHA256SUMS digest file, then download and verify its MANIFEST file, in Bash:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 02:11:20--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K  71.7KB/s    in 1.0s    

2020-08-25 02:11:22 (71.7 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 02:11:27--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 02:11:28 (128 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ sha256sum --check --ignore-missing SHA256SUMS 
./MANIFEST: OK
user@host:~$ 

What is the best way to do this same operation (download and verify the integrity of the Debian 10 MANIFEST file using the SHA256SUMS file) in python?

3 Answers


  1. Chosen as BEST ANSWER

    The following python script implements a function named integrity_is_ok() that takes the path to a SHA256SUMS file and a list of files to be verified, and it returns False if any of the files couldn't be verified and True otherwise.

    #!/usr/bin/env python3
    from hashlib import sha256
    import os
    
    # Takes the path (as a string) to a SHA256SUMS file and a list of paths to
    # local files. Returns true only if all files' checksums are present in the
    # SHA256SUMS file and their checksums match
    def integrity_is_ok( sha256sums_filepath, local_filepaths ):
    
        # first we parse the SHA256SUMS file and convert it into a dictionary
        sha256sums = dict()
        with open( sha256sums_filepath ) as fd:
            for line in fd:
                # sha256 hashes are exactly 64 characters long
                checksum = line[0:64]
    
                # there is one space followed by one metadata character between the
                # checksum and the filename in the `sha256sum` command output
                filename = os.path.split( line[66:] )[1].strip()
                sha256sums[filename] = checksum
    
        # now loop through each file that we were asked to check and confirm its
        # checksum matches what was listed in the SHA256SUMS file
        for local_file in local_filepaths:
    
            local_filename = os.path.split( local_file )[1]
    
            sha256sum = sha256()
            with open( local_file, 'rb' ) as fd:
                data_chunk = fd.read(1024)
                while data_chunk:
                    sha256sum.update(data_chunk)
                    data_chunk = fd.read(1024)
    
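            checksum = sha256sum.hexdigest()
            # .get() returns None for a file that is not listed in SHA256SUMS,
            # so it fails verification instead of raising a KeyError
            if checksum != sha256sums.get( local_filename ):
                return False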
    
        return True
    
    if __name__ == '__main__':
    
        script_dir = os.path.split( os.path.realpath(__file__) )[0]
        sha256sums_filepath = script_dir + '/SHA256SUMS'
        local_filepaths = [ script_dir + '/MANIFEST' ]
    
        if integrity_is_ok( sha256sums_filepath, local_filepaths ):
            print( "INFO: Checksum OK" )
        else:
            print( "ERROR: Checksum Invalid" )
    

    Here is an example execution:

    user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
    --2020-08-25 22:40:16--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
    Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
    Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 75295 (74K)
    Saving to: ‘SHA256SUMS’
    
    SHA256SUMS          100%[===================>]  73.53K   201KB/s    in 0.4s    
    
    2020-08-25 22:40:17 (201 KB/s) - ‘SHA256SUMS’ saved [75295/75295]
    
    user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
    --2020-08-25 22:40:32--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
    Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
    Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1709 (1.7K)
    Saving to: ‘MANIFEST’
    
    MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      
    
    2020-08-25 22:40:32 (13.0 MB/s) - ‘MANIFEST’ saved [1709/1709]
    
    user@host:~$ ./sha256sums_python.py 
    INFO: Checksum OK
    user@host:~$ 
    

    Parts of the above code were adapted from an answer on Ask Ubuntu.
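The two wget steps can also be done from Python. Separately from the script above, here is a minimal sketch that uses only urllib.request from the standard library. The names BASE_URL and download() are illustrative and not part of the original answer. Note that a checksum fetched over plain HTTP only protects against accidental corruption, not against tampering.

```python
import shutil
import urllib.request
from pathlib import Path

# Directory URL taken from the wget commands in the question
BASE_URL = ("http://ftp.nl.debian.org/debian/dists/buster/main/"
            "installer-amd64/current/images/")


def download(url, dest_dir="."):
    """Save the resource at `url` into `dest_dir` and return the local path."""
    # Use the last component of the URL path as the local file name
    dest = Path(dest_dir) / url.rstrip("/").rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Stream to disk so large files are never held fully in memory
        shutil.copyfileobj(response, out)
    return dest


# Example usage (performs network access):
#   for name in ("SHA256SUMS", "MANIFEST"):
#       download(BASE_URL + name)
```

After both files are saved next to the script, integrity_is_ok() can be called on them exactly as in the `__main__` block above.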


  2. You may calculate the sha256sums of each file as described in this blog post:

    https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html

    A sample implementation to generate a new manifest file may look like:

    import hashlib
    from pathlib import Path
    
    # Your output file
    output_file = "manifest-check"
    
    # Your target directory
    p = Path('.')
    
    with open(output_file, "w") as out:
      # Iterate over everything below the directory, recursively
      for path in p.glob("**/*"):
        # Process regular files only, and skip the manifest being written
        if path.is_file() and path != Path(output_file):
          # Use a fresh hash object per file so digests do not accumulate
          sha256_hash = hashlib.sha256()
          with open(path, "rb") as f:
            # Read the file by chunks
            for byte_block in iter(lambda: f.read(4096), b""):
              sha256_hash.update(byte_block)
          # One line per file: path, a tab, then the hex digest
          out.write(str(path) + "\t" + sha256_hash.hexdigest() + "\n")
    

    Alternatively, the manifest-checker pip package seems to do this.

    You may have a look at its source here and adjust it for Python 3:
    https://github.com/TonyFlury/manifest-checker

  3. Python 3.11 added hashlib.file_digest()

    https://docs.python.org/3.11/library/hashlib.html#file-hashing

    Generating the digest for a file:

    import hashlib
    
    with open("my_file", "rb") as f:
        digest = hashlib.file_digest(f, "sha256")
        s = digest.hexdigest()
    

    Compare s against the information you have in SHA256SUMS.
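The comparison step could be sketched as follows. This assumes the plain `sha256sum` line layout (64 hex digits, a space, a mode character, then the file name) and does not handle the escaped form used for unusual file names. The helper names sha256_of(), expected_checksums() and verify() are hypothetical, and the code falls back to a chunked read where hashlib.file_digest() is unavailable (Python older than 3.11).

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, "sha256").hexdigest()
        digest = hashlib.sha256()  # fallback for older versions
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
        return digest.hexdigest()


def expected_checksums(sums_path):
    """Parse sha256sum output into {normalized relative path: hex digest}."""
    expected = {}
    with open(sums_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if len(line) < 67:
                continue  # blank or malformed line
            # 64 hex digits, a space, a mode character (' ' or '*'), the name
            checksum, name = line[:64], line[66:]
            expected[os.path.normpath(name)] = checksum.lower()
    return expected


def verify(sums_path, file_path):
    """True only if `file_path` is listed in `sums_path` and its digest matches."""
    # Entries are taken as relative to the directory holding the sums file
    base_dir = os.path.dirname(os.path.abspath(sums_path))
    rel_name = os.path.relpath(os.path.abspath(file_path), base_dir)
    expected = expected_checksums(sums_path).get(rel_name)
    return expected is not None and expected == sha256_of(file_path)
```

Keying the dictionary by the normalized relative path, rather than by the bare file name, keeps entries apart when the digest file lists files with the same name in different directories. With both files in the current directory, verify("SHA256SUMS", "MANIFEST") plays the role of the `sha256sum --check --ignore-missing SHA256SUMS` call in the question.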
