We are in the process of rewriting a lot of Perl code in our infrastructure in Python. The syntax is similar and it reads similarly. However, there are a number of subtle differences that make the conversion a bit challenging.

We have a command-line utility in our Linux environment called "read_file". You pass a pathname to a file and read_file will determine how to open it and "read" from it. You can pass in a *.txt, *.dat, *.gz, *.bz2, *.pgp and even a *.zip (as long as there is only 1 file in the zip archive) and read_file will determine how to unzip/decrypt and pass a stream of human-readable output.

We leverage this a lot in our Perl programs and opening a file is as simple as:

my $fh = IO::File->new();
$fh->open("read_file $file_name |") or abend_pgm("open: $!");
while ( my $line = <$fh> ) {
  ...
}

This causes Perl to drop to the shell and start "read_file"; its output is piped into the $fh file handle, so that the Perl script sees only the unzipped/decrypted text without any special handling.

Is there a similar way in Python to accomplish this?

Also, is there a similar thing that can be done with the writes, so that the output from Python open() is piped through let’s say, "gzip" directly? In Perl it’s done like this:

open (my $fh, "| /usr/bin/gzip -c > $file_name ");

This opens a file for output; anything emitted to the $fh file handle is piped to /usr/bin/gzip -c and the zipped output is redirected to $file_name. All of the conversions are handled by Perl and the OS rather than the script, which makes it very simple to leverage the huge library of utilities available in Linux itself.

Is this something for which I will have to call popen() and read from its output?


I have tried this and variations upon this, and could not figure out what else can be done:

>>> with open(" /ds/CENTOS/common/bin/read_file /ds/tmp/177598.TEST_20240225.dat.gz","r") as file1:
...   read_content = file1.read()
...   print(read_content)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: ' /ds/CENTOS/common/bin/read_file /ds/tmp/177598.TEST_20240225.dat.gz'
>>>

In Perl, I can access this file just fine and it’ll unzip and print the file to the screen:

perl -e 'my $fh = IO::File->new(); $fh->open("/ds/CENTOS/common/bin/read_file /ds/tmp/177598.TEST_20240225.dat.gz |"); while (my $line = <$fh>) { print $line; };'

3 Answers


  1. There are more recent ways of doing this in Python, which offer finer control over buffering and error handling in the child process, at the cost of a fair amount of boilerplate and complication.

    But Python still keeps the bureaucracy-free way to do that, which is os.popen:

    In [1]: import os
    
    In [2]: print(os.popen("gunzip -c debuginfo.json.gz").read())
    {
     "App": "com.whatsapp.w4b",
     "Architecture": "aarch64",
     "Board": "sm6150",
     "Build": "RP1A.200720.012.A705MNXXS5DWH1",
     "CCode": " ",
     "CPU ABI": "arm64-v8a",
    (...)
    

    If you ever need the more controlled variants of Popen, introduced in the early 2000s, check the docs for the subprocess module.
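
    As a minimal sketch of the same idea applied to both directions of the question, the helpers below wrap os.popen for reading a command's output line by line and for writing into a command's input. The names read_lines_via and write_via are hypothetical, and the sketch assumes a POSIX shell is available.

```python
import os
import shlex


def read_lines_via(command):
    """Yield the lines a shell command writes to its stdout (hypothetical helper)."""
    # os.popen runs the command through the shell and returns a text-mode,
    # file-like object that can be iterated and used as a context manager.
    with os.popen(command) as pipe:
        for line in pipe:
            yield line.rstrip("\n")


def write_via(command, lines):
    """Feed lines to a shell command's stdin (hypothetical helper).

    Returns what close() returns: None if the command exited with status 0,
    otherwise a non-zero value derived from the exit status.
    """
    pipe = os.popen(command, "w")
    for line in lines:
        pipe.write(line + "\n")
    return pipe.close()
```

    Under these assumptions, read_lines_via("read_file " + shlex.quote(file_name)) would play the role of the Perl "read_file $file_name |" open, and write_via("/usr/bin/gzip -c > " + shlex.quote(file_name), lines) the role of the "| /usr/bin/gzip -c > $file_name" open. shlex.quote keeps a path with spaces or shell metacharacters from being misread by the shell.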

  2. Perl’s open function allows for a lot of things that are more akin to what the subprocess module in Python is designed for. Python’s open is for opening files, not for executing arbitrary shell pipelines.

    For example,

    my $fh = IO::File->new();
    $fh->open("read_file $file_name |") or abend_pgm("open: $!");
    while ( my $line = <$fh> ) {
      ...
    }
    

    becomes

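    from subprocess import Popen, PIPE
    
    # text=True decodes the child's output to str instead of bytes.
    # Iterating p.stdout yields one line at a time as read_file produces it.
    with Popen(["read_file", file_name], stdout=PIPE, text=True) as p:
        for line in p.stdout:
            ...
    # Leaving the with block waits for read_file, so p.returncode is now set.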
    

    The call to Popen could fail to start read_file at all, or read_file could start and then exit with an error. The former raises an exception (typically FileNotFoundError or another OSError) that you would need to catch; the latter is detected by checking p.returncode once the process has finished.
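
    The two failure modes described above, plus the write direction from the question, can be sketched as follows. The function names read_command_lines and write_through_command are hypothetical; only documented subprocess calls are used, and turning a non-zero exit status into CalledProcessError is a design choice of this sketch rather than something Popen does on its own.

```python
import subprocess


def read_command_lines(argv):
    """Yield decoded stdout lines of argv; raise if the command fails.

    Hypothetical helper. Popen raises FileNotFoundError (an OSError) if the
    program cannot be started at all.
    """
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)


def write_through_command(argv, out_path, lines):
    """Pipe lines into argv's stdin and store its stdout in out_path.

    Hypothetical helper; the child writes directly to the output file.
    """
    with open(out_path, "wb") as out:
        with subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=out, text=True
        ) as proc:
            for line in lines:
                proc.stdin.write(line + "\n")
        # The inner with block closed stdin and waited for the child.
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
```

    With these, read_command_lines(["read_file", file_name]) would stand in for the Perl read pipe, and write_through_command(["/usr/bin/gzip", "-c"], file_name, lines) for the Perl write pipe. Passing the command as a list avoids the shell entirely, so no quoting of file_name is needed.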

  3. The Python standard library contains tools for working with various types of compressed files, as described in Data Compression and Archiving. They can often be used much like the built-in open. Take reading .gz files as an example. First, let's create such a file:

    echo "ABLE BAKER CHARLIE" > file.txt
    gzip file.txt
    

    Then in Python we can do:

    import gzip
    with gzip.open('file.txt.gz', 'rb') as f:
        file_content = f.read()
    print(file_content)
    

    which gives the output:

    b'ABLE BAKER CHARLIE\n'
    

    Keep in mind that deciding which standard-library module to use for a given file is up to you. Note that file_content is bytes here; if you want text, call .decode() on it or open the file in text mode ('rt').
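
    One way to make that choice, sketched under the assumption that the file extension reliably identifies the format, is to pick the opener from the suffix. The names OPENERS and open_text are hypothetical, and this covers only .gz, .bz2 and plain files; it does not attempt the .pgp or .zip cases that read_file handles.

```python
import bz2
import gzip
from pathlib import Path

# Map a file suffix to a standard-library opener; anything else is treated
# as a plain file and handled by the built-in open.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_text(path, mode="rt", encoding="utf-8"):
    """Open path in text mode, (de)compressing according to its suffix.

    Hypothetical helper: use mode "rt" to read and "wt" to write.
    """
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode, encoding=encoding)
```

    Reading then looks the same for every supported format, for example "with open_text(file_name) as f: for line in f: ...", and open_text(file_name, "wt") gives the write side without starting an external gzip process.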
