We are in the process of rewriting a lot of Perl code in our infrastructure in Python. The syntax is similar and it reads similarly. However, there are a number of subtle differences that make the conversion a bit challenging.
We have a command-line utility in our Linux environment called "read_file". You pass a pathname to a file and read_file will determine how to open it and "read" from it. You can pass in a *.txt, *.dat, *.gz, *.bz2, *.pgp and even a *.zip (as long as there is only 1 file in the zip archive) and read_file will determine how to unzip/decrypt and pass a stream of human-readable output.
We leverage this a lot in our Perl programs and opening a file is as simple as:
my $fh = IO::File->new();
$fh->open("read_file $file_name |") or abend_pgm("open: $!");
while ( my $line = <$fh> ) {
...
}
This causes Perl to drop to the shell and start read_file, whose output is piped into $fh, so the Perl script sees only the unzipped/decrypted text without any special handling.
Is there a similar way in Python to accomplish this?
Also, is there a similar way to handle writes, so that the output from a Python open() is piped directly through, let’s say, gzip? In Perl it’s done like this:
open (my $fh, "| /usr/bin/gzip -c > $file_name ");
This opens a file for output: anything written to the $fh file handle is piped to /usr/bin/gzip -c, and the compressed output is redirected to $file_name. All of the conversion is handled by Perl and the OS rather than the script, which makes it very simple to leverage the huge library of functions available in Linux itself.
Will I have to use popen() and read from its output?
I have tried this and variations upon this, and could not figure out what else can be done:
>>> with open(" /ds/CENTOS/common/bin/read_file /ds/tmp/177598.TEST_20240225.dat.gz","r") as file1:
... read_content = file1.read()
... print(read_content)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: ' /ds/CENTOS/common/bin/read_file /ds/tmp/177598.NATL_340B_20240225.dat.gz'
>>>
In Perl, I can access this file just fine and it’ll unzip and print the file to the screen:
perl -e 'my $fh = IO::File->new(); $fh->open("/ds/CENTOS/common/bin/read_file /ds/tmp/177598.TEST_20240225.dat.gz |"); while (my $line = <$fh>) { print $line; };'
Answers
There are more recent ways of doing this in Python, which offer several options for dealing with buffering and errors in the child process, at the cost of a fair amount of boilerplate and complexity.
But Python still keeps the bureaucracy-free way of doing this, which is os.popen. If you ever need the more controlled variants built around Popen, introduced in the early 2000s, check the docs for the subprocess module.
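A minimal sketch of both directions with os.popen, assuming a POSIX shell. The helper names read_through and write_through are hypothetical; the read side mirrors the question's read_file pipe and the write side mirrors open($fh, "| /usr/bin/gzip -c > $file_name").

```python
import os
import shlex


def read_through(command, file_name):
    """Yield lines from the shell pipeline `command file_name |`.

    Rough equivalent of Perl's open($fh, "command $file_name |").
    """
    # shlex.quote guards against spaces or shell metacharacters in file_name.
    # Leaving the with-block closes the pipe; the exit status is discarded here.
    with os.popen(f"{command} {shlex.quote(file_name)}") as fh:
        for line in fh:
            yield line


def write_through(command, file_name, lines):
    """Write lines into the shell pipeline `| command > file_name`.

    Rough equivalent of Perl's open($fh, "| command > $file_name").
    """
    fh = os.popen(f"{command} > {shlex.quote(file_name)}", "w")
    try:
        fh.writelines(lines)
    finally:
        # close() waits for the command; it returns None on success and a
        # non-None status if the command exited with an error.
        status = fh.close()
    if status is not None:
        raise OSError(f"{command!r} failed with status {status}")
```

With the question's utilities this would be called as read_through("/ds/CENTOS/common/bin/read_file", file_name) for reading and write_through("/usr/bin/gzip -c", file_name, lines) for compressed output.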
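Perl’s open function allows for a lot of things that are more akin to what the subprocess module in Python is designed for. Python’s open is for opening files, not executing arbitrary shell pipelines. For example, the read_file pipe from the question becomes a subprocess.Popen call that reads the command's standard output.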
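A minimal sketch of the Popen version, assuming Python 3.7+ (for text=True). read_lines_via and write_lines_via are hypothetical helper names. Passing the command as a list of arguments avoids the shell entirely, so file names need no quoting, and both failure modes are surfaced as exceptions.

```python
import subprocess


def read_lines_via(command, file_name):
    """Yield text lines printed by `command file_name` (no shell involved).

    Raises FileNotFoundError if `command` cannot be started and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    args = [command, file_name]
    with subprocess.Popen(args, stdout=subprocess.PIPE, text=True) as p:
        for line in p.stdout:
            yield line
    # Leaving the with-block closes the pipe and waits for the process,
    # so p.returncode is set by now.
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, args)


def write_lines_via(command, out_name, lines):
    """Feed lines to `command` (a list of args), sending its stdout to out_name.

    Roughly Perl's open($fh, "| command > $out_name"), without a shell.
    """
    with open(out_name, "wb") as out:
        with subprocess.Popen(command, stdin=subprocess.PIPE, stdout=out, text=True) as p:
            p.stdin.writelines(lines)
    # Popen's with-block closed stdin and waited for the command to finish.
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, command)
```

For the question's case this would be read_lines_via("/ds/CENTOS/common/bin/read_file", file_name) and write_lines_via(["/usr/bin/gzip", "-c"], file_name, lines).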
The call to Popen could fail to start read_file at all, or read_file could exit with an error. The former usually raises an exception you would need to catch; the latter is detected by checking p.returncode once the process has finished.
Python's standard library contains tools for working with various types of compressed files, as described in the "Data Compression and Archiving" section of its documentation. They can often be used like the built-in open. Take, for example, reading .gz files.
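A sketch of that. Here the .gz file is created from Python with gzip.open in "wb" mode rather than from the shell, and the file name and contents are made up for illustration:

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt.gz")

    # First create a small .gz file (a stand-in for a real compressed data file).
    with gzip.open(path, "wb") as f:
        f.write(b"line one\nline two\n")

    # gzip.open is used just like the built-in open; the default mode "rb"
    # yields the decompressed bytes.
    with gzip.open(path) as f:
        file_content = f.read()

print(file_content)           # b'line one\nline two\n'
print(file_content.decode())  # the same data as text
```

Opening with mode "rt" instead returns str directly, so no decoding step is needed.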
Keep in mind that deciding which standard-library module to use for a given file is up to you. Note that you get bytes in file_content; if you want text, you should .decode it.
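One way to make that choice, sketched as a hypothetical helper open_text that dispatches on the file extension. It covers only formats the standard library handles (.gz, .bz2, a single-member .zip, and plain files); .pgp files are out of scope because the standard library has no PGP support, and the UTF-8 default encoding is an assumption.

```python
import bz2
import gzip
import io
import zipfile


def open_text(path, encoding="utf-8"):
    """Open path for reading as text, decompressing based on its extension.

    Hypothetical, partial stand-in for what read_file does (no PGP support).
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding=encoding)
    if path.endswith(".zip"):
        archive = zipfile.ZipFile(path)
        names = archive.namelist()
        if len(names) != 1:
            archive.close()
            raise ValueError(f"{path}: expected exactly one member, found {len(names)}")
        # archive.open returns a binary stream; wrap it so reads return str.
        return io.TextIOWrapper(archive.open(names[0]), encoding=encoding)
    return open(path, "r", encoding=encoding)
```

Usage mirrors the Perl loop: with open_text(file_name) as fh: for line in fh: .... For the write side, gzip.open(file_name, "wt") is a Python-only alternative to piping through /usr/bin/gzip -c.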