
I am offering a means of downloading multiple photos as a single archive file.

TAR format works fine, but most people don’t have anything on their device that can open TAR files, and then they complain that it doesn’t work.

ZIP compression is unnecessary (the photos are already compressed), but ZIP is a format that most people will be able to open.

The built-in PHP class ZipArchive appears to offer only an addFile() method that adds one file at a time. It seems that this involves decompressing and recompressing the entire archive, so the more files you add, the slower it gets – i.e. it runs in O(n²) time to add n files. That becomes catastrophic beyond about 30–40 hi-res photos.
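
Roughly, the pattern is the obvious one (a sketch; $photoPaths and the output path are placeholders for the selected photos and the download target):

    $zip = new ZipArchive();
    if ($zip->open('/tmp/photos.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) === true) {
        foreach ($photoPaths as $path) {
            $zip->addFile($path, basename($path)); // one call per photo
        }
        $zip->close();
    }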

Have I missed something about ZipArchive? Or is this a shortcoming in the class that should be put forward as a feature request?

Are there alternatives to achieve the goal of a quick-to-produce archive format that most people will be able to open without installing additional software?

2 Answers


  1. According to the comments on the ZipArchive::addFile manual page, it looks like the file is not actually added during the addFile() call; the files are added when the zip object is closed.
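
    A quick way to check (just a sketch) is to look at the archive file on disk before and after close():

    $zip = new ZipArchive();
    $path = sys_get_temp_dir() . '/close-test.zip';
    @unlink($path);                        // start from a clean slate (test path is arbitrary)
    $zip->open($path, ZipArchive::CREATE);
    $zip->addFromString('hello.txt', 'hello');
    clearstatcache();
    var_dump(file_exists($path));          // nothing (or only a stub) on disk yet
    $zip->close();                         // the entry is actually written here
    clearstatcache();
    var_dump(filesize($path));             // non-zero only after close()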

  2. It would be a pretty flawed design if that was the case, but you never know, so let’s test it:

    define('FILE_COUNT', 1); // <-------- Let's increase this
    
    $t0 = microtime(true);
    
    $zip = new ZipArchive();
    $zip->open(sys_get_temp_dir() . '/zip-test.zip', ZipArchive::CREATE);
    for ($i = 0; $i < FILE_COUNT; $i++ ){
        $zip->addFromString("file$i.txt", "Rutrum praesent homero sollicitudin regione scripta massa vix eos\n");
    }
    echo "Files added: {$zip->numFiles}\n";
    echo "Status: {$zip->status}\n";
    $zip->close();
    
    $time = microtime(true) - $t0;
    echo "Total time: " . number_format($time, 3) . " seconds\n";
    echo "Average time: " . number_format($time / FILE_COUNT, 3) . " seconds/file\n";
    echo "Max RAM used: " . number_format(memory_get_peak_usage(real_usage: true)) . " bytes\n";
    

    On my laptop:

    Files added: 1
    Status: 0
    Total time: 0.002 seconds
    Average time: 0.002 seconds/file
    Max RAM used: 2,097,152 bytes
    
    Files added: 1000
    Status: 0
    Total time: 0.161 seconds
    Average time: 0.000 seconds/file
    Max RAM used: 2,097,152 bytes
    
    Files added: 1000000
    Status: 0
    Total time: 135.223 seconds
    Average time: 0.000 seconds/file
    Max RAM used: 95,252,480 bytes
    

    So… This is of course not a scientific benchmark, just a quick test to get an overall picture. The conclusion is that it appears to be O(n) overall, and it definitely does not decompress and recompress the entire archive on every addition. In fact, the big delay happens in $zip->close().
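
    To pin down where the time goes, here is a variant of the same test (same caveats, reusing FILE_COUNT) that times the add loop and close() separately:

    $zip = new ZipArchive();
    $zip->open(sys_get_temp_dir() . '/zip-test.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE);
    
    $t0 = microtime(true);
    for ($i = 0; $i < FILE_COUNT; $i++ ){
        $zip->addFromString("file$i.txt", "Rutrum praesent homero sollicitudin regione scripta massa vix eos\n");
    }
    $tAdd = microtime(true) - $t0;
    
    $t0 = microtime(true);
    $zip->close(); // per the numbers above, this is where the time goes
    $tClose = microtime(true) - $t0;
    
    echo "Add loop: " . number_format($tAdd, 3) . " s, close(): " . number_format($tClose, 3) . " s\n";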

    Then I ran the same test with compression disabled entirely:

    for ($i = 0; $i < FILE_COUNT; $i++ ){
        $zip->addFromString("file$i.txt", "Rutrum praesent homero sollicitudin regione scripta massa vix eos\n");
        $zip->setCompressionName("file$i.txt", ZipArchive::CM_STORE);
    }
    

    Surprise:

    Files added: 1000000
    Status: 0
    Total time: 13.312 seconds
    Average time: 0.000 seconds/file
    Max RAM used: 95,154,176 bytes
    

    You’ll note however that RAM usage does increase. That’s only an artifact of using addFromString() (it was a quick test). I tried again with $zip->addFile() and the memory usage drops drastically (see the sketch after the output below):

    Files added: 1000000
    Status: 0
    Total time: 125.251 seconds
    Average time: 0.000 seconds/file
    Max RAM used: 2,097,152 bytes
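
    For reference, the addFile() run was along these lines (sketch only; the source file is a placeholder written once to disk):

    $src = sys_get_temp_dir() . '/sample.txt';
    file_put_contents($src, "Rutrum praesent homero sollicitudin regione scripta massa vix eos\n");
    
    for ($i = 0; $i < FILE_COUNT; $i++ ){
        // addFile() only records the path; the data is read later, at close(),
        // which would explain the much lower peak memory.
        $zip->addFile($src, "file$i.txt");
    }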
    

    In conclusion: quick testing suggests that adding a file is O(1), so building the archive is O(n) rather than O(n²).
