
I have the following code: it saves some PHP code into a file, then loads and runs it again. Sometimes require returns an int. Why does this happen?

<?php

$f = function () {
    $cachePath = '/tmp/t.php';

    $code = '<?php';
    $code .= "\n\n";
    $code .= 'return ' . var_export([], true) . ';';

    file_put_contents($cachePath, $code, LOCK_EX);

    if (file_exists($cachePath)) {
        // Sometime the following line returns int,why?
        $result = require($cachePath);
        if (!is_array($result)) {
            var_dump($result, $cachePath, file_get_contents($cachePath));
            exit("ok");
        }
        var_dump($result);
    }
};


for($i=0;$i<1000000;$i++) {
    $f();
}

2 Answers


  1. why does this happen?

    This is standard behaviour of require; compare with the docs of include, the behaviour is the same for both:

    Handling Returns: include returns FALSE on failure and raises a warning. Successful includes, unless overridden by the included file, return 1.

    As you can see, an integer (1) is returned on the happy path when the return is not overridden.

    This makes sense for your example: the file exists (therefore no fatal error), but since it has just been created, it may so far only have been truncated, that is, it is still empty.

    Therefore the return is not overridden and you can see int(1).
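    You can demonstrate this in isolation (a minimal sketch; the temporary-file path is an assumption, it just needs to be writable):

    ```php
    <?php
    // An included file that contains no `return` statement, e.g. a cache
    // file that was truncated but not yet written, makes require return int(1).
    $path = tempnam(sys_get_temp_dir(), 'demo');
    file_put_contents($path, "<?php\n"); // opening tag only, no return
    $result = require $path;
    var_dump($result); // int(1)
    unlink($path);
    ```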

    Another explanation is that the return was overridden with an integer, which is also possible since multiple processes could write to the same file; given the way you wrote the example, though, this is less likely. I only mention it because it is another valid explanation.

    Include if Exists

    Here is an example of how you can mitigate the race condition, since you're looking for the $result, not (only) whether the file exists:

    if (($result = @include($cachePath)) &&
        is_array($result)    
    ) {
       # $result is array, which is required
       # ...
    }
    

    The thinking behind it is that we do only a little error handling: the "@" suppresses the warning when the file cannot be included (include() would only emit a warning and pass on with $result = false), and then the is_array() test verifies that loading $result actually worked.

    That is we design for error, but we know what we’re looking for, that is $result being an array.

    This is often called a transaction or transactional operation.

    In this new example, we would not even enter the if-body when the $result array is empty, i.e. contains no data.

    On the program-processing level this is likely what we're interested in: the file not existing, being empty, or even being wrongly written are all error cases the code needs to "eat" in order to invalidate $result.

    Define errors out of existence.

    Handling Parse Errors (for Include-If-Exists)

    Since PHP 7.0, if the include file to be returned has unfortunately been only half-written, include() produces a PHP parse error that is thrown as a ParseError and can be caught:

    # start the transaction
    $result = null;
    assert(
        is_string($cachePath) &&           # pathnames are strings,
        '' !== $cachePath &&               # never empty,
        false === strpos($cachePath, "\0") # and must not contain null-bytes
    );
    try {
        if (file_exists($cachePath)) {
            $result = include($cachePath);
        }
        # invalidate $result in case include() did technically work.
        if (!$result || !is_array($result)) {
            $result = null;
        }
    } catch (Throwable $t) {
        # catch all errors and exceptions,
        # the fall-through is intended to invalidate $result.
        $result = null;
    } finally {
        # $result is not null, but a non-empty array if it worked.
        # $result is null, if it could not be acquired.
    }
    

    See PHP try-catch-finally for the details of how the throwable/exception handling works; the assert() documents the meaning of $cachePath, the input parameter, for this example.

    This second example does not use the suppression operator "@". The reason is that if it were in use, as in the previous example, and the file to include contained a real fatal error, that fatal error would be silenced. In modern PHP this is not as much of an issue any longer; still, using file_exists() + include(), while having a race condition between time of check and time of use, is safe against a non-existing file (only a warning), and fatal errors are not hidden.

    As you may already see, the deeper you go into the details, the harder it becomes to keep the code as forward-thinking as possible. We must not get lost in error handling for its own sake, but focus on the outcome and define those errors out of existence.

    That is, include() still does the actual loading of the data into memory; file_exists() is only in use to "suppress" the warning, and we remain aware that include() may still emit the warning and return an integer, not an array.


    And now, as programming is hard: you would then perhaps wrap this in a loop, e.g. three retries. Why not a for loop that counts up and bounds the number of retries?
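    Such a retry loop might look like this (a sketch; the loadCache name, the back-off delay, and the three-attempt default are assumptions, not from the question):

    ```php
    <?php
    // Transactional include with a bounded number of retries.
    function loadCache(string $cachePath, int $maxTries = 3): ?array
    {
        for ($try = 1; $try <= $maxTries; $try++) {
            $result = null;
            try {
                if (file_exists($cachePath)) {
                    $result = include $cachePath;
                }
                if (is_array($result) && $result !== []) {
                    return $result; // acquired successfully
                }
            } catch (Throwable $t) {
                // half-written file: fall through and retry
            }
            usleep(1000 * $try); // brief back-off before the next attempt
        }
        return null; // could not be acquired within $maxTries attempts
    }
    ```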

  2. This problem isn’t reproducible if there’s only one executor of the script at all times.

    If you’re talking about running this script in parallel, the problem is that writing the file with an exclusive lock isn’t going to protect a later reader from seeing the file half-way through being written.

    A process could be writing the file (and having the lock) but require doesn’t adhere to that lock (file system locks are advisory, not enforced).

    So the correct solution would be:

    <?php
    
    $f = function () {
        $cachePath = '/tmp/t.php';
    
        /* Open the file for writing only. If the file does not exist, it is created.
           If it exists, it is neither truncated (as opposed to 'w'), nor the call to this function fails (as is the case with 'x').
           The file pointer is positioned on the beginning of the file. 
           This may be useful if it's desired to get an advisory lock (see flock()) before attempting to modify the file, as using 'w' could truncate the file before the lock was obtained (if truncation is desired, ftruncate() can be used after the lock is requested). */
        $fp = fopen($cachePath, "c");
    
        $code = '<?php';
        $code .= "\n\n";
        $code .= 'return ' . var_export([], true) . ';';
     
        // acquire exclusive lock, waits until lock is acquired
        flock($fp, LOCK_EX);
        // clear the file
        ftruncate($fp, 0);
        // write the contents
        fwrite($fp, $code);
    
        //wrong (see my answer as to why)
        //file_put_contents($cachePath, $code, LOCK_EX);
    
        //not needed
        //if (file_exists($cachePath)) {
            // Lock is held during require
            $result = require($cachePath);
            if (!is_array($result)) {
                var_dump($result, $cachePath, file_get_contents($cachePath));
                exit("ok");
            }
            var_dump($result);
        //}
        // closing the file implicitly releases the lock
        fclose($fp);
    };
    
    
    for($i=0;$i<1000000;$i++) {
        $f();
    }
    

    Note that the lock isn’t released and re-acquired after the write because another process could be waiting to overwrite the file.

    It is not released to make sure that the same piece of code that was written is also required.
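    If the writer and readers are separate scripts, readers could cooperate with the writer's advisory lock by taking a shared lock before the require (a sketch extending the answer; the readCache name is an assumption):

    ```php
    <?php
    // A reader that waits for any writer holding LOCK_EX to finish.
    function readCache(string $cachePath): ?array
    {
        $fp = @fopen($cachePath, 'r');
        if ($fp === false) {
            return null;              // file does not exist (yet)
        }
        flock($fp, LOCK_SH);          // blocks while a writer holds LOCK_EX
        $result = require $cachePath; // file is complete while the lock is held
        flock($fp, LOCK_UN);
        fclose($fp);
        return is_array($result) ? $result : null;
    }
    ```

    Multiple readers can hold the shared lock simultaneously; only the writer's exclusive lock is blocked out, so reads do not serialize each other.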

    However, this whole thing is questionable to begin with.

    Why do you need to write a file to just later require it back in?
