
I’m writing a cron job that’ll execute a script that’ll load up to ~100 urls; each url has data that will be memcached upon execution. The time for each url to load might take from 10 secs up to 15 mins; each url loads data from a database, returns the result as json, and caches the result.
The main point of this script is to cache the resulting data overnight (00:00 up to whatever time it takes to cache everything), so in the morning people won’t have to wait for the data to be cached again.

The urls are api urls. Will curl wait for each execution to end? Is this considered bad practice? Up to this point there was no cache, so I’m trying to implement it and cache the most-used urls’ data for 24 hrs or a similar time.

2 Answers


  1. Make sure your script doesn’t time out, so run it from BASH or similar, NOT via a web server (Apache, NGINX, etc.).

    Also: Make sure your curl requests wait long enough – look up curl’s timeout options.

    https://unix.stackexchange.com/questions/94604/does-curl-have-a-timeout/94612

    Last: Make sure the whole job does not error out if one of the 100 requests goes bad.

    If you can reasonably solve these three possible problems, I think you should be fine. (I always send the output to my own email, to keep an eye on it.)
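    For illustration, a minimal PHP sketch covering those safeguards – the URL, timeout values, and log message are placeholders of my own, not anything from the question:

    <?php
    set_time_limit(0); // no PHP execution-time limit (already the default on the CLI)

    $ch = curl_init('https://example.com/api/warmup'); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); // give up connecting after 30s
    curl_setopt($ch, CURLOPT_TIMEOUT, 1200);      // allow up to 20 min per transfer

    if (curl_exec($ch) === false) {
        // Log and carry on rather than aborting the whole run
        error_log('warmup failed: ' . curl_error($ch));
    }
    curl_close($ch);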

  2. Regarding the curl integration …

    Will curl wait for each execution to end?

    It depends on how you are using the curl library. You have tagged the question with ‘php’ and ‘php-curl’ – so it seems you are accessing curl’s routines from PHP.

    If you are using curl’s easy interface in something like the following manner:

    • initialize an easy handle with $req = curl_init()
    • set URL and other params using curl_setopt()
    • execute (single) request with curl_exec($req)
    • close or reset the request with curl_close($req) or curl_reset($req)

    then, naturally, you will have to wait until each request completes before commencing the next.
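    As a rough sketch of that sequential pattern (the URLs here are placeholders):

    <?php
    $urls = ['https://example.com/api/a', 'https://example.com/api/b']; // placeholders

    foreach ($urls as $url) {
        $req = curl_init($url);
        curl_setopt($req, CURLOPT_RETURNTRANSFER, true);
        $json = curl_exec($req); // blocks until this request finishes
        curl_close($req);
        // the next request only starts once the previous one has completed
    }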

    The alternative is to use the multi interface (see below) – which allows multiple requests to operate simultaneously.

    is this considered bad practice?

    If you are sending such a large number of network requests – and each request potentially takes such a long time – I think it is certainly far from ideal. It would be preferable to use curl’s multi interface if at all possible.

    The multi interface

    As curl’s documentation explains, the multi interface (in contrast to the ‘easy’ interface)

    Enable[s] multiple simultaneous transfers in the same thread without making it complicated for the application …

    My PHP is very weak, so – rather than posting a complete example myself – I will instead refer you to PHP’s documentation on the curl_multi_exec() and related functions.

    In brief, though, the idea is that you still initialize your curl handles the same way. (PHP’s documentation doesn’t mention this explicitly, but a plain curl handle is sometimes referred to as an ‘easy’ handle – to differentiate it from a ‘multi’ handle.)

    $req1 = curl_init();
    $req2 = curl_init();
    // Set URL and other options using `curl_setopt(...)`
    

    (I am omitting all error-checking here for the sake of brevity.)
    Instead of calling curl_exec(...), you create a multi instance,

    $mh = curl_multi_init();
    

    add the easy handles to your newly-created multi instance,

    curl_multi_add_handle($mh, $req1);
    curl_multi_add_handle($mh, $req2);
    

    and then (instead of calling curl_exec() for a single easy handle) periodically invoke curl_multi_exec(...) in a loop:

    curl_multi_exec($mh, $running);
    

    The $running variable will be updated to indicate whether requests are still ongoing, so – as soon as $running is zero – you can exit the loop and wrap up.
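    One common shape for that loop – using curl_multi_select() to wait for network activity instead of spinning – looks something like this:

    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // sleep until a transfer has activity
        }
    } while ($running && $status === CURLM_OK);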

    When done, don’t forget to tidy up.

    curl_multi_remove_handle($mh, $req1);
    curl_multi_remove_handle($mh, $req2);
    curl_multi_close($mh);
    

    Optimizing for a large number of requests

    Instead of using distinct variables for each request – as in $req1, $req2, etc. – you could use an array of requests – perhaps loading the relevant URLs from a text file (which I suspect you are doing already).
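    Putting the pieces together, here is a sketch of that approach, assuming a plain-text file urls.txt with one URL per line (the filename is my invention):

    <?php
    $urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Drive all transfers to completion
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);
        }
    } while ($running && $status === CURLM_OK);

    foreach ($handles as $ch) {
        $body = curl_multi_getcontent($ch); // response body (CURLOPT_RETURNTRANSFER was set)
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);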
