I’m writing a cron job that will execute a script that loads up to ~100 URLs; each URL serves data that is memcached upon execution. Each URL can take anywhere from 10 seconds up to 15 minutes to finish: it loads data from the database, returns the result as JSON, and caches that result.
The main point of this script is to cache the resulting data overnight (00:00 up to whatever time it takes to cache everything), so that in the morning people won’t have to wait for the data to be cached again.
The URLs are API URLs. Will curl wait for each execution to end? Is this considered bad practice? Up to this point there was no cache, so I’m trying to implement one: cache the most used URLs’ data for 24 hours or a similar period.
2 Answers
Make sure your script doesn’t time out, so run it from BASH or similar (e.g. via cron), NOT via a web server (Apache, NGINX, etc.).
Also: make sure your curl command waits long enough; look up curl’s timeout options.
https://unix.stackexchange.com/questions/94604/does-curl-have-a-timeout/94612
Last: make sure you do not error out if 1 of the 100 requests goes bad.
If you can reasonably satisfy/solve these 3 possible problems, I think you should be fine. (I always send the output to my own mail, to keep an eye on it.) A sketch covering all three points follows below.
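Here is a minimal sketch of those three points, assuming a PHP CLI script; the $urls array, timeout values, and mail address are all illustrative placeholders, not part of the question:

```php
<?php
// warm-cache.php — run from cron via the PHP CLI, not through a web server.

set_time_limit(0); // no script timeout (the CLI default, but be explicit)

$urls = [
    'https://example.com/api/report/1', // placeholder URLs
    'https://example.com/api/report/2',
];

$log = [];
foreach ($urls as $url) {
    $req = curl_init($url);
    curl_setopt($req, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($req, CURLOPT_TIMEOUT, 1200);      // allow up to 20 min per URL
    curl_setopt($req, CURLOPT_CONNECTTIMEOUT, 30); // but fail fast on dead hosts

    $body = curl_exec($req);

    // Don't abort the whole run because one URL failed: log it and move on.
    $log[] = ($body === false)
        ? "FAILED $url: " . curl_error($req)
        : "OK $url (" . strlen($body) . " bytes)";

    curl_close($req);
}

// Mail the run report to yourself to keep an eye on the job.
mail('you@example.com', 'Cache warm-up report', implode("\n", $log));
```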
Regarding the curl integration… it depends on how you are using the curl library. You have tagged the question with ‘php’ and ‘php-curl’, so it seems you are accessing curl’s routines from PHP.
If you are using curl’s easy interface in something like the following manner (the URL here is a placeholder):

```php
$req = curl_init('https://example.com/api/endpoint');
curl_setopt($req, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($req);
curl_close($req); // or curl_reset($req), to reuse the handle
```

then, naturally, you will have to wait until each request completes before commencing the next.
The alternative is to use the multi interface (see below), which allows multiple requests to operate simultaneously.
If you are sending such a large number of network requests, and each request potentially takes such a long time, I think running them one at a time is certainly far from ideal. It would be preferable to use curl’s multi interface if at all possible.
The multi interface
As curl’s documentation explains, the multi interface (in contrast to the ‘easy’ interface) enables multiple simultaneous transfers to run within a single thread.
My PHP is very weak, so, rather than posting a complete example myself, I will instead refer you to PHP’s documentation on curl_multi_exec() and related functions.
In brief, though, the idea is that you still initialize your curl handles the same way. (PHP’s documentation doesn’t mention this explicitly, but a plain curl handle is sometimes referred to as an ‘easy’ handle, to differentiate it from a ‘multi’ handle.)
(I am omitting all error-checking here for the sake of brevity.)
Instead of calling curl_exec(...), however, you instead:

1. create a multi instance,
2. add the easy handles to your newly-created multi instance, and
3. (instead of calling curl_exec() for a single easy handle) periodically invoke curl_multi_exec(...) in a loop, as sketched below.
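A minimal sketch of that loop, with two hypothetical endpoint URLs, and error-checking omitted as noted above:

```php
// Initialize the easy handles as usual (placeholder URLs).
$req1 = curl_init('https://example.com/api/report/1');
$req2 = curl_init('https://example.com/api/report/2');
curl_setopt($req1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($req2, CURLOPT_RETURNTRANSFER, true);

// Create the multi instance and add the easy handles to it.
$mh = curl_multi_init();
curl_multi_add_handle($mh, $req1);
curl_multi_add_handle($mh, $req2);

// Drive all transfers; $running is set to the number still in flight.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

// Collect the responses (requires CURLOPT_RETURNTRANSFER, as set above).
$res1 = curl_multi_getcontent($req1);
$res2 = curl_multi_getcontent($req2);

// Tidy up.
curl_multi_remove_handle($mh, $req1);
curl_multi_remove_handle($mh, $req2);
curl_multi_close($mh);
```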
The $running variable will be updated to indicate whether there are requests still ongoing, so as soon as $running reaches zero you can exit the loop and wrap up. When done, don’t forget to tidy up (remove each easy handle from the multi handle, then close everything).
Optimizing for a large number of requests
Instead of using distinct variables for each request, as in $req1, $req2, etc., you could use an array of requests, perhaps loading the relevant URLs from a text file (which I suspect you are doing already).
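A sketch of that approach, assuming a hypothetical urls.txt file with one URL per line (the filename is illustrative):

```php
// Load the URLs, one per line, from a text file.
$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$mh = curl_multi_init();
$handles = [];

foreach ($urls as $url) {
    $req = curl_init($url);
    curl_setopt($req, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $req);
    $handles[$url] = $req;
}

do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$results = [];
foreach ($handles as $url => $req) {
    $results[$url] = curl_multi_getcontent($req);
    curl_multi_remove_handle($mh, $req);
}
curl_multi_close($mh);
```

With ~100 requests that can each run for minutes, you may also want to cap concurrency, for example by adding handles to the multi instance in batches rather than all at once.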