I have multiple crons set up in Django. In each CronJob I have set ALLOW_PARALLEL_RUNS = False. To run the crons I use the Linux crontab as follows:
*/1 * * * * /home/social/centralsystem/venv/bin/python3.6 /home/social/centralsystem/manage.py runcrons
After running for some time (for example, after 2 months), I see many copies of the same crons running at once, which puts a heavy load on the server. What causes this to happen?
One example of my cron classes is:
from django_cron import CronJobBase, Schedule

# user_queuing and SOCIAL_USERS_MODEL are defined elsewhere in the project.
class UserTaskingCronJob(CronJobBase):
    ALLOW_PARALLEL_RUNS = False
    RUN_EVERY_MINS = 5
    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'user_tasking'

    def do(self):
        args = {
            'telegram': {
                'need_recrawl_threshold': 60 * 2,
                'count': 100,
            },
            'newsAgency': {
                'need_recrawl_threshold': 10,
                'count': 100,
            },
            'twitter': {
                'need_recrawl_threshold': 60 * 4,
                'count': 500,
            },
        }
        for social_network in ['telegram', 'newsAgency', 'twitter']:
            user_queuing(
                SOCIAL_USERS_MODEL[social_network],
                social_network,
                args[social_network]['need_recrawl_threshold'],
                args[social_network]['count'],
            )
5 Answers
I am posting my final solution so anyone else can use it.
First of all, you should know that because of a django-cron bug you should not expect it to prevent parallel runs of a single cron. So, to prevent parallel runs, first write a separate Linux crontab entry for each of your crons. Secondly, use some kind of locking to prevent a single cron from being run multiple times by crontab. I suggest using flock.
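
To illustrate the locking idea, here is a minimal Python sketch of a non-blocking exclusive lock using fcntl.flock, which is roughly what wrapping a crontab command in `flock -n <lockfile> <command>` gives you. The helper names try_lock and run_exclusively and the lock path are hypothetical; this is not part of django-cron and assumes a Unix-like system.

```python
import fcntl


def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already taken."""
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


def run_exclusively(lock_path, job):
    """Run job() only if nobody else holds the lock; return True if it ran."""
    lock_file = try_lock(lock_path)
    if lock_file is None:
        return False  # a previous run is still in progress, so skip this one
    try:
        job()
        return True
    finally:
        lock_file.close()  # closing the file releases the lock
```

A cron entry point could call something like run_exclusively("/tmp/user_tasking.lock", job), using one lock file per cron so that different crons do not block each other. Because the kernel drops the lock when the process exits, a crashed run should not leave a stale lock behind.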
Your cronjob is running every minute.
See here for an explanation.
A crontab schedule is broken into five fields:
minute
hour
day (of month)
month
day (of week)
The slash indicates the step value.
In your case, it will execute in steps of 1 minute, i.e. every minute.
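
To make the step value concrete, here is a small Python sketch that expands a crontab expression into the values each field matches. It is a simplified illustration, not how cron itself is implemented: the names expand_field and parse_schedule are hypothetical, and month or weekday names are not supported.

```python
# (field name, lowest value, highest value) in crontab order
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field, low, high):
    """Expand one field such as '*', '*/5', '1,15' or '10-30/10' into a sorted list."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1  # no slash means a step of 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


def parse_schedule(expr):
    """Map each of the five crontab fields to the list of values it matches."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    return {
        name: expand_field(part, low, high)
        for part, (name, low, high) in zip(parts, FIELDS)
    }
```

With this sketch, parse_schedule("*/1 * * * *")["minute"] contains all 60 minutes, the same as a plain `*`, whereas `*/5` in the minute field would match only minutes 0, 5, 10 and so on.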
You have to be careful with django-cron if you have lots of different tasks that run for different lengths of time. runcrons takes all your cron classes and runs them sequentially. It also only logs a cron (successful or not) to the database when it’s done. I think django-cron could be improved by saving the cron log at the start already (and checking whether there is already a running task), but that would still not exclude overlaps if multiple jobs are run rather than one long one.
You are running runcrons every minute, so you’ll run into trouble whenever a run takes longer than that. In those cases, some tasks will not be logged to the database in time, and while they are still running, the next runcrons command will start them again.
To avoid this, use separate runcrons commands with a list of cron classes each, making sure that the total run of a list lasts less than 1 minute.

I suggest you make a lock file for each social network and check whether your last collector has finished. For example, create /tmp/telegram.lock at the start of your code (and stop the job if it already exists) and remove it at the end of the code. Each time you want to start a new job, check whether the old lock exists.

Django cron by default uses the cache to maintain the state of which jobs are currently in progress, so that it doesn’t execute a task more than once unless it is specified to run in parallel.
If you have a cache set up in your Django app, then you don’t need to worry about this or split the command into multiple crontab entries.
For more, you can check it here.
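
As a rough illustration of what "having a cache set up" could look like, here is a settings.py fragment. It is a sketch under assumptions: the cache location is a hypothetical path, and the two DJANGO_CRON_* names are taken from the django-cron configuration documentation, so they should be checked against the installed version. Note that Django's default local-memory cache is private to each process, so a lock stored there is unlikely to be seen by a second runcrons process started by crontab; a cache shared between processes is assumed here.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        # A file-based cache is visible to every process on the same machine,
        # unlike the default local-memory cache, which is per-process.
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",  # hypothetical path
    }
}

# django-cron lock settings (names as given in its configuration docs;
# verify them against your installed version).
DJANGO_CRON_LOCK_BACKEND = "django_cron.backends.lock.cache.CacheLock"
DJANGO_CRON_CACHE = "default"  # which CACHES entry the lock should use
```

Any cache backend shared by all processes (for example Memcached or the database cache) should serve the same purpose; the file-based backend is used here only because it needs no extra service.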