
I have multiple cron jobs set up in Django. In each CronJob class I have set ALLOW_PARALLEL_RUNS = False. To run them I use the Linux crontab as follows:

*/1 * * * * /home/social/centralsystem/venv/bin/python3.6 /home/social/centralsystem/manage.py runcrons 

After running for some time (for example, two months), I see many instances of the same cron job running at once, which puts a heavy load on the server. What causes this?

One example of my cron classes:

from django_cron import CronJobBase, Schedule


class UserTaskingCronJob(CronJobBase):
    ALLOW_PARALLEL_RUNS = False
    RUN_EVERY_MINS = 5

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'user_tasking'

    def do(self):
        args = {
            'telegram': {
                'need_recrawl_threshold': 60 * 2,
                'count': 100,
            },
            'newsAgency': {
                'need_recrawl_threshold': 10,
                'count': 100,
            },
            'twitter': {
                'need_recrawl_threshold': 60 * 4,
                'count': 500
            },
        }
        for social_network in ['telegram', 'newsAgency', 'twitter']:
            user_queuing(
                SOCIAL_USERS_MODEL[social_network],
                social_network,
                args[social_network]['need_recrawl_threshold'],
                args[social_network]['count'],
            )


Answers


  1. Chosen as BEST ANSWER

    I am posting my final solution so that others can use it.

    First of all, because of a bug in django-cron, you should not expect it to prevent parallel runs of a single cron job. So, as a first step, write a separate Linux crontab entry for each of your cron jobs.

    Secondly, use some kind of locking so that crontab cannot start the same cron job several times at once. I suggest using flock.
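
The answer does not show how it used flock. As a rough illustration of the same idea inside Python, the sketch below takes a non-blocking exclusive lock with the standard library's fcntl.flock before doing any work. The helper names try_lock and release and the lock file name are hypothetical, and this is only one possible way to apply the suggestion, assuming a Unix system.

```python
import fcntl
import os
import tempfile


def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release(handle):
    """Drop the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()


# Hypothetical lock path, one per cron job.
demo_path = os.path.join(tempfile.mkdtemp(), "user_tasking.lock")
lock = try_lock(demo_path)
if lock is None:
    print("previous run still active, skipping")
else:
    try:
        print("lock acquired, doing the work")
    finally:
        release(lock)
```

A lock taken this way is released by the kernel when the process exits, so a crashed run does not leave the job blocked.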


  2. Your cronjob is running every minute.

    See here for an explanation

    Crons are broken into:

    minute hour day(month) month day(week)

    The slash indicates the step value.

    In your case, it executes in steps of 1 minute, i.e. every minute.

    */1 * * * *
    
  3. You have to be careful with django-cron if you have lots of different tasks that run for different lengths of time. runcrons goes through all your cron classes and runs them one after another. It also logs a cron run (successful or not) to the database only when the run is done. I think django-cron could be improved by saving the cron log at the start (and checking whether a task is already running), but that would still not rule out overlaps when several jobs are run rather than one long one.

    You are running runcrons every minute, so you will run into trouble in these cases:

    • If, during one of the runs, a single task that is due takes longer than 1 minute.
    • If, during one of the runs, all the tasks that are due together take longer than 1 minute.

    In both cases, some tasks will not be logged to the database in time, and while they are still running, the next runcrons command will start them again.
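
To make the overlap concrete, here is a small simulation of the behaviour described above. It is a toy model under stated assumptions, not django-cron's actual code: the scheduler is invoked once per minute, a run becomes visible in the log only when it finishes, and a job is due when no logged run started within the last run_every_mins minutes. The function name simulate_starts is hypothetical.

```python
def simulate_starts(duration_mins, run_every_mins, total_mins):
    """Return the minutes at which a job gets started in a toy model.

    Assumed model (not django-cron's real code): the scheduler is invoked
    once per minute, a run is logged only when it finishes, and the job is
    due when no logged run started within the last run_every_mins.
    """
    starts = []
    for now in range(total_mins):
        # Only runs that have already finished are visible in the log.
        logged = [s for s in starts if s + duration_mins <= now]
        if not logged or now - max(logged) >= run_every_mins:
            starts.append(now)
    return starts


fast = simulate_starts(duration_mins=0.5, run_every_mins=5, total_mins=10)
slow = simulate_starts(duration_mins=3, run_every_mins=5, total_mins=10)
print(fast)
print(slow)
```

In this model the half-minute job starts at minutes 0 and 5, as intended, while the three-minute job is started at minutes 0, 1, 2 and again at 7, 8, 9, because each new invocation finds no finished run in the log.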

    To avoid this, do the following:

    • Identify tasks that take longer than 1 minute to run and run them with a different schedule that ensures they have finished before the next run.
    • In the crontab, run separate runcrons commands with a list of cron classes each, making sure that the total run of a list lasts less than 1 minute, e.g.
    */1 * * * * ./bin/python3.6 manage.py runcrons "my_app.crons.FirstCron" "my_app.crons.SecondCron"
    */1 * * * * ./bin/python3.6 manage.py runcrons "my_app.crons.ThirdCron"
    */10 * * * * ./bin/python3.6 manage.py runcrons "my_app.crons.LongCron"
    
  4. I suggest you create a lock file for each social network and use it to check whether your last collector has finished. For example, create /tmp/telegram.lock at the start of your code (and stop the job if it already exists) and remove it at the end. Each time you want to start a new job, check whether an old lock file exists.
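
A minimal sketch of this lock-file idea, using only the standard library. The context manager name lock_file is hypothetical, and the demo writes to a temporary directory rather than /tmp/telegram.lock. Creating the file with O_EXCL means the existence check and the creation happen in one step, so two jobs starting at the same moment cannot both succeed.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def lock_file(path):
    """Yield True if this process created the lock file, False if it already existed."""
    try:
        # O_EXCL makes creation fail if the file exists, so check and create are atomic.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        os.close(fd)
        yield True
    finally:
        # Remove the lock at the end of the job, even if the job raised.
        os.remove(path)


# Hypothetical lock path, one per social network.
demo_path = os.path.join(tempfile.mkdtemp(), "telegram.lock")
with lock_file(demo_path) as acquired:
    if acquired:
        print("no other collector running, starting job")
    else:
        print("lock exists, stopping")
```

One caveat with this approach: if the process is killed before the cleanup runs, the lock file stays behind and has to be removed by hand before the job can run again.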

  5. By default, django-cron uses the cache to keep track of which jobs are currently in progress, so that it doesn’t execute a task more than once unless it is allowed to run in parallel.

    If you have a cache set up in your Django app, you don’t need to worry about this or split the command into multiple crontab entries.

    For more details, you can check it here
