I have recently been challenged with an architectural problem. Basically, I developed a Node.js application that fetches three zip files from Census.gov (13 MB, 1.2 MB, and 6.7 GB), which takes about 15 to 20 minutes. After the files are downloaded, the application unzips them and extracts the needed data into an AWS RDS database; the zip files are deleted once processing is done. The issue for me is that this application needs to run only once each year. What would be the best solution for this kind of task?
4 Answers
I would use a cron job. You can use this website (https://crontab.guru/every-year) to determine the correct settings for the crontab.
This setting will run "At 00:00 on day-of-month 1 and on Monday in December."
To run the Node.js program, you simply put `node yourprogram.js` afterwards, so it would look like the code below. Where `node` is, you may need to put the full path to the node binary, and where `yourprogram.js` is, you need to give the path to your script as well.
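A minimal crontab entry matching the schedule described above might look like this (the node binary path, script path, and log file are placeholders; adjust them to your system):

```
# m h dom mon dow  command
0 0 1 12 1  /usr/bin/node /home/user/yourprogram.js >> /var/log/census-import.log 2>&1
```

Redirecting stdout and stderr to a log file is optional but useful for a job that only runs once a year, since nobody will be watching the terminal when it fires.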
Hi, I can give you a suggestion, but it depends on which services you use. For example, if you are on Google Cloud, you can use Cloud Scheduler. If you are on OpenShift or another Kubernetes platform, you can use a CronJob, though I think that is the most configuration-heavy option, since you need to write a YAML deployment that is triggered through a publisher/subscriber setup.

The reason I suggest this: for a long-running process like yours, it is best practice to run it asynchronously.

Thanks,
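For the Kubernetes route, a minimal CronJob sketch for a yearly task could look like the following (the image name `my-registry/census-import:latest` and the schedule are assumptions, not taken from the question):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: census-import
spec:
  schedule: "0 0 1 1 *"        # 00:00 on January 1st, once a year
  concurrencyPolicy: Forbid    # never run two imports at once
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: census-import
              image: my-registry/census-import:latest   # hypothetical image
              command: ["node", "yourprogram.js"]
          restartPolicy: Never
```

`concurrencyPolicy: Forbid` is a sensible default here: if a previous run is somehow still going, Kubernetes will skip the new one rather than start a second import.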
I would look into AWS Batch service which can run a scheduled job on an EC2 instance (virtual machine) or Fargate (serverless container runner).
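As a rough sketch, an AWS Batch job definition for this workload could be registered with something like the following (the name, image, and resource sizes are assumptions; size memory according to whether your code streams the 6.7 GB file or buffers it):

```json
{
  "jobDefinitionName": "census-import",
  "type": "container",
  "containerProperties": {
    "image": "my-registry/census-import:latest",
    "vcpus": 2,
    "memory": 8192,
    "command": ["node", "yourprogram.js"]
  }
}
```

The yearly trigger itself would then be an EventBridge rule whose target is a Batch job submission against this definition.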
Alternative #2: Use an AWS Lambda serverless function to execute a Node.js script (no need to set up an EC2 instance or Fargate). Lambda functions can be triggered by EventBridge rules using cron expressions. With Lambda, you pay for the number of executions and the execution time in 1 ms increments; however, this use case could be covered within the AWS Free Tier for Lambda.
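For example, an EventBridge rule that fires once a year could be wired up with the AWS CLI roughly like this (the rule name and the Lambda ARN are placeholders; note that EventBridge cron expressions use six fields, so `?` is required in the day-of-week slot):

```shell
# Fire at 00:00 UTC on January 1st every year
aws events put-rule \
  --name yearly-census-import \
  --schedule-expression "cron(0 0 1 1 ? *)"

# Point the rule at the Lambda function (ARN is a placeholder)
aws events put-targets \
  --rule yearly-census-import \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:census-import"
```

You would also need to grant EventBridge permission to invoke the function (`aws lambda add-permission`) before the rule can actually trigger it.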
Alternative #3: You can build a state machine using AWS Step Functions to trigger Lambda functions in steps.
The simple solution is to Schedule AWS Lambda Functions Using CloudWatch Events
So, you will have an AWS Lambda function that downloads the `.zip` files into an S3 bucket, unzips them, and extracts the data into the database. After that, the same function can empty the S3 bucket. This function will be triggered yearly by CloudWatch Events.
For more information, check out the tutorial linked here.