I need to pull a large amount of data from Shopify through the api2cart API. The data goes back 2-3 years, and pulling it takes hours because each request returns at most 250 items. Each response includes a key for fetching the next 250 items, and so on.
In my backend, I pull the data, save it to a CSV file using fs, and then call the API again for the next 250 items. The process continues until all the data has been fetched. It works well on my local machine: I can pull years of data, and it takes about 2-3 hours to pull roughly 100k-150k items.
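A minimal sketch of the loop described above, for illustration only and not the actual code. `fetchPage`, `savePage`, and the `nextKey` field are hypothetical names standing in for the api2cart call, the CSV write, and the key returned with each response.

```typescript
// One page of results plus the key for the next page (null when done).
// The field names are assumptions, not the real api2cart response shape.
interface Page<T> {
  items: T[];
  nextKey: string | null;
}

// Pulls pages one after another until the API stops returning a next key.
// Returns the total number of items seen.
async function pullAll<T>(
  fetchPage: (key: string | null) => Promise<Page<T>>,
  savePage: (items: T[]) => Promise<void>,
): Promise<number> {
  let key: string | null = null;
  let total = 0;
  do {
    const page: Page<T> = await fetchPage(key);
    await savePage(page.items); // e.g. append the rows to the CSV file
    total += page.items.length;
    key = page.nextKey;
  } while (key !== null);
  return total;
}
```

Because the whole loop runs inside one call, anything that waits on that call (such as an HTTP request through a gateway) has to stay open until the last page arrives.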
Then I set up a NestJS microservice and deployed it on a DigitalOcean server. But when I make an API request that runs for a long time, the server eventually responds with a 504 Gateway Timeout error.
I can't use setTimeout, as there's no limit to how long this process can take. Is there any way to keep pulling data for hours or days?
What can I do to pull data for hours without getting a 504 Gateway Timeout error?
2 Answers
Very long-running requests are generally error-prone. A connection reset could force you to restart the whole process. Even though this won't fix the underlying problem you described with DigitalOcean, I think it's worth considering a different solution. I recommend splitting your heavy, long-running task into many small tasks and using a queue system.
NestJS provides very good documentation on using queues with the bull package.
I added a basic example with two solutions:
Queue consumer
shopify.consumer.ts
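The original code for this file is not shown here, so the following is only a sketch of what such a consumer could look like. It is written as a plain class so that it runs without dependencies; in a NestJS app the class would be decorated with `@Processor()` and the handler with `@Process()` from `@nestjs/bull`. The job payload, `fetchPage`, and `appendToCsv` are hypothetical names.

```typescript
// Payload of one queue job: the key of the page to fetch (null = first page).
interface FetchPageJob {
  cursor: string | null;
}

// Hypothetical shape of one API response after mapping it to CSV rows.
interface PageResult {
  rows: string[][];
  nextCursor: string | null;
}

// Quotes a field when it contains a comma, a double quote, or a line break.
function toCsvLine(fields: string[]): string {
  return fields
    .map((f) => (/[",\r\n]/.test(f) ? `"${f.replace(/"/g, '""')}"` : f))
    .join(",");
}

// In NestJS: @Processor('shopify') on the class, @Process('fetch-page') on handle().
class ShopifyConsumer {
  private readonly fetchPage: (cursor: string | null) => Promise<PageResult>;
  private readonly appendToCsv: (text: string) => Promise<void>;

  constructor(
    fetchPage: (cursor: string | null) => Promise<PageResult>,
    appendToCsv: (text: string) => Promise<void>,
  ) {
    this.fetchPage = fetchPage;
    this.appendToCsv = appendToCsv;
  }

  // Handles exactly one page, so each job stays short and can be retried alone.
  // Returns the key of the next page, or null when there is none.
  async handle(job: { data: FetchPageJob }): Promise<string | null> {
    const page: PageResult = await this.fetchPage(job.data.cursor);
    const text = page.rows.map((row) => toCsvLine(row) + "\n").join("");
    await this.appendToCsv(text);
    return page.nextCursor;
  }
}
```

In practice `appendToCsv` could be a thin wrapper around `fs.promises.appendFile`, and `fetchPage` would wrap the api2cart request.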
Option a) Generate all requests at once and let the queue process them:
shopify.service.ts
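A hedged illustration of option a). Because the next-page key is only known after each response, creating every job up front assumes the work can be partitioned some other way. The sketch below assumes the API can filter by a date range and splits the full period into fixed-size windows, one job per window. `JobQueue` is a minimal stand-in for the one method used here, modeled on Bull's `queue.add(name, data)`; the job name and payload are hypothetical.

```typescript
// Payload of one job: a half-open time window [from, to) as ISO strings.
interface FetchWindowJob {
  from: string;
  to: string;
}

// Stand-in for the part of the queue used here (Bull's Queue has a compatible add()).
interface JobQueue<T> {
  add(name: string, data: T): Promise<unknown>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Splits [start, end) into consecutive windows of at most `days` days.
function buildWindowJobs(start: Date, end: Date, days: number): FetchWindowJob[] {
  if (days <= 0) {
    throw new RangeError("days must be positive");
  }
  const jobs: FetchWindowJob[] = [];
  const stepMs = days * DAY_MS;
  for (let t = start.getTime(); t < end.getTime(); t += stepMs) {
    const windowEnd = Math.min(t + stepMs, end.getTime());
    jobs.push({
      from: new Date(t).toISOString(),
      to: new Date(windowEnd).toISOString(),
    });
  }
  return jobs;
}

// Adds one job per window and returns how many were queued.
async function enqueueAllWindows(
  queue: JobQueue<FetchWindowJob>,
  start: Date,
  end: Date,
  days: number,
): Promise<number> {
  const jobs = buildWindowJobs(start, end, days);
  for (const job of jobs) {
    await queue.add("fetch-window", job);
  }
  return jobs.length;
}
```

A single window can still contain more than 250 items, so the consumer would page through each window until no next key is returned. The upside is that windows are independent of each other, so a failed window can be retried without touching the rest.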
Option b) Generate a new queue job after every response
shopify.service.ts
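A sketch of option b), with the same caveat that the original code is not shown. Each job fetches one page and, if the response carries a key for the next page, adds a follow-up job for it. `JobQueue` again stands in for Bull's `queue.add(name, data)`; `fetchPage`, `save`, and the payload shape are hypothetical.

```typescript
// Payload of one job: the key of the page to fetch (null = first page).
interface CursorJob {
  cursor: string | null;
}

// Hypothetical response shape: the items plus the key for the next page.
interface CursorPage {
  items: unknown[];
  nextCursor: string | null;
}

// Stand-in for the part of the queue used here (Bull's Queue has a compatible add()).
interface JobQueue<T> {
  add(name: string, data: T): Promise<unknown>;
}

// Processes one page and chains the next job.
// Returns true if a follow-up job was queued, false on the last page.
async function processAndChain(
  job: CursorJob,
  fetchPage: (cursor: string | null) => Promise<CursorPage>,
  save: (items: unknown[]) => Promise<void>,
  queue: JobQueue<CursorJob>,
): Promise<boolean> {
  const page: CursorPage = await fetchPage(job.cursor);
  // Save first: if saving fails, the job fails and no follow-up is queued,
  // so a retry of this job repeats the same page rather than skipping it.
  await save(page.items);
  if (page.nextCursor === null) {
    return false; // last page reached, the chain ends here
  }
  await queue.add("fetch-page", { cursor: page.nextCursor });
  return true;
}
```

With this approach only one job exists at a time, so pages are processed strictly in order. The HTTP endpoint that starts the export only has to add the first job (`cursor: null`) and can respond immediately. If a job is retried after a successful save, the same page is written twice, so the save step should tolerate duplicates.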
Queues and Bull work well with NestJS, and I would recommend them for these long calls. It is really helpful as you can also retry calls, and add a timeout for failed jobs (to help with 417 errors).
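As an illustration of the retry and timeout settings mentioned above, here is a sketch of per-job options. `attempts`, `backoff`, `timeout`, and `removeOnComplete` are Bull job options; the values are illustrative assumptions, not tuned recommendations.

```typescript
// Options passed as the third argument of queue.add(name, data, options).
// The numbers are placeholders chosen for illustration.
const fetchJobOptions = {
  attempts: 5, // try a failing job up to 5 times in total
  backoff: { type: "exponential", delay: 5000 }, // wait longer before each retry (base delay in ms)
  timeout: 60000, // fail the job if it runs longer than 60 seconds
  removeOnComplete: true, // do not keep finished jobs in Redis
};
```

With an injected Bull queue, this would be used as `queue.add('fetch-page', { cursor }, fetchJobOptions)`, where the job name and payload are again hypothetical.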