I’m attempting to consume the Paypal API transaction endpoint.
I want to grab ALL transactions for a given account. This number could potentially be in the 10’s of millions of transactions. For each of these transactions, I need to store it in the database for processing by a queued job. I’ve been trying to figure out the best way to pull this many records with Laravel. Paypal has a max request items limit of 20 per page.
I initially started off with the idea of creating a job when a user gives me their API credentials that gets the first 20 items and processes them, then dispatches a job from the first job that contains the starting index to use. This would loop forever until it errored out. This doesn’t seem to be working well though as it causes a gateway timeout on saving those API credentials and the request to the API eventually times out (before getting all transactions). I should also mention that the total number of transactions is unknown, so chaining doesn’t seem to be the answer as there is no way to know how many jobs to dispatch…
Thoughts? Is getting API data best suited for a job?
3
Answers
You could dispatch the same job at the end of the first job which queries your current database to find the starting index of the transactions for that job.
So even if your job errors out, you could dispatch it again, then it will resume from where it was ended previously
Yes job is way to go . I’m not familiar with paypal api but it’s seems requests are rate limited paypal rate limiting.. you might want to delay your api requests a bit.. also you can make a class to monitor your api requests consumption by tracking the latest requests you made and in the job you can determine when to fire the next request and record it in the database…
My humble advise
please don’t pull all the data your database will get bloated quickly and you’ll need to scale each time you have a new account it’s not easy task.
May be you will need link your app with another data engine like AWS, anyway I think the best idea is creating an APi, pull only the most important data, indexed, and keep the all big data in another endpoint, where you can reach them if you need