I am facing two scenarios when batch processing a large collection of async functions:
- Collect all the async functions (which start executing as soon as they are pushed to a collection), then send them to a method that will batch them, do await Promise.allSettled(batch), and collate the result per batch:
import { chunk } from 'lodash';
const getAsyncFunction = async () =>
  new Promise((resolve) => {
    setTimeout(() => resolve('resolved'), 500);
  });
const start = Date.now();
const promises = Array.from({ length: 100 }).map((_i) => getAsyncFunction());
const chunkedPromises = chunk(promises, 25);
for (const batchedPromises of chunkedPromises) {
  await Promise.allSettled(batchedPromises);
}
console.log('Time taken: ', Date.now() - start);
- Collect a reference to all the async functions, then send these to a method that will batch the references, do await Promise.allSettled(batch.map(fn => fn())), and collate the result per batch:
import { chunk } from 'lodash';
const getAsyncFunction = async () =>
  new Promise((resolve) => {
    setTimeout(() => resolve('resolved'), 500);
  });
const start = Date.now();
const promises = Array.from({ length: 100 }).map((_i) => () => getAsyncFunction());
const chunkedPromises = chunk(promises, 25);
for (const batchedPromises of chunkedPromises) {
  await Promise.allSettled(batchedPromises.map(p => p()));
}
console.log('Time taken: ', Date.now() - start);
Note: getAsyncFunction is for demonstration purposes only. It could be any real-world async function.
Which of the above, if either, is the correct way to batch process a collection of async functions, and why?
2 Answers
EDIT:
As of your last edit, your option 2 will take about 4x longer than option 1. There is no sensible reason to chunk things in option 1: all 100 promises are already started by the time the array is built, so you are merely awaiting their results 25 at a time.
The second option makes more sense in terms of chunking (the next 25 will not be started before the first 25 finish). This can be useful when each promise makes a request and you do not want to max out the allowed simultaneous server connections, or when you otherwise need to manage load. In JS, for example, it can help avoid lag between life cycles (frames).
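To make the second option concrete, here is a minimal sketch of a batching helper. The names chunk (a local stand-in for lodash's function of the same name), runInBatches, delay, and demo are made up for illustration, and the 20 ms delay is an arbitrary stand-in for real work. It assumes the caller passes functions that return promises, not promises themselves.

```javascript
// Hypothetical helper: split an array into consecutive slices of `size`.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: `tasks` is an array of functions that each return a promise.
// A task is only called once its batch is reached.
async function runInBatches(tasks, batchSize) {
  const results = [];
  for (const batch of chunk(tasks, batchSize)) {
    // Calling the functions here means at most `batchSize` are in flight at once.
    const settled = await Promise.allSettled(batch.map((task) => task()));
    results.push(...settled);
  }
  return results;
}

// Demo task: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function demo() {
  const tasks = Array.from({ length: 10 }, (_, i) => () => delay(20, i));
  const start = Date.now();
  const results = await runInBatches(tasks, 5);
  console.log('Settled:', results.length, 'Time taken:', Date.now() - start);
  return results;
}

demo();
```

Because the two batches run one after the other, the total time should be roughly two delays rather than one. The results come back in the original task order, each as a { status, value } or { status, reason } object from Promise.allSettled.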
OLD ANSWER:
There is no definite right or wrong way of doing things in this case, it depends on the specific use-case and what makes more sense or is more readable.
In other languages with multi-threading support, it could make a huge difference. You would want to start each process as soon as possible, because by the time you start the last one, most of the others could already be finished. In JavaScript, however, there would be no measurable difference, since your promises, even though async, will not run in parallel with your main thread. Instead, their callbacks run only after your main code block has finished (true for both browser JavaScript and Node.js).
In terms of performance and functionality, both of your options are basically identical; it's up to preference and what makes the most sense for your situation.
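A small sketch of the ordering described above, with made-up labels and an order array used only to record the sequence. Note one nuance it illustrates: the function passed to new Promise runs synchronously, while the then callback is deferred until the current synchronous code has finished.

```javascript
const order = [];

order.push('before');

const p = new Promise((resolve) => {
  // The executor runs immediately, as part of the synchronous code.
  order.push('executor');
  resolve('done');
});

// The callback is queued and runs only after the synchronous code below.
p.then(() => order.push('then'));

order.push('after');

// A timer fires after the queued promise callback, so the log is complete.
setTimeout(() => console.log(order.join(' -> ')), 0);
// prints "before -> executor -> after -> then"
```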
Definitely the second.
As you noted yourself, in the first approach all the functions already start executing when you push the promises into the collection (in particular, when you call getAsyncFunction() during the map). There is no actual batching of the execution going on. The only thing this achieves is to create a hazard of crashing your application when an error occurs, since you are creating promises (that might reject) without immediately attaching an error handler.
But even in your second code snippet, the terminology is messed up. getAsyncFunction gets you a promise when you call it, not an async function, and promises is an array of async functions, not an array of promises (as are chunkedPromises and batchedPromises). Yes, naming things is hard, but it's important to be precise, at least about the type of the value, to avoid confusing other programmers (and your future self) who may have to read the code.