
I’m creating a scraping application that essentially calls a REST API for several entities on the server. The main functionality is just:

let buffer = []
for (const entity of entities) {
   let data = await scrapingProcessForAnEntity(entity);
   buffer.push(data)
}

I simplified the script because the scraping process and the way the data is stored are not relevant. The point is that I have a function, scrapingProcessForAnEntity, that returns a Promise which resolves with all the information I need.
Since there are a lot of entities, I want to run the process for several entities at a time, and as soon as one process finishes, start a new one in its place. I ran some tests with an array of Promises and Promise.race(), but I can’t figure out how to remove the finished Promise from the array. I also can’t run the process for all entities at once with Promise.all(), because the server cannot handle that many simultaneous requests. Limiting it to 3 to 5 entities at a time should be fine.

My current implementation is:

let promises = []
let buffer = []
async function run(){
   for(const entity of entities){
      await addPromise(() => scrapingProcessForAnEntity(entity))
   }
}

async function addPromise(prom){
   if(promises.length >= 3){ //In this example I'm trying to make it run for up to 3 at a time
      await moveQueue()
   }
   promises.push(prom())
}

async function moveQueue(){
   if(promises.length < 3){
      return
   }
   let data = await Promise.race(promises)
   buffer.push(data)
   promises.splice(promises.indexOf(data), 1) //<---- This is where I'm struggling
   //how can I remove the finished promise from the array?
}

Adding the data to the buffer is not done inside the Promise itself because there’s processing involved, and I’m not sure whether adding the data from 2 promises at the same time might cause an issue.

I also implemented a way to clear all the remaining promises at the end. My only struggle is finding which promise in the array has finished so that it can be replaced.

2 Answers


  1. You can split the entities into chunks of whatever size you want, then use map to create the promises for a chunk and Promise.all to wait for them. For example:

    // Function to chunk an array into smaller arrays
    function chunkArray(array, chunkSize) {
      const chunks = [];
      for (let i = 0; i < array.length; i += chunkSize) {
        chunks.push(array.slice(i, i + chunkSize));
      }
      return chunks;
    }
    
    // Assuming scrapingProcessForAnEntity returns a Promise
    const scrapingProcessForAnEntity = (query) => {
      // Your asynchronous API call logic here
      return new Promise((resolve, reject) => {
        // Example asynchronous operation (replace with your actual API call)
        setTimeout(() => {
          console.log(`API call for query: ${query}`);
          resolve(`Data for query: ${query}`);
        }, 1000); // Simulated delay of 1 second
      });
    };
    
    const entities = ['entity1', 'entity2', 'entity3', 'entity4', 'entity5', 'entity6'];
    
    // Chunk the entities into arrays of size 3
    const chunkedEntities = chunkArray(entities, 3);
    
    // Process the chunks one after another, so that at most one chunk
    // (3 calls here) is in flight at any time
    async function runInChunks(chunks) {
      const results = [];
      for (const chunk of chunks) {
        // Start every call in this chunk, then wait for all of them
        const chunkResults = await Promise.all(chunk.map(entity => scrapingProcessForAnEntity(entity)));
        results.push(...chunkResults); // keep a flat array of results
      }
      return results;
    }
    
    runInChunks(chunkedEntities).then(results => {
      console.log('All API calls completed:', results);
    }).catch(error => {
      console.error('Error:', error);
    });

    Here each chunk is turned into an array of promises, and Promise.all waits for them while they run in parallel.
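
    Processing chunk by chunk means each chunk waits for its slowest call. If a new call should start as soon as any one finishes, as the question describes, a small pool of runners is one option. The following is only a sketch: mapWithLimit, runner, and next are hypothetical names, and worker stands in for scrapingProcessForAnEntity.

    ```javascript
    // Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
    async function mapWithLimit(items, limit, worker) {
      const results = new Array(items.length);
      let next = 0; // index of the next item that has not been started yet

      // Each runner keeps taking the next unstarted item until none are left.
      // There is no await between the check and `next++`, so two runners
      // can never take the same index.
      async function runner() {
        while (next < items.length) {
          const index = next++;
          results[index] = await worker(items[index]);
        }
      }

      // Start `limit` runners (fewer if there are fewer items) and wait for all of them.
      const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
      await Promise.all(runners);
      return results; // same order as `items`
    }
    ```

    It could be called as mapWithLimit(entities, 3, scrapingProcessForAnEntity). If any call rejects, the returned promise rejects as well, so wrap the worker in try/catch if failed entities should be skipped.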

  2. Use Promise.all() instead of Promise.race(). It returns a single Promise that fulfills once all of the input Promises have fulfilled.

       let data = await Promise.all(promises) //data is an array with one result per promise
       buffer.push(...data) //add each result to the buffer individually
       promises.splice(0, 3) //remove the first 3 promises from the array
    

    This way the Promises in a batch all run at the same time, and the array is only spliced after every one of them has finished, so there is no need to track which individual Promise finished. The trade-off is that the next batch cannot start until the slowest Promise in the current batch is done.
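
    If you would rather keep Promise.race(), one way to know which promise finished is to let each promise remove itself from the collection when it settles. Below is a minimal sketch of that idea, not a definitive implementation: runWithRace, inFlight, and tracked are hypothetical names, and worker stands in for scrapingProcessForAnEntity. JavaScript runs callbacks one at a time, so two promises pushing into the buffer cannot interleave.

    ```javascript
    // Hypothetical helper: keeps at most `limit` promises in flight using Promise.race.
    async function runWithRace(items, limit, worker) {
      const inFlight = new Set(); // promises that have not settled yet
      const buffer = [];          // results, in completion order

      for (const item of items) {
        // The tracked promise stores its result and then deletes itself from the set,
        // so there is no need to search for the finished promise afterwards.
        const tracked = worker(item)
          .then((data) => { buffer.push(data); })
          .finally(() => inFlight.delete(tracked));
        inFlight.add(tracked);

        if (inFlight.size >= limit) {
          // Wait until at least one tracked promise settles and frees a slot.
          await Promise.race(inFlight);
        }
      }

      await Promise.all(inFlight); // wait for the last few promises
      return buffer;
    }
    ```

    It could be called as runWithRace(entities, 3, scrapingProcessForAnEntity). The results arrive in the order the calls finish, not in the order of the entities, and a rejected call makes the whole function reject.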
