skip to Main Content

I have a table Users with id, this table has a million records or more. So when fetching data from the table I am using batching with offset & limit to reduce load on my server.

My question is if we are executing the below query of fetching the data from the table, simultaneously a query comes up inserting a record in the table. Will there be any consistency issues in data returned. The query will be executed until all the data rows in the table is finished.

Query:
SELECT * FROM Users ORDER BY id LIMIT 10000 OFFSET x;

The query will be run such that all the entries in table is returned.
Sample Code:

async function fetchDataInBatches(model, whereClause, batchSize = 1000) {
  let offset = 0;
  let moreDataAvailable = true;
  let allData = [];
  while (moreDataAvailable) {
    const results = await model.findAll({
      where: whereClause,
      limit: batchSize,
      offset: offset,
      order: [['id', 'ASC']], 
    });
    if (results.length === 0) {
      moreDataAvailable = false; 
      break;
    }
    allData = allData.concat(results);
    offset += batchSize;
  }
  return allData;
}

2

Answers


  1. Assuming you’re using the InnoDB engine, every query is executed within a transaction. This means that it gets a consistent view of the database, changes made concurrently by other clients will not interfere with the results. You can treat the query as using a snapshot of the database.

    If you need consistency across multiple queries, you should use explicit transactions.

    Login or Signup to reply.
  2. Assuming you are asking about the consistency of data between paging calls (be good if you can clarify this), then no, the results will not be consistent as you obtain new pages of data while inserts are going on.

    If your Id column was a sequential value, then all new values would appear in the last page(s). But since you said in staging ground (be good if you could also add this to your question), that your Ids are not sequential, then any inserts are going to move all your pages around.

    There are 3 solutions that I can think of, which one you use would depend on your business case.

    1. Have an OrderBy column that is sequential, then you can ensure new data is always in the last page(s).

    2. Have a WHERE clause which ensures you don’t see new data e.g. where DateCreated <= @DateTimeOfQuery. This means any new data will not appear in your query.

    3. On the request of your first page of data take a snapshot of the table and use that for all further pages. This is a bit more complex and in most cases probably doesn’t give you any advantage over (2).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search