skip to Main Content

This is currently what I am doing to insert data to sqlite db from a csv file that has ~2,000,000 rows. Is there a more memory efficient way? When I try to insert data using knex.js inside the .on(‘data’ …), I get all sorts of errors in the middle of processing them.

async function insertData(filePath: string, tableName: string) {
  const data: CsvRow[] = [];

  await new Promise<void>((resolve, reject) => {
    fs.createReadStream(filePath)
      .pipe(fastcsv.parse({ headers: true }))
      .on("data", (row: CsvRow) => {
        data.push(row);
      })
      .on("end", () => {
        // Insert any remaining rows in the buffer
        console.log("BATCH INSERTING DATA");

        db.batchInsert(tableName, data, 100)
          .then(() => {
            console.log("Batch Insert Complete!");
            resolve();
          })
          .catch((error) => {
            console.log("Error in batch insert: ", error);
            reject(error);
          });

      })
      .on("error", (error) => {
        console.log("Error in reading csv: ", error);
        reject(error);
      });
  });
}

I’ve tried calling insert inside .on(‘data’ …) but I am getting errors, which I think comes from concurrency issues but not sure

2

Answers


  1. Are you applying to some yc startup and is this a take-home coding challenge? XD I found the solution! use .pause() and .resume() on your csvstream between each batchInsert. That way there aren’t multiple async inserts going off which then clog up your 1 sqlite3 connection.

    Login or Signup to reply.
  2. You’re problem is just logical problem when you triying insert data with event ‘data’ . The process is sequential and insert data to database require more ressoure . The sequential process interrupt database insert an cause error. The solution is just to put your code inside on event ‘end’ is sure . If you like first approach just add a counter and compare to your datas.length

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search