
I have a lot of JSON files that appear to be collection exports, each written as a single document containing an array of the documents in the collection.

I am having a problem importing them into MongoDB: the way I am importing seems to treat each file as one document, exceeding the 16 MB document limit (some of the files are 140 MB).

The structure is:

{
    "CollectionName": [
        {
            ...
        },
        ...
        {
           ...
        }
    ]
}

The subdocuments in the array each have a unique identifier in an attribute called "id", which I'm assuming was the original document id before the export.

I am using PowerShell to execute mongoimport and import the collections. The code I currently have is:

$collection = [regex]::Replace($_.FullName, "(?:C:\\macship-inbound\\\d{4}.\d{2}.\d{2}.\d{2}.\d{2})", "")
$params = '--db', 'machship',
    '--collection', "$collection",
    '--type', 'json',
    '--file', $_.FullName,
    '--batchSize', '100',
    '--numInsertionWorkers', '500'
& "C:\MongoDB\Tools\bin\mongoimport.exe" @params

I have tried adding --jsonArray to the parameters, but that doesn't work.

I would like to import the JSON using "CollectionName" as the collection name in the database, with the subdocuments in the array becoming the individual documents in that collection.

Is this possible? I'm happy to use a different approach or technology; I only used PowerShell because it is easy to add to the Task Scheduler on the heavily locked-down machine I am using.

2 Answers


  1. Chosen as BEST ANSWER

    I ended up asking ChatGPT and, with some massaging, it gave me a workable answer using Node.js.

    I had to use a file stream to read the JSON, as JSON.parse falls over on a huge string.

    Here is the code if anyone cares:

    const fs = require('fs');
    const readline = require('readline');
    const { MongoClient } = require('mongodb');
    
    // MongoDB connection URI
    const mongoURI = 'mongodb://localhost:27017/your_database';
    
    // Function to unpack and insert documents
    async function unpackAndInsert(jsonDocument) {
      const collectionName = Object.keys(jsonDocument)[0];
      const arrayOfDocuments = jsonDocument[collectionName];
    
      const client = new MongoClient(mongoURI, { useNewUrlParser: true, useUnifiedTopology: true });
    
      try {
        await client.connect();
    
        const db = client.db();
    
        // Insert each document from the array into the collection
        for (const document of arrayOfDocuments) {
          await db.collection(collectionName).insertOne(document);
        }
    
        console.log(`Documents inserted into collection: ${collectionName}`);
      } finally {
        await client.close();
      }
    }
    
    // Read the JSON file using a readable stream
    function readAndProcessFile(filePath) {
      const readStream = fs.createReadStream(filePath, { encoding: 'utf8' });
    
      const rl = readline.createInterface({
        input: readStream,
        crlfDelay: Infinity,
      });
    
      let jsonString = '';
    
      rl.on('line', (line) => {
        jsonString += line;
      });
    
      rl.on('close', () => {
        // Parse the accumulated JSON content
        const jsonDocument = JSON.parse(jsonString);
    
        // Unpack and insert documents into MongoDB, logging any failure
        unpackAndInsert(jsonDocument).catch(console.error);
      });
    }
    
    // Get the filename from the command-line arguments
    const filename = process.argv[2];
    
    if (!filename) {
      console.error('Please provide a filename as a command-line argument.');
      process.exit(1);
    }
    
    // Call the function to read and process the file
    readAndProcessFile(filename);
    

    Update this line to specify the database you want to store the collection in by replacing 'your_database' with the database name.

    const mongoURI = 'mongodb://localhost:27017/your_database';

    The code is executed with:

    node file.js /path/to/json.json
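
    If the files ever grow past what JSON.parse can handle in one string, a fully streaming variant is possible. The sketch below is untested and assumes the third-party stream-json package and its stream-chain companion (npm install stream-json stream-chain mongodb); it batches inserts with insertMany instead of one insertOne per document, and takes the collection name as a second argument, since a pure stream cannot cheaply read the top-level key first.

    const fs = require('fs');
    const { MongoClient } = require('mongodb');
    const { chain } = require('stream-chain');
    const { parser } = require('stream-json');
    const { pick } = require('stream-json/filters/Pick');
    const { streamArray } = require('stream-json/streamers/StreamArray');

    const mongoURI = 'mongodb://localhost:27017/your_database';

    async function importStreamed(filePath, collectionName) {
      const client = new MongoClient(mongoURI);
      await client.connect();
      const collection = client.db().collection(collectionName);

      // Parse the file incrementally: pick the array under the given
      // top-level key and emit one element at a time.
      const pipeline = chain([
        fs.createReadStream(filePath),
        parser(),
        pick({ filter: collectionName }),
        streamArray(),
      ]);

      let batch = [];
      for await (const { value } of pipeline) {
        batch.push(value);
        if (batch.length >= 1000) {
          await collection.insertMany(batch); // batched insert, fewer round trips
          batch = [];
        }
      }
      if (batch.length > 0) {
        await collection.insertMany(batch);
      }
      await client.close();
    }

    importStreamed(process.argv[2], process.argv[3]).catch(console.error);

    Saved as, say, stream-import.js, it would be run with: node stream-import.js /path/to/json.json CollectionName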
    

  2. Your solution with Node.js certainly works; however, the performance might be rather poor. With jq it would be a one-liner and should be much faster:

    jq -c '.CollectionName[]' "C:\macship-inbound\2023-12-08-12-00" | mongoimport.exe --db=machship --collection=CollectionName --numInsertionWorkers=5
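
    Here .CollectionName[] makes jq emit each array element as its own document, and -c prints each one on a single line, which is the format mongoimport expects on stdin when --jsonArray is not used.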
    

    batchSize is an undocumented parameter; I think the default of 1000 is fine. A numInsertionWorkers value of 500 seems exaggerated.
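
    As an aside on the --jsonArray attempt in the question: it would not match these files anyway, since the top-level value is an object rather than an array, and the mongoimport documentation also notes that --jsonArray is limited to imports of 16 MB or smaller.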
