I have a lot of JSON files which seem to be exports of collections, each exported as one document that contains an array of the documents in the collection.
I am having a problem importing them into MongoDB: the way I am importing seems to treat each file as one document, which exceeds the 16 MB document limit (some of the files are 140 MB).
The structure is:
{
  "CollectionName": [
    {
      ...
    },
    ...
    {
      ...
    }
  ]
}
The subdocuments in the array have a unique ID in an attribute called "id", which I'm assuming was the original document ID before being exported.
I am using PowerShell to execute mongoimport and import the collections. The code I have currently is:
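# Derive the collection name by stripping the dated inbound-folder prefix from the file path
$collection = [regex]::Replace($_.FullName, "(?:C:\\macship-inbound\\\d{4}.\d{2}.\d{2}.\d{2}.\d{2})", "")
# Arguments for mongoimport, splatted below (no trailing comma after the last element)
$params = '--db', 'machship',
          '--collection', "$collection",
          '--type', 'json',
          '--file', $_.FullName,
          '--batchSize', '100',
          '--numInsertionWorkers', '500'
& "C:\MongoDB\Tools\bin\mongoimport.exe" @params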
I have tried adding --jsonArray to the parameters, but that doesn't work.
I would like to import the JSON using the "CollectionName" key as the collection name in the database, with each subdocument in the array becoming a document in that collection.
Is this possible? I'm happy to use a different approach or technology; I only used PowerShell because it is easy to add to the Task Scheduler on the heavily locked-down machine I am using.
2 Answers
I ended up asking ChatGPT, and with some massaging it gave me a workable answer using Node.js.
I had to use a file stream to parse the JSON object, because JSON.parse falls over on a huge string.
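A minimal sketch of the streaming idea, assuming the file has exactly the shape shown in the question (one wrapper object holding one array whose elements are objects). This is not the script from this answer; the names createArraySplitter and splitFile are made up for illustration. It uses only Node.js built-ins and writes one document per line, a form mongoimport accepts without --jsonArray.

```javascript
const fs = require('fs');

// Returns a function that accepts string chunks of a file shaped like
// {"CollectionName": [ {...}, {...} ]} and calls onDocument(jsonText)
// once for every object that is a direct element of the array.
function createArraySplitter(onDocument) {
  let depth = 0;        // current nesting of {} and []
  let inString = false; // inside a JSON string literal
  let escaped = false;  // previous character was a backslash inside a string
  let current = null;   // text of the element being collected, or null

  return function write(chunk) {
    for (const ch of chunk) {
      if (current !== null) current += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === '{' || ch === '[') {
        depth += 1;
        // Depth 1 is the wrapper object, depth 2 the array, depth 3 an element.
        if (depth === 3 && ch === '{' && current === null) current = ch;
      } else if (ch === '}' || ch === ']') {
        depth -= 1;
        if (depth === 2 && current !== null) {
          onDocument(current);
          current = null;
        }
      }
    }
  };
}

// Streams inputPath and writes one compact document per line to outputPath.
// Backpressure on the output stream is ignored to keep the sketch short.
function splitFile(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(outputPath);
    const write = createArraySplitter((doc) => {
      out.write(JSON.stringify(JSON.parse(doc)) + '\n');
    });
    fs.createReadStream(inputPath, { encoding: 'utf8' })
      .on('data', write)
      .on('error', reject)
      .on('end', () => out.end(resolve));
  });
}
```

Calling splitFile('CollectionName.json', 'CollectionName.ndjson') (hypothetical file names) would produce a file that can be passed to mongoimport with --type json. Only one array element is held in memory at a time, so the 16 MB limit applies per document rather than per file. Scanning character by character is simple rather than fast, so treat this as a starting point.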
Here is the code, if anyone cares:
Update this line to specify the database you want to store the collection in by replacing 'your_database' with the database name.
const mongoURI = 'mongodb://localhost:27017/your_database';
The code is executed with
Your solution with Node.js certainly works; however, the performance might be rather poor. With jq it would be a one-liner and should be much faster:
batchSize is an undocumented parameter. I think the default of 1000 is fine. A numInsertionWorkers value of 500 seems exaggerated.
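The parameter advice above, together with the question's wish to name the collection after the top-level key, could be sketched in Node.js roughly as follows. The function names are hypothetical, mongoimport is assumed to be on the PATH, the input file is assumed to already contain one document per line, and the worker count of 4 is only an illustrative guess.

```javascript
const { spawn } = require('child_process');

// Reads the top-level key from the first part of a file shaped like
// {"CollectionName": [ ... ]}. Returns null if the text does not start that way.
function collectionNameFromHead(head) {
  const match = /^\s*\{\s*("(?:[^"\\]|\\.)*")\s*:/.exec(head);
  return match ? JSON.parse(match[1]) : null;
}

// Builds the mongoimport argument list. --batchSize is left out so the tool's
// default applies; the worker count of 4 is an illustrative guess, not a tuned value.
function mongoimportArgs({ db, collection, file, workers = 4 }) {
  return [
    '--db', db,
    '--collection', collection,
    '--type', 'json',
    '--file', file,
    '--numInsertionWorkers', String(workers),
  ];
}

// Hypothetical wrapper: runs mongoimport (assumed to be on PATH) on a
// newline-delimited JSON file and resolves with its exit code.
function runImport(options) {
  return new Promise((resolve, reject) => {
    spawn('mongoimport', mongoimportArgs(options), { stdio: 'inherit' })
      .on('error', reject)
      .on('close', resolve);
  });
}
```

Only the first few hundred characters of an export need to be read to get the collection name, so collectionNameFromHead can be fed the first chunk of a read stream instead of the whole file.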