I have data stored in folders of gzip archives; each archive contains one big file with JSON in the following format:
{key:value, key:value}
{key:value, key:value}
{key:value, key:value}
I need to import this data into MongoDB. What is the best way to do that? I can't extract the gzip archives on my PC, as each uncompressed file is about 1950 MB.
2 Answers
I've imported tens of billions of lines of CSV and JSON into MongoDB in the past year, even from zipped formats. Having tried the various approaches, here's what I recommend to save precious time: use mongoimport.
You can find the mongoimport documentation at: https://www.mongodb.com/docs/database-tools/mongoimport/
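A minimal invocation for newline-delimited JSON like yours might look roughly like this (the database, collection, and file names below are placeholders):
# one document per line, so no --jsonArray is needed
mongoimport --db mydb --collection mycol --file data.json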
If you have a lot of files, you may want to write a for loop in bash that unzips each archive and passes the resulting filename as an argument to mongoimport.
If you are worried about running out of disk space, you can also delete the unzipped file at the end of each individual mongoimport run.
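A rough sketch of such a loop, assuming the archives sit in a single directory and using placeholder database and collection names:
for f in /path/to/archives/*.gz; do
  gunzip -k "$f"                               # keep the .gz, write the uncompressed copy next to it
  mongoimport --db mydb --collection mycol --file "${f%.gz}"
  rm "${f%.gz}"                                # reclaim the disk space before the next file
done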
Hope it helped!
You can unzip the files to STDOUT and pipe the stream into mongoimport. Then you don't need to save the uncompressed file to your local disk.
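For example, a minimal sketch with placeholder database, collection, and file names:
# gunzip -c writes the decompressed data to STDOUT; mongoimport reads from STDIN when no --file is given
gunzip -c data.json.gz | mongoimport --db mydb --collection mycol
This way the uncompressed data is never written to disk.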