I have data stored in folders of gzip archives; each archive contains one big file with JSON in the following format:
{key:value, key:value}
{key:value, key:value}
{key:value, key:value}
I need to import this data into MongoDB. What is the best way to do that? I can't extract the gzip archives on my PC, as each uncompressed file is about 1950 MB.
2 Answers
I've imported tens of billions of lines of CSV and JSON into MongoDB in the past year, even from zipped formats. Having tried the various approaches, here's what I recommend to save precious time: use mongoimport.
You can find the mongoimport documentation at: https://www.mongodb.com/docs/database-tools/mongoimport/
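A minimal invocation for newline-delimited JSON like yours might look roughly like this (the database, collection, and file names below are placeholders):
# one document per line, so no --jsonArray is needed
mongoimport --db mydb --collection mycol --file data.json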
If you have a lot of files, you may want to write a for loop in bash that unzips each archive and passes the resulting filename as an argument to mongoimport.
If you are worried about running out of disk space, you can also delete the unzipped file at the end of each individual mongoimport run.
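A rough sketch of such a loop, assuming the archives sit in a single directory and using placeholder database and collection names:
for f in /path/to/archives/*.gz; do
  gunzip -k "$f"                               # keep the .gz, write the uncompressed copy next to it
  mongoimport --db mydb --collection mycol --file "${f%.gz}"
  rm "${f%.gz}"                                # reclaim the disk space before the next file
done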
Hope it helped!
You can unzip the files to STDOUT and pipe the stream into mongoimport. Then you don't need to save the uncompressed file to your local disk.
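For example, a minimal sketch with placeholder database, collection, and file names:
# gunzip -c writes the decompressed data to STDOUT; mongoimport reads from STDIN when no --file is given
gunzip -c data.json.gz | mongoimport --db mydb --collection mycol
This way the uncompressed data is never written to disk.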