
I’m trying to import large CSV files into MongoDB using MongoDB Compass. The data originally came from BigQuery via GDELT and was then dumped into 40+ CSV files.
More than half of the files cannot be imported: the import gets partway through and then stops with the error "Interior hyphen". I can't find any documentation explaining what this error means or what causes it.

At import, a few CSV columns are specified as numeric; everything else is treated as a string and is stored as such in the CSV.

There are documented MongoDB issues when table names contain a hyphen, but that is not the case here. Has anyone run into this error and solved it?

2 Answers


  1. Chosen as BEST ANSWER

    I haven't been able to find an answer, but there is a workaround if you need one. For some reason, importing these files through pymongo from pandas DataFrames does not produce any errors. The workaround that gets the data where it needs to be looks like this:

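    import os

    import numpy as np
    import pandas as pd

    # `collection` is assumed to be an existing pymongo collection object.
    path = "/home/linux/Downloads/csvs2import"
    dir_list = os.listdir(path)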
    
    
    dtypes = {'CountryCode': str, 
              'date': str,  
              'SQLDATE': str, 
              'ActionGeo_ADM1Code': str, 
              'lat': np.float64, 
              'long': np.float64,
              'URL': str, 
              'sentiment': np.float64, 
              'GoldsteinScale': np.float64, 
              'EventCode': str, 
              'EventBaseCode': str,
              'EventRootCode': str, 
              'QuadClass': str, 
              'Actor1Code': str, 
              'Actor1Name': str,
              'Actor1EthnicCode': str, 
              'Actor1Religion1Code': str, 
              'Actor1Religion2Code': str,
              'Actor1Geo_Fullname': str, 
              'Actor1Type1Code': str, 
              'Actor2Code': str, 
              'Actor2Name': str,
              'Actor2EthnicCode': str, 
              'Actor2Religion1Code': str, 
              'Actor2Religion2Code': str,
              'Actor2Geo_Fullname': str, 
              'Actor2Type1Code': str, 
              'NumSources': np.int32}
    
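    for f in dir_list:
        print(f)  # show progress, one file at a time
        fp = os.path.join(path, f)
        data = pd.read_csv(fp, header=0, dtype=dtypes)
        # insert_many takes a list of dicts, one per CSV row
        collection.insert_many(data.to_dict('records'))
        del data  # free the DataFrame before reading the next file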
    

    This reads the CSVs that failed to import, which are all in a single folder, and specifies the data types explicitly. Without the explicit types, the import throws errors because categorical variables such as the event code are ambiguous, so I'm importing them as strings (e.g., 050 is one code, so it is imported as '050' rather than 50).
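
To illustrate the leading-zero point, here is a minimal sketch using a made-up two-row sample rather than the real GDELT files. The sample data is hypothetical; only the `EventCode` and `NumSources` column names come from the code above.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real files have many more columns.
sample = "EventCode,NumSources\n050,3\n0211,1\n"

# Without dtype, pandas infers integers and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(sample))

# With dtype, the codes stay as the exact strings found in the file.
typed = pd.read_csv(io.StringIO(sample), dtype={"EventCode": str})

print(inferred["EventCode"].tolist())
print(typed["EventCode"].tolist())
```

The first list contains the integers 50 and 211, while the second keeps '050' and '0211' as written in the file.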


  2. Make sure you select the proper data type during import. MongoDB has both a Timestamp type and a native BSON Date type. The Timestamp type is a 64-bit value that is mostly used for internal purposes; the Date type is the one more commonly used. The "Interior hyphen" error is returned if you select the Timestamp option when importing a field whose values are actually dates.

    [Image: MongoDB Compass field options]
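
Before choosing a type in the Compass import dialog, it can help to check what a column actually contains. The following is only a sketch: it assumes dates are written as YYYY-MM-DD, which the question does not confirm, and the helper name `non_date_values` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime


def non_date_values(rows, column, fmt="%Y-%m-%d"):
    """Return (row_number, value) pairs in `column` that do not parse with `fmt`."""
    bad = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        value = row[column]
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append((row_number, value))
    return bad


# Hypothetical sample; 'date' is a column name taken from the question's files.
sample = "date,EventCode\n2020-01-15,050\n20200116,051\n2020-01-17,052\n"
reader = csv.DictReader(io.StringIO(sample))
print(non_date_values(reader, "date"))
```

A column where nearly every value parses is a candidate for the Date type, and the rows that are reported show where the format deviates. For a real file, pass a `csv.DictReader` opened on that file instead of the in-memory sample.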
