
I’m trying to download then load a pre-trained model with the gensim downloader to/from a specific location. It looks like I can successfully specify the download location using gensim.downloader.BASE_DIR but then I can’t figure out how to load from that location.

I’m currently downloading the model using:

import gensim.downloader as api

api.BASE_DIR= 'mnt/project/models'

This has worked so far – I can see the model being downloaded to the correct location – but then I can’t access the model. This code:

model = api.load('glove-twitter-25')

Results in an error:

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz'

Is there a way to specify the directory to load the model from?

3 Answers


  1. When an error reads "FileNotFoundError: [Errno 2] No such file or directory: '/mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz'", the first thing I’d check is: are you sure the file /mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz exists?

    But more generally, the gensim.downloader:

    1. exists chiefly to make it easier to fetch remote resources, and was never envisioned to point at local paths, for which "downloading" is a misnomer

    2. is probably a bad idea anyway, for reasons I’ve presented to the Gensim project in a GitHub issue

    I think you can and should find the original source of such files, and download the raw data yourself, without any mysterious/unversioned code-shims that weren’t in the software release.

    You’ll then have a better idea where the files land, what formats they’re in, and what library routines can further massage them into the formats you need.
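
One way to act on this advice, once you have fetched a vectors file yourself, is to look at its first lines before choosing a loader. The sketch below is a minimal, stdlib-only illustration; `peek_gz` and `looks_like_word2vec_header` are hypothetical helpers, not part of Gensim. It assumes the file is gzip-compressed text, as the `.gz` name in the error message suggests. In the word2vec text format the first line is a header holding the vocabulary size and the vector dimensionality, whereas files in the original GloVe text layout have no such header.

```python
import gzip
from itertools import islice


def peek_gz(path, n_lines=2, max_chars=80):
    """Return the first `n_lines` lines of a gzip-compressed text file.

    Each line is stripped of its newline and truncated to `max_chars`
    so that very wide vector rows stay readable.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n")[:max_chars] for line in islice(fh, n_lines)]


def looks_like_word2vec_header(line):
    """True if `line` is exactly two positive integers, e.g. '3 2'."""
    parts = line.split()
    return len(parts) == 2 and all(p.isdecimal() and int(p) > 0 for p in parts)
```

If the first line passes `looks_like_word2vec_header`, the file is probably in word2vec format; if it already looks like a word followed by floats, it likely has no header and needs a loader that accepts that.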

  2. From looking at the source code, it seems that gensim.downloader will load from the local cache if the data file is already available, so what you need should happen automatically.
    However, if you want to explicitly specify the location of the model (in word2vec .gz format), try this example:

    from gensim.models import KeyedVectors
    # Raw string: in a normal string literal, '\U' starts a Unicode escape and raises a SyntaxError.
    model_path = r'C:\Users\USER\gensim-data\fasttext-wiki-news-subwords-300\fasttext-wiki-news-subwords-300.gz'
    model = KeyedVectors.load_word2vec_format(model_path, binary=False)
    

    The path of a cached model can be obtained by running:

    import gensim.downloader

    MODEL_NAME = "fasttext-wiki-news-subwords-300"
    model_path = gensim.downloader.load(MODEL_NAME, return_path=True)
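
If you only want to know where a cached archive should sit, without calling the downloader at all, the path can be assembled by hand. This is a minimal sketch, not Gensim's own code: `cached_model_path` and `require_cached_model` are hypothetical helpers, and the `<base>/<name>/<name>.gz` layout is an assumption inferred from the paths quoted in this thread. The fallback to `GENSIM_DATA_DIR` and then `~/gensim-data` is likewise an assumption.

```python
import os
from pathlib import Path


def cached_model_path(name, base_dir=None):
    """Build <base_dir>/<name>/<name>.gz without touching the network."""
    if base_dir is None:
        # Assumption: fall back to GENSIM_DATA_DIR, then to ~/gensim-data.
        base_dir = os.environ.get("GENSIM_DATA_DIR", Path.home() / "gensim-data")
    return Path(base_dir) / name / f"{name}.gz"


def require_cached_model(name, base_dir=None):
    """Return the cached path, or raise FileNotFoundError naming where we looked."""
    path = cached_model_path(name, base_dir)
    if not path.is_file():
        raise FileNotFoundError(f"{name} not found at {path}")
    return path
```

The returned path can then be converted with `str(...)` and passed to `KeyedVectors.load_word2vec_format`, as in the example above. Whether `binary` should be `True` or `False` depends on the model, so check how the file was produced before loading it.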
    
  3. The modern way is to define the location of the Gensim models through the GENSIM_DATA_DIR environment variable.

    import os
    # Set the variable before importing gensim.downloader.
    os.environ['GENSIM_DATA_DIR'] = '/home/username/Cluster-Analysis/gensim-data'

    import gensim.downloader
    word2vec_vectors = gensim.downloader.load('word2vec-google-news-300')
    

    This will create, and update as needed, the gensim-data directory under ~/Cluster-Analysis or another location of your choice. Note that each model folder is laid out as a Python package (it contains an __init__.py), so it is best to manage it through gensim.downloader.

    gensim-data/
    ┣ word2vec-google-news-300/
    ┃ ┣ __pycache__/
    ┃ ┃ ┗ __init__.cpython-310.pyc
    ┃ ┣ __init__.py
    ┃ ┗ word2vec-google-news-300.gz
    ┣ README.md
    ┗ information.json
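
The snippet above sets the variable before `gensim.downloader` is imported, and that ordering probably matters. The sketch below makes the ordering explicit. It is a minimal illustration under the assumption that the downloader reads `GENSIM_DATA_DIR` once, when the module is first imported; `set_gensim_data_dir` is a hypothetical helper, not a Gensim function. It also converts the path to an absolute one, which avoids surprises with relative paths such as `'mnt/project/models'`.

```python
import os
import sys
import warnings


def set_gensim_data_dir(path):
    """Set GENSIM_DATA_DIR to the absolute form of `path`.

    Returns True if gensim.downloader had not been imported yet,
    False if it was already imported (the change may then be ignored).
    """
    already_imported = "gensim.downloader" in sys.modules
    if already_imported:
        # Assumption: the downloader read the variable at import time.
        warnings.warn(
            "gensim.downloader is already imported; "
            "GENSIM_DATA_DIR may not take effect in this process",
            RuntimeWarning,
        )
    os.environ["GENSIM_DATA_DIR"] = os.path.abspath(os.path.expanduser(path))
    return not already_imported
```

Call it at the very top of a script, before any `import gensim.downloader`, and treat a `False` return value as a hint to restart the interpreter or move the call earlier.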
    