Why can't I load from a specific filepath with gensim downloader? - Twitter API

tjlsmith
February 5, 2021
99 views
2 votes
3 Answers

I’m trying to download and then load a pre-trained model with the gensim downloader to/from a specific location. It looks like I can successfully specify the download location using gensim.downloader.BASE_DIR, but then I can’t figure out how to load from that location.

I’m currently downloading the model using:

```
import gensim.downloader as api

api.BASE_DIR = 'mnt/project/models'
```

This has worked so far – I can see the model being downloaded to the correct location – but then I can’t access the model. This code:

```
model = api.load('glove-twitter-25')
```

Results in an error:

```
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz'
```

Is there a way to specify the directory to load the model from?

Answers

- gojomo
- February 5, 2021 at 3:38 am
- 0 votes
When an error reads "FileNotFoundError: [Errno 2] No such file or directory: '/mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz'", the first thing I’d check is: are you sure the file /mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz exists?

But more generally, the gensim.downloader:
1. exists chiefly to make it easier to fetch remote resources, & was never envisioned to point at local paths, for which "downloading" is a misnomer
2. is probably a bad idea anyway, for reasons I’ve presented to the Gensim project in a project GitHub issue

I think you can and should find the original source of such files and download the raw data yourself, without any mysterious, unversioned code shims that weren’t in the software release.

You’ll then have a better idea where the files land, what formats they’re in, and what library routines can further massage them into the formats you need.
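For example, the raw GloVe releases are plain text files with one token and its vector per line and no header line, while `KeyedVectors.load_word2vec_format` by default expects the word2vec text format, whose first line is `<vocab_size> <vector_size>`. A minimal standard-library sketch of that kind of massaging, assuming a headerless GloVe `.txt` file; `glove_to_word2vec_text` is a hypothetical helper, not a Gensim function:

```python
import gzip
from typing import Tuple


def glove_to_word2vec_text(glove_path: str, out_path: str) -> Tuple[int, int]:
    """Copy a headerless GloVe text file to a gzipped word2vec text file.

    Hypothetical helper (not part of Gensim). Assumes each non-empty line is a
    token followed by space-separated floats. Returns (vocab_size, vector_size).
    """
    count, dim = 0, 0
    # First pass: count vectors and infer the dimensionality from the first row.
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            if count == 0:
                dim = len(line.split()) - 1
            count += 1

    # Second pass: write the "<count> <dim>" header, then copy the rows unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            if line.strip():
                dst.write(line if line.endswith("\n") else line + "\n")
    return count, dim
```

The resulting file can then be loaded with `KeyedVectors.load_word2vec_format(out_path, binary=False)`. Gensim 4.x can also read headerless files directly with `no_header=True`, which avoids writing a second copy.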

- PoeDator
- September 26, 2022 at 10:47 am
- 0 votes
From looking at the source code, it seems that gensim.downloader will try to load from the local cache if the data file is available, so what you need is done automatically.
However, if you want to specify the location of the model (in word2vec text format, gzipped) explicitly, try this example:
```
from gensim.models import KeyedVectors

# Raw string (r'...') so the backslashes in the Windows path are not treated as escape sequences.
model_path = r'C:\Users\USER\gensim-data\fasttext-wiki-news-subwords-300\fasttext-wiki-news-subwords-300.gz'
model = KeyedVectors.load_word2vec_format(model_path, binary=False)
```
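`binary=False` in the snippet above assumes the file is in word2vec *text* format, whose first line is a `<vocab_size> <vector_size>` header. If you are not sure what a file on disk actually contains, a quick check before loading can save a confusing error. A minimal standard-library sketch; `read_word2vec_header` is a hypothetical helper, not a Gensim API, and it is only meant for text-format files:

```python
import gzip
from typing import Tuple


def read_word2vec_header(path: str) -> Tuple[int, int]:
    """Return (vocab_size, vector_size) from the header of a word2vec *text* file.

    Works for plain or .gz files. Raises ValueError when the first line is not a
    "<count> <dim>" header, e.g. for a headerless GloVe file.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        first = f.readline().split()
    if len(first) != 2 or not all(tok.isdigit() for tok in first):
        raise ValueError(f"{path!r} has no word2vec header (first line: {first[:3]!r})")
    return int(first[0]), int(first[1])
```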
The path of the cached model can be obtained by running:
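```
import gensim.downloader

MODEL_NAME = "fasttext-wiki-news-subwords-300"
# Returns the local file path instead of the loaded model (the model is downloaded first if it is not cached).
model_path = gensim.downloader.load(MODEL_NAME, return_path=True)
```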
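If you only want to know where the cached file should be, without calling the downloader at all, you can rebuild the path yourself. This is a sketch under assumptions: the data directory is `$GENSIM_DATA_DIR` when set and `~/gensim-data` otherwise (consistent with the path in the question's error message), and the file sits at `<data_dir>/<name>/<name>.gz`, as in the paths on this page. Some resources may use a different file name, so treat `cached_model_path` and `is_cached` as hypothetical helpers rather than Gensim APIs:

```python
import os


def cached_model_path(name: str) -> str:
    """Guess where gensim.downloader keeps `name` (hypothetical helper).

    Assumes <data_dir>/<name>/<name>.gz, where data_dir is $GENSIM_DATA_DIR
    if set, else ~/gensim-data.
    """
    data_dir = os.environ.get("GENSIM_DATA_DIR", os.path.expanduser("~/gensim-data"))
    return os.path.join(data_dir, name, f"{name}.gz")


def is_cached(name: str) -> bool:
    """True if the guessed cache file exists; never triggers a download."""
    return os.path.isfile(cached_model_path(name))
```

Usage: `if is_cached("fasttext-wiki-news-subwords-300"): model_path = cached_model_path("fasttext-wiki-news-subwords-300")`.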

- MaciejS
- March 5, 2023 at 1:43 pm
- 0 votes
The modern way is to define the location of the gensim models in the GENSIM_DATA_DIR environment variable.
```
import os
# Set this before importing gensim.downloader, which reads GENSIM_DATA_DIR when it is imported.
os.environ['GENSIM_DATA_DIR'] = '/home/username/Cluster-Analysis/gensim-data'

import gensim
import gensim.downloader
glove_vectors = gensim.downloader.load('word2vec-google-news-300')
```
This creates the model directory under ~/Cluster-Analysis (or another location of your choice) and downloads models into it as needed. Note that each model folder inside gensim-data is laid out as a Python package, so it is best maintained through gensim.downloader:
```
gensim-data/
┣ word2vec-google-news-300/
┃ ┣ __pycache__/
┃ ┃ ┗ __init__.cpython-310.pyc
┃ ┣ __init__.py
┃ ┗ word2vec-google-news-300.gz
┣ README.md
┗ information.json
```
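Because each resource folder is a small package (an `__init__.py` next to the data file), you can see what is already cached by scanning the data directory for such folders. A minimal sketch, assuming the layout in the listing above; `list_local_models` is hypothetical and not part of `gensim.downloader`:

```python
import os
from typing import List


def list_local_models(data_dir: str) -> List[str]:
    """List resource folders under a gensim-data directory (hypothetical helper).

    Assumes each cached resource is a sub-folder containing an __init__.py;
    top-level files such as README.md and information.json are skipped.
    """
    found = []
    for entry in sorted(os.listdir(data_dir)):
        folder = os.path.join(data_dir, entry)
        if os.path.isdir(folder) and os.path.isfile(os.path.join(folder, "__init__.py")):
            found.append(entry)
    return found
```

Usage: `list_local_models(os.environ["GENSIM_DATA_DIR"])` after setting the variable as in the snippet above.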