I’m trying to download and then load a pre-trained model with the gensim downloader, to and from a specific location. It looks like I can successfully specify the download location using gensim.downloader.BASE_DIR, but then I can’t figure out how to load from that location.
I’m currently downloading the model using:
import gensim.downloader as api
api.BASE_DIR = 'mnt/project/models'
This has worked so far – I can see the model being downloaded to the correct location – but then I can’t access the model. This code:
model = api.load('glove-twitter-25')
results in an error:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz'
Is there a way to specify the directory to load the model from?
3 Answers
When an error reads "FileNotFoundError: [Errno 2] No such file or directory: '/mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz'", the 1st thing I’d check is: are you sure the file /mnt/user/gensim-data/glove-twitter-25/glove-twitter-25.gz exists?
But more generally, the gensim.downloader exists chiefly to make it easier to fetch remote resources, and was never envisioned to point at local paths, for which "downloading" is a misnomer. It is also probably a bad idea anyway, for reasons I’ve presented to the Gensim project in a project GitHub issue.
I think you can and should find the original source of such files, and download the raw data yourself, without any mysterious/unversioned code-shims that weren’t in the software release.
You’ll then have a better idea where the files land, what formats they’re in, and what library routines can further massage them into the formats you need.
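As a quick way to act on the "does the file exist?" check, here is a minimal sketch using only the standard library. The helper names `expected_archive` and `describe` are hypothetical, and the `<base>/<name>/<name>.gz` layout is an assumption inferred from the path in the error message, not something taken from the gensim documentation.

```python
from pathlib import Path


def expected_archive(base_dir, name):
    """Build <base_dir>/<name>/<name>.gz, the layout seen in the error message (assumed)."""
    return Path(base_dir).expanduser() / name / f"{name}.gz"


def describe(base_dir, name):
    """Report whether the archive exists and list what is actually in the model folder."""
    archive = expected_archive(base_dir, name)
    folder = archive.parent
    found = sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []
    return {
        "archive": str(archive),
        "exists": archive.is_file(),
        "folder_contents": found,
    }
```

Calling `describe("/mnt/user/gensim-data", "glove-twitter-25")` and then the same with the directory you set as BASE_DIR should show which of the two locations actually holds the file.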
From looking at the source code, it seems that gensim.downloader will try to load from the local cache if the data file is available, so what you need is done automatically. However, if you want to explicitly specify the location of the model (in word2vec gz format), try this example:
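The original snippet did not survive here, so what follows is only a sketch of loading from an explicit path. It assumes the file is in word2vec text format (optionally gzipped), as this answer states, and uses `KeyedVectors.load_word2vec_format`. The helper name `load_local_vectors` and the path in the usage comment are hypothetical.

```python
from pathlib import Path


def load_local_vectors(path):
    """Load word2vec-format vectors (plain or .gz) from an explicit local path."""
    path = Path(path).expanduser()
    if not path.is_file():
        # Fail early with the path we actually tried, before importing gensim.
        raise FileNotFoundError(f"No vector file at {path}")
    # Imported lazily so the path check above works even without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(str(path))


# Hypothetical usage, assuming the archive sits under the asker's directory:
# model = load_local_vectors("mnt/project/models/glove-twitter-25/glove-twitter-25.gz")
# print(model.most_similar("hello", topn=3))
```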
The path of the cached model can be obtained by running:
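The original snippet is missing here as well. A minimal sketch, assuming the `return_path` argument of `gensim.downloader.load` is what was meant: with `return_path=True` the call returns the local file path instead of a loaded model. As far as I can tell from the downloader source, it still fetches the data first if the model folder is not yet present. The wrapper name `cached_model_path` is hypothetical.

```python
def cached_model_path(name):
    """Return the local file path for a gensim-data model instead of loading it."""
    # Imported inside the function so any BASE_DIR or environment setup
    # can be done by the caller before gensim.downloader is first imported.
    import gensim.downloader as api
    return api.load(name, return_path=True)


# Hypothetical usage (needs gensim installed and network access):
# print(cached_model_path("glove-twitter-25"))
```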
The modern way is to define the gensim model location in an environment variable.
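The snippet for this answer was lost, so here is a sketch under one stated assumption: that the installed gensim reads the `GENSIM_DATA_DIR` environment variable when `gensim.downloader` is first imported (recent releases appear to, but check your version). The helper names `set_gensim_data_dir` and `load_model` are hypothetical.

```python
import os
from pathlib import Path


def set_gensim_data_dir(target_dir):
    """Create target_dir and expose it to gensim via GENSIM_DATA_DIR."""
    path = Path(target_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    # Assumption: gensim.downloader reads this variable at import time,
    # so it has no effect if the module was already imported in this process.
    os.environ["GENSIM_DATA_DIR"] = str(path)
    return path


def load_model(name, target_dir):
    """Set the data directory, then load (and if needed download) the model."""
    set_gensim_data_dir(target_dir)
    import gensim.downloader as api  # imported only after the variable is set
    return api.load(name)


# Hypothetical usage with the directory named in this answer:
# model = load_model("glove-twitter-25", "~/Cluster-Analysis")
```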
This will create and accordingly update the model directory under ~/Cluster-Analysis or another location of your choice. Note that this gensim folder is maintained as a Python package, so it is best to maintain it through gensim.downloader.