I am running the PyTorch code below in a Jupyter notebook. The notebook is running on my Ubuntu server. I'm trying to download the llama2-70b-chat model from Hugging Face: my goal is to download the model weights and save them locally on my server, so that I can work with the LLM on the server, where I have a GPU.

Does the error message below mean that the GPU ran out of room while it was trying to download the model from Hugging Face? The download doesn't seem to have filled up the drive the notebook is running on, so it looks like there's plenty of room on my server. I'm just not sure whether it tries to hold all the model weights in GPU memory during the download. Can anyone suggest how to resolve this error so that I can work with the Llama 2 model on my server, with my GPU?
code:
from torch import cuda, bfloat16
import transformers
model_id = 'meta-llama/Llama-2-70b-chat-hf'
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'
# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
# begin initializing HF items, need auth token for these
# hf_auth = '<YOUR_API_KEY>'
hf_auth = apikey
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
# `save_pretrained` (not `save_model`) is the Transformers method for writing weights to disk
model.save_pretrained('/save_path/')
model.eval()
print(f"Model loaded on {device}")
error:
File ~/anaconda3/envs/LLMenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:544, in http_get(url, temp_file, proxies, resume_size, headers, timeout, max_retries, expected_size)
542 if chunk: # filter out keep-alive new chunks
543 progress.update(len(chunk))
--> 544 temp_file.write(chunk)
546 if expected_size is not None and expected_size != temp_file.tell():
547 raise EnvironmentError(
548 f"Consistency check failed: file should be of size {expected_size} but has size"
549 f" {temp_file.tell()} ({displayed_name}).nWe are sorry for the inconvenience. Please retry download and"
550 " pass `force_download=True, resume_download=False` as argument.nIf the issue persists, please let us"
551 " know by opening an issue on https://github.com/huggingface/huggingface_hub."
552 )
File ~/anaconda3/envs/LLMenv/lib/python3.10/tempfile.py:483, in _TemporaryFileWrapper.__getattr__.<locals>.func_wrapper(*args, **kwargs)
481 @_functools.wraps(func)
482 def func_wrapper(*args, **kwargs):
--> 483 return func(*args, **kwargs)
OSError: [Errno 28] No space left on device
2 Answers
The error tells you that there is no space left on a storage drive (e.g. a hard-drive partition). From the variable names in the traceback, it could be the partition the model is being downloaded to, or the temporary-storage partition (check the TMPDIR environment variable to see where temporary files are written).
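If that is the problem, one workaround (a minimal sketch; /mnt/bigdisk is a hypothetical mount point for a partition with enough free space) is to redirect the Hugging Face cache and the temp directory to the larger disk, early in the notebook and before any downloads start:
import os
# Assumption: /mnt/bigdisk is a partition with plenty of free space on your server.
os.environ['HF_HOME'] = '/mnt/bigdisk/hf_home'  # root of the Hugging Face cache
os.environ['TMPDIR'] = '/mnt/bigdisk/tmp'       # temp files used during downloads
os.makedirs(os.environ['HF_HOME'], exist_ok=True)
os.makedirs(os.environ['TMPDIR'], exist_ok=True)
import transformers
# from_pretrained also accepts cache_dir='/mnt/bigdisk/hf_cache' if you prefer
# a per-call override instead of the environment variables above.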
Anyway, the 70b Llama 2 model is huge, and you would need very powerful GPUs to make it work; you can check this GitHub issue for a discussion of the hardware requirements for the different model sizes. Starting with the 7b model may be a better idea than going directly for the 70b one.
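For example (a minimal sketch reusing the question's 4-bit quantization config; meta-llama/Llama-2-7b-chat-hf is the 7b chat checkpoint, whose fp16 weights are roughly 13 GB versus roughly 130 GB for the 70b model):
from torch import bfloat16
import transformers
# Same 4-bit config as in the question, applied to the much smaller 7b model
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth  # same auth token as in the question's code
)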
I had the same issue. You can check the memory of your GPU by running:
nvidia-smi
in your terminal.
You can check the free disk space on your machine (which is what this error is about, rather than memory) by running:
df -ahl
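If you'd rather check both numbers from inside the notebook, here is a small sketch using PyTorch and the standard library (assuming the Hugging Face cache is in its default location under your home directory):
import os
import shutil
import torch
# Free and total GPU memory, in bytes
if torch.cuda.is_available():
    free_gpu, total_gpu = torch.cuda.mem_get_info()
    print(f'GPU: {free_gpu / 1e9:.1f} GB free of {total_gpu / 1e9:.1f} GB')
# Free disk space on the partition holding your home directory
usage = shutil.disk_usage(os.path.expanduser('~'))
print(f'Disk: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB')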
I hope it helps!