
I am running the PyTorch code below in a Jupyter notebook on my Ubuntu server. I'm trying to download the llama2-70b-chat model from Hugging Face, save the weights locally on the server, and then work with the LLM there using my GPU. Does the error message below mean that the GPU ran out of room while it was trying to download the model from Hugging Face? The drive the notebook is running on doesn't seem to have filled up, so it looks like there's plenty of room on the server; I'm just not sure whether it tries to hold all the model weights in GPU memory during the download. Can anyone suggest how to resolve this error so that I can work with the Llama 2 model on my server, with my GPU?

code:

from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-70b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, need auth token for these
# hf_auth = '<YOUR_API_KEY>'

hf_auth = apikey
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.save_pretrained('/save_path/')  # save the downloaded weights locally

model.eval()
print(f"Model loaded on {device}")

error:

File ~/anaconda3/envs/LLMenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:544, in http_get(url, temp_file, proxies, resume_size, headers, timeout, max_retries, expected_size)
    542     if chunk:  # filter out keep-alive new chunks
    543         progress.update(len(chunk))
--> 544         temp_file.write(chunk)
    546 if expected_size is not None and expected_size != temp_file.tell():
    547     raise EnvironmentError(
    548         f"Consistency check failed: file should be of size {expected_size} but has size"
    549         f" {temp_file.tell()} ({displayed_name}).nWe are sorry for the inconvenience. Please retry download and"
    550         " pass `force_download=True, resume_download=False` as argument.nIf the issue persists, please let us"
    551         " know by opening an issue on https://github.com/huggingface/huggingface_hub."
    552     )

File ~/anaconda3/envs/LLMenv/lib/python3.10/tempfile.py:483, in _TemporaryFileWrapper.__getattr__.<locals>.func_wrapper(*args, **kwargs)
    481 @_functools.wraps(func)
    482 def func_wrapper(*args, **kwargs):
--> 483     return func(*args, **kwargs)

OSError: [Errno 28] No space left on device

2 Answers


  1. The error tells you that there is no space left on the storage drive (e.g. a hard-drive partition). From the names of the variables, it could be the partition where the model is being saved, or maybe the temporary-storage partition (check the environment variable TMPDIR to see where that is).

    In any case, the 70b Llama 2 model is huge and you would need very powerful GPUs to make it work; you can check this GitHub issue for a discussion of the hardware requirements for the different model sizes. Starting with the 7b model may be a better idea than going directly for the 70b one.
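
    If the cache or temporary directory is what is filling up, one option is to point the Hugging Face cache (and the temp directory) at a drive with more space before loading the model. Below is a minimal sketch; the /mnt/bigdisk paths, the 7b model ID, and the placeholder token are assumptions for illustration, not from the original code:

    import os

    # Set these before importing transformers so the new locations are picked up.
    os.environ['HF_HOME'] = '/mnt/bigdisk/hf_cache'    # hypothetical larger partition
    os.environ['TMPDIR'] = '/mnt/bigdisk/tmp'          # temp files used during download

    import transformers

    hf_auth = '<YOUR_API_KEY>'
    model = transformers.AutoModelForCausalLM.from_pretrained(
        'meta-llama/Llama-2-7b-chat-hf',      # the smaller 7b model, as suggested above
        cache_dir='/mnt/bigdisk/hf_cache',    # an explicit cache location also works
        use_auth_token=hf_auth,
    )
    model.save_pretrained('/mnt/bigdisk/llama2-7b-chat')  # keep a local copy of the weights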

  2. I had the same issue. You can check the memory of your GPU by running:

    nvidia-smi

    in your terminal.

    You can check the free disk space on your server by running:

    df -ahl
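
    The same checks can also be run from inside the notebook; here is a minimal sketch using torch.cuda.mem_get_info and shutil.disk_usage (adjust the path to whichever partition you care about):

    import shutil
    import torch

    if torch.cuda.is_available():
        # Bytes free and total on the current GPU.
        free_gpu, total_gpu = torch.cuda.mem_get_info()
        print(f"GPU memory: {free_gpu / 1e9:.1f} GB free of {total_gpu / 1e9:.1f} GB")

    # Free space on the root partition; pass the cache or temp path instead if needed.
    usage = shutil.disk_usage('/')
    print(f"Disk: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")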

    I hope it helps!
