I have been unable to get any output from the model, even after waiting 10 minutes. I am running on an Azure notebook with a Standard_E4ds_v4 compute instance (4 cores, 32 GB RAM).
Any assistance is appreciated.
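As a rough sanity check on that hardware, the sketch below estimates the memory needed for the model weights alone. It assumes about 7 billion parameters for Falcon-7B (an approximation) and ignores activations, the KV cache and framework overhead. In bfloat16 the weights come to roughly 14 GB, which fits in 32 GB of RAM, so memory by itself does not explain the wait. On a CPU-only size such as Standard_E4ds_v4, generation speed is the more likely bottleneck.

```python
# Rough weight-memory estimate for a ~7B-parameter model.
# ASSUMPTION: 7e9 parameters (approximate for Falcon-7B); activations,
# KV cache and framework overhead are deliberately ignored.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the size of the weights alone in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(7e9, dtype):.0f} GB")
```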
Code:
!source activate llm_env  # a `!` command runs in its own subshell; it does not switch the kernel's environment
%pip install conda
import conda
%conda install cudatoolkit
%pip install torch
%pip install einops
%pip install accelerate
%pip install transformers==4.27.4
%pip install huggingface-hub
%pip install chardet
%pip install cchardet

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = "tiiuae/falcon-7b"

# Load in bfloat16; device_map="auto" puts the model on the CPU when no GPU is available
rrmodel = AutoModelForCausalLM.from_pretrained(
    model,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model)

input_text = "What is a giraffe?"
# Calling the tokenizer directly returns input_ids plus a matching attention_mask
inputs = tokenizer(input_text, return_tensors="pt")

output = rrmodel.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=2000,  # total length: prompt tokens + generated tokens
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS for padding; 50256 is GPT-2's EOS id, not Falcon's
    eos_token_id=tokenizer.eos_token_id,
)

# Execution never reaches this point
print(f"Got output: {output}")
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
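For a decoder-only model like this one, `max_length` in `generate()` caps the total sequence, prompt included, and each generated token costs a separate forward pass through the full 7B model. The small helper below makes the worst case explicit. The prompt length of 6 tokens is a hypothetical figure, not measured with Falcon's tokenizer; the `max_new_tokens` argument of `generate()` limits only the generated tokens instead.

```python
def max_decoding_steps(prompt_len: int, max_length: int) -> int:
    """Worst-case number of generated tokens, i.e. forward passes.

    For decoder-only models, generate()'s max_length counts the prompt too,
    so at most max_length - prompt_len new tokens are produced. Generation
    can stop earlier if eos_token_id is produced.
    """
    return max(0, max_length - prompt_len)


# Hypothetical prompt length for "What is a giraffe?" (not measured)
print(max_decoding_steps(6, 2000))  # 1994
print(max_decoding_steps(6, 100))   # 94
```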
2 Answers
I believe the problem was how I prompted the model. It is a text-generation model, so it continues the prompt rather than following instructions. In my case I was passing in a transcript, and I had to end the prompt with "Summary: ".
So this DID NOT work: "Summarize this transcript. Transcript: ..."
This WORKED: "Transcript: .... , Summary: "
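A minimal sketch of building the completion-style prompt that worked, so that the model's continuation is the summary itself. `build_summary_prompt` is a hypothetical helper; the "Transcript: ... Summary: " layout follows the answer, while the newline between the two parts is an assumption.

```python
def build_summary_prompt(transcript: str) -> str:
    """Build a prompt that ends exactly where the summary should begin.

    Hypothetical helper: the layout follows the "Transcript: ..., Summary: "
    format from the answer; the newline separator is an assumption.
    """
    return f"Transcript: {transcript.strip()}\nSummary: "


prompt = build_summary_prompt("Alice: Let's ship on Friday. Bob: Agreed.")
print(prompt)
```

The resulting string would be tokenized and passed to `generate()` the same way as in the question; the text the model produces after "Summary: " is the summary.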
Full Working code below:
I tried your code, changing max_length to 100 to check its run time.
I created a CPU-only VM with 140 GB of memory, as shown below:
I made small changes to your code, as below:
Code:
Then I ran the code in Azure ML Studio, as shown below:
It took almost 3 hours 30 minutes to run:
Output:
It ran successfully, as shown below:
With max_length=100 it took 3 hours 30 minutes, so with max_length=2000 it will take much longer. Run your notebook on a compute instance that has a GPU, preferably a larger GPU size; Standard_E4ds_v4 is a CPU-only size.
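To put "much longer" in numbers, here is a back-of-the-envelope linear extrapolation from the 3 h 30 min (3.5 h) run at max_length=100. It assumes a constant cost per token; it ignores early stopping at EOS (which would shorten a run) and the growth of attention cost with sequence length (which would lengthen it), so it is a rough guide rather than a prediction.

```python
def extrapolate_hours(observed_hours: float, observed_max_length: int,
                      target_max_length: int) -> float:
    """Scale an observed run time linearly with max_length.

    ASSUMPTION: constant cost per token; early EOS stops and length-dependent
    attention cost are ignored.
    """
    return observed_hours * target_max_length / observed_max_length


# 3 h 30 min observed at max_length=100 on the 140 GB CPU-only VM
print(extrapolate_hours(3.5, 100, 2000))  # 70.0
```

Roughly 70 hours on that CPU VM is why a GPU-backed compute instance is the practical option here.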