- I have fine-tuned a Gemma 7B LLM from Hugging Face using LoRA and stored the model as a compressed .tar.gz file.
- The fine-tuning was done locally in SageMaker.
- This is the .tar.gz file structure of the fine-tuned model:
finetuned_gemma/model-00004-of-00004.safetensors
finetuned_gemma/tokenizer_config.json
finetuned_gemma/model.safetensors.index.json
finetuned_gemma/config.json
finetuned_gemma/model-00002-of-00004.safetensors
finetuned_gemma/generation_config.json
finetuned_gemma/special_tokens_map.json
finetuned_gemma/model-00001-of-00004.safetensors
finetuned_gemma/tokenizer.json
finetuned_gemma/code/
finetuned_gemma/code/requirements.txt
finetuned_gemma/code/.ipynb_checkpoints/
finetuned_gemma/code/.ipynb_checkpoints/requirements-checkpoint.txt
finetuned_gemma/code/inference.py
finetuned_gemma/model-00003-of-00004.safetensors
The fine-tuned model is also stored in AWS S3.
How do I now deploy the model as a SageMaker endpoint?
By the way, I have used transformers version 4.38.0, as it is the minimum requirement for the Gemma tokenizer.
I want to know how to deploy it, along with which image URI to use.
Please help.
I tried using sagemaker.huggingface.HuggingFaceModel and then deploying it, but I'm facing lots of difficulties.
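For reference, here is a minimal sketch of the HuggingFaceModel route I attempted. This is untested; the S3 path, IAM role, instance type, and DLC versions are placeholders, and the available Hugging Face DLC versions should be checked for your region:

```python
# Sketch: deploy the fine-tuned Gemma tarball from S3 with the SageMaker
# Hugging Face Deep Learning Container. All account-specific values are
# placeholders.

def huggingface_model_kwargs(model_s3_uri: str, role_arn: str) -> dict:
    """Arguments typically passed to sagemaker.huggingface.HuggingFaceModel.

    Because the tarball already contains code/inference.py and
    code/requirements.txt, the Hugging Face inference toolkit picks them up
    automatically; requirements.txt can pin transformers==4.38.0 even if the
    container ships an older version.
    """
    return {
        "model_data": model_s3_uri,      # e.g. s3://my-bucket/finetuned_gemma.tar.gz
        "role": role_arn,                # IAM role that SageMaker assumes
        "transformers_version": "4.37",  # placeholder: use a DLC version that exists
        "pytorch_version": "2.1",
        "py_version": "py310",
    }


def deploy(model_s3_uri: str, role_arn: str, instance_type: str = "ml.g5.12xlarge"):
    """Create the model and endpoint (requires AWS credentials to run).

    ml.g5.12xlarge is only a guess for a 7B model; size the instance to
    your own latency and memory needs.
    """
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(**huggingface_model_kwargs(model_s3_uri, role_arn))
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Passing explicit `transformers_version`/`pytorch_version`/`py_version` values is what selects the container image, so no manual image URI lookup is needed with this route.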
2 Answers
Follow this example closely: Fine-tune Gemma on SageMaker JumpStart.
You could use the SageMaker Large Model Inference (LMI) container, which supports Gemma models. See the deployment guide:
https://docs.djl.ai/docs/serving/serving/docs/lmi/deployment_guide/deploying-your-endpoint.html#configuration—servingproperties
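With the LMI route, the container is configured through a serving.properties file packaged alongside the model artifacts. A minimal sketch, with placeholder values (the S3 path and settings below are illustrative; check the linked guide for the options your chosen engine supports):

```
engine=Python
option.model_id=s3://my-bucket/finetuned_gemma/
option.dtype=bf16
option.tensor_parallel_degree=1
```

The LMI container downloads the weights from the S3 prefix given in option.model_id, so the model does not need to be repackaged into a new tarball for this path.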