I am trying to use the HuggingFace estimator to run training on Amazon SageMaker, e.g.
from sagemaker.huggingface import HuggingFace

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.17',
    pytorch_version='1.10',
    py_version='py38',
    hyperparameters=hyperparameters
)
When I try to increase the version to transformers_version='4.24', it throws an error saying that the maximum supported version is 4.17.
How can I use AWS SageMaker with a newer version of the HuggingFace estimator?
There's a note on using a newer version for inference at https://discuss.huggingface.co/t/deploying-open-ais-whisper-on-sagemaker/24761/9, but the way to use it for training with the HuggingFace estimator looks kind of complicated (https://discuss.huggingface.co/t/huggingface-pytorch-versions-on-sagemaker/26315/5?u=alvations), and it's not confirmed that those steps actually work.
3 Answers
You can use the PyTorch estimator and place a requirements.txt with transformers added to it in your source directory. SageMaker installs the packages listed there into the training container before your script runs, so you are not limited to the Transformers versions bundled with the HuggingFace estimator.
To achieve this, structure your source directory like this:

scripts/
    train.py
    requirements.txt

and pass the source_dir attribute to the PyTorch estimator, as in the sketch below.
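A minimal sketch of what that could look like (the framework_version, py_version, and other values here are assumptions; pick the PyTorch container that matches the Transformers release you pin in requirements.txt):

# sketch: swap the HuggingFace estimator for the generic PyTorch estimator
from sagemaker.pytorch import PyTorch

pytorch_estimator = PyTorch(
    entry_point='train.py',
    source_dir='./scripts',        # contains train.py and requirements.txt
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    framework_version='1.12',      # assumed PyTorch container version
    py_version='py38',
    hyperparameters=hyperparameters
)

# packages listed in scripts/requirements.txt are installed in the
# training container before train.py runs
pytorch_estimator.fit()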
@alvas,
Amazon SageMaker is a managed service, which means AWS builds and operates the tooling for you, saving you time. In your case, the tooling of interest is the integration of a new version of the HuggingFace Transformers library with SageMaker, which has to be developed, tested, and deployed to production. So this integration is naturally expected to be one or a few versions behind the upstream library. As a benefit, you always get a version of Transformers that is proven to be stable and compatible with SageMaker.
In your case, you want to try the latest version of Transformers in SageMaker, potentially sacrificing that stability and compatibility (v4.24 was released less than a month ago). As you correctly mentioned, this workflow can be "kind of complicated" and it is "not confirmed that the complicated steps can work". @Arun Lokanatha suggested the easiest way to try the new version. Indeed, Transformers works with the regular PyTorch estimator, but instead of the high-level HuggingFace estimator API you now need to use the lower-level PyTorch estimator API, and the above-mentioned requirements.txt will look like the short sketch after this paragraph. As a drawback, you have to do a bit more work yourself, e.g. figuring out the minimal versions of the PyTorch/CUDA libraries required. And you are responsible for testing, securing, and optimizing the integration as appropriate for production-grade use, potentially losing some of the benefits of using SageMaker at its full capability.
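For example, assuming you want to try Transformers v4.24 as in the question, the requirements.txt could simply pin that release (add any other extra packages your train.py needs):

# scripts/requirements.txt -- pin the release you want to try
transformers==4.24.0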
If, after my explanation, you finally decide to use the HuggingFace high-level estimator in production, I recommend taking at least these actions:
I hope this answer is helpful.
Ivan
You can achieve this by:
Step-1: Create a custom ECR image with the required HF version (https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html)
Step-2: Develop your train.py
Step-3: Pass train.py and the new ECR image URI to sagemaker.estimator (https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html), as in the sketch below.
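A rough sketch of Step-3, assuming the custom image has already been pushed to ECR. The image URI below is a placeholder, and passing entry_point/source_dir to the generic Estimator assumes a reasonably recent SageMaker Python SDK and an image built with the SageMaker training toolkit; otherwise, bake train.py into the image itself.

# sketch: generic estimator with a custom training image
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri='<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest',  # placeholder URI
    role=role,
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    entry_point='train.py',      # script-mode arguments, if your SDK/image support them
    source_dir='./scripts',
    hyperparameters=hyperparameters
)
estimator.fit()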