I am struggling to get a job to run on a GPU instance with AWS Batch.
I have set up the following:
Compute environment:
- Type: Managed
- Prov. model: EC2
- Instance type: g4dn.xlarge
- Status: Valid
- State: Enabled
- Min CPU: -
- Desired CPU: -
- Max CPU: 256
Job queue:
- state: Enabled
- status: Valid
- priority: 100
Job Definition:
- status: Active
- Type: Container
- Image: (the image that I have in ECR)
- CPU: 3
- Memory: 10240
- N of nodes: -
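
For reference, the job definition above is roughly what you would get from a CLI registration like this (the job definition name and image URI are placeholders for my actual values):

```
aws batch register-job-definition \
    --job-definition-name my-gpu-job \
    --type container \
    --container-properties '{
        "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",
        "vcpus": 3,
        "memory": 10240
    }'
```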
When I submit a job, it always stays in the RUNNABLE state.
It is not clear to me where I should look. Is it permissions related? I have tried different permissions with no luck.
How can I debug this?
2 Answers
Searching the internet, I found that there are multiple reasons why this might happen, and the documentation is not clear about where to look.
You should look at the EC2 Auto Scaling groups. There is an Auto Scaling group named after the compute environment, and all of the errors from launching EC2 instances are recorded in that group's activity history.
In my case, it was that I did not have permission to spin up a GPU instance.
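
For example, with the AWS CLI you can pull the activity history of that group (the group name below is a placeholder; list the groups first to find the one matching your compute environment):

```
# List the Auto Scaling groups; the one created by Batch is named after the compute environment
aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].AutoScalingGroupName"

# Show the scaling activities; instance launch errors appear in the StatusMessage field
aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name "<your-compute-environment-asg>"
```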
I faced the same issue with almost the same configuration as yours. Specifying an ECS GPU-optimised AMI inside the compute environment block resolved the issue for me.
To find the ECS GPU-optimised AMI for your region, you can use the AWS CLI command below (the corresponding compute environment field is named imageId; you can refer here for the official docs).
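Something along these lines, using the SSM parameter that AWS publishes for the Amazon Linux 2 GPU-optimised ECS AMI (the region here is just an example):

```
# Returns the recommended GPU-optimised ECS AMI ID for the given region
aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended/image_id \
    --region eu-west-1 \
    --query "Parameters[0].Value" \
    --output text
```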
Replace the region in the above command with your own, and if your profile is not the default one, add `--profile your_profile_name`.
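
Once you have the AMI ID, it goes into the imageId field of the compute environment's compute resources. A minimal sketch of creating such a compute environment with the CLI, with placeholder AMI ID, subnets, security group and roles (adjust everything to your own setup):

```
aws batch create-compute-environment \
    --compute-environment-name gpu-compute-env \
    --type MANAGED \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["g4dn.xlarge"],
        "imageId": "ami-0123456789abcdef0",
        "subnets": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-xxxxxxxx"],
        "instanceRole": "ecsInstanceRole"
    }' \
    --service-role arn:aws:iam::123456789012:role/AWSBatchServiceRole
```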