I am training a YOLOv5 "L6" model for an important project. I have a very large dataset of UAV and drone images, and I need to train with a large input size. A few months ago I trained the "M" model at 640×640 on an RTX 3060, and its performance was uneven: detection of some categories (vehicles, landing areas, etc.) was really good, but on small objects such as humans the model struggled and got confused. So I decided to train at 1280×1280, and a month ago I bought an RTX 3090 Ti. I run my code in WSL 2, which is fully configured for DL/ML.
The point is that whenever I run any YOLOv5 model with an input size higher than 640×640, I get the error below. In this example I ran the "M6" model with batch size 8 and 1280×1280 input, and VRAM usage was around 12 GB, so the card was far from full. It also does not look like an ordinary out-of-memory error: when I tried the "L6" model with batch size 16 at 1280×1280, VRAM usage climbed above 24 GB and it instantly crashed with a CUDA out-of-memory error that, as usual, reported the failed allocation.
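For reference, I launch training roughly like this; the `--data` and `--epochs` values below are placeholders, not my exact settings:

```bash
# Roughly the command that triggers the error (dataset YAML and epoch count are placeholders)
python train.py --img 1280 --batch-size 8 --weights yolov5m6.pt --data dataset.yaml --epochs 100
```

This is the full traceback: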
File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/train.py", line 640, in <module>
main(opt)
File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/train.py", line 529, in main
train(opt.hyp, opt, device, callbacks)
File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/train.py", line 352, in train
results, maps, _ = validate.run(data_dict,
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/val.py", line 198, in run
for batch_i, (im, targets, paths, shapes) in enumerate(pbar):
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/utils/dataloaders.py", line 172, in __iter__
yield next(self.iterator)
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
data = self._next_data()
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 34, in do_one_step
data = pin_memory(data, device)
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 67, in pin_memory
return [pin_memory(sample, device) for sample in data] # Backwards compatibility.
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 67, in <listcomp>
return [pin_memory(sample, device) for sample in data] # Backwards compatibility.
File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 55, in pin_memory
return data.pin_memory(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.```
3 Answers
I solved the problem by going back to a physical Linux installation. In my case I think the problem was WSL itself: the machine has to reserve resources for both Linux and Windows, so the available computing power and memory are more limited.
I can think of a few ways to get your training started.
It may be related to WSL2, which both keeps you from using most of your system's RAM and constrains the memory available to a single application; this is one of WSL's known limitations.
The NVIDIA CUDA on WSL user guide mentions this among its known limitations:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html
"Pinned system memory (example: System memory that an application makes resident for GPU accesses) availability for applications is limited."
"For example, some deep learning training workloads, depending on the framework, model and dataset size used, can exceed this limit and may not work."
Regarding how to fix this problem, the following thread provides some advice:
https://github.com/huggingface/diffusers/issues/807
Setting a higher memory limit for WSL and updating the distribution may help you make better use of your hardware.
Modify the `.wslconfig` file to allow WSL a higher amount of system memory, and run `wsl --update` to update your Linux distribution within Windows.
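As a concrete example, a `.wslconfig` in your Windows user profile folder (C:\Users\<your-user>\.wslconfig) along these lines raises the WSL2 memory limits; the numbers are only placeholders you should adapt to your machine:

```ini
# C:\Users\<your-user>\.wslconfig -- example values only, adjust to your hardware
[wsl2]
memory=48GB   # RAM available to the WSL2 VM (by default WSL2 only gets a fraction of host RAM)
swap=16GB     # swap file size for the VM
```

After saving the file, run `wsl --shutdown` from Windows so the new limits are applied the next time the distribution starts.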