Docker – Why does running the Llama 3.1 70B model underutilise the GPU?
I have deployed Llama 3.1 70B and Llama 3.1 8B on my system, and the 8B model works perfectly. When I tested the 70B model, it underutilized the GPU and took a long time to respond. Here…
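
The question body is truncated above, so the exact deployment commands are unknown. As a minimal sketch, assuming the model is served through an Ollama container (an assumption, not the poster's confirmed setup), the two most common causes of this symptom are the container not being given GPU access and the 70B weights not fitting in VRAM, so layers spill to the CPU:

```bash
# Sketch of an assumed Ollama-in-Docker setup; the poster's actual
# commands are truncated, so treat names and paths as illustrative.

# Expose the GPU to the container; without --gpus, inference falls back
# to the CPU and a 70B model will respond very slowly.
docker run -d --gpus all -v ollama:/root/.ollama \
  -p 11434:11434 --name ollama ollama/ollama

# Pull and run the 70B model inside the container.
docker exec -it ollama ollama run llama3.1:70b

# Check VRAM usage: a 4-bit-quantized 70B model needs roughly 40+ GB of
# VRAM, so on a smaller GPU most layers get offloaded to the CPU, which
# shows up as low GPU utilization and long response times.
nvidia-smi
```

The key diagnostic is `nvidia-smi` while a request is in flight: if VRAM usage is far below the model's footprint, the slowdown is CPU offloading rather than a Docker misconfiguration.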