There's an old PR that attempted to bring Vulkan support to Ollama - a logical and helpful move, given that the Ollama engine is based on llama.cpp - but the Ollama maintainers weren't interested.
As for performance vs ROCm, it does fine. Against CUDA, it also does well unless you're in a multi-GPU setup. Its magic trick is compatibility. Pretty much everything runs Vulkan. And Vulkan is compatible across generations of cards, architectures AND vendors. That's how I'm running a single PC with Nvidia and AMD cards together.
I think llama.cpp merged ROCm support back in 2023. It's called HIP in their README, but I'm not super educated on all the acronyms, compute frameworks, and instruction sets.
Ollama does use ROCm; however, so does llama.cpp. Vulkan happens to be another backend supported by llama.cpp.
GitHub: llama.cpp Supported Backends
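For what it's worth, the backend is purely a build-time choice; the application code you write against llama.cpp stays the same whether the library was compiled for Vulkan, CUDA, or ROCm/HIP. A minimal sketch using llama.cpp's C API (function names shift a little between versions, so treat this as illustrative rather than exact):

```cpp
// Minimal sketch: load a GGUF model with llama.cpp's C API.
// The same code runs on a Vulkan, CUDA, or ROCm/HIP build of the library --
// which backend is used was decided when llama.cpp was compiled.
#include "llama.h"
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    llama_backend_init();  // initializes whichever backend the build enabled

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as fit on the GPU(s)

    llama_model *model = llama_load_model_from_file(argv[1], mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    printf("model loaded; GPU backend was selected at build time\n");

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```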
ROCm is a software stack which includes a bunch of SDKs and APIs.
HIP is the part of ROCm that lets you program AMD GPUs, with a focus on portability from Nvidia's CUDA.
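Rough illustration of what that portability looks like: HIP's runtime API mirrors CUDA's almost call-for-call (cudaMalloc becomes hipMalloc, cudaMemcpy becomes hipMemcpy, and so on), so porting CUDA code is mostly mechanical renaming. A toy example, assuming hipcc and a ROCm-supported GPU:

```cpp
// Toy HIP kernel: adds 1.0 to every element of an array on the GPU.
// Note how closely the calls and launch syntax track CUDA.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void add_one(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = float(i);

    float *dev = nullptr;
    hipMalloc((void**)&dev, n * sizeof(float));                      // cudaMalloc
    hipMemcpy(dev, host, n * sizeof(float), hipMemcpyHostToDevice);  // cudaMemcpy

    add_one<<<n / 256, 256>>>(dev, n);  // same triple-chevron launch as CUDA
    hipDeviceSynchronize();             // cudaDeviceSynchronize

    hipMemcpy(host, dev, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(dev);

    printf("host[0] = %f\n", host[0]);  // expect 1.0
    return 0;
}
```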