• vividspecter@aussie.zone · 5 days ago

    Ollama uses ROCm whereas llama.cpp uses Vulkan compute. Which one will perform better depends on many factors, but Vulkan compute should be easier to set up.

    • afk_strats@lemmy.world · 5 days ago (edited)

      Ollama does use ROCm; however, so does llama.cpp. Vulkan happens to be another backend that llama.cpp supports.

      GitHub: llama.cpp Supported Backends
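
      To make that concrete: a single llama.cpp build can register several backends at once, and you can ask it which compute devices it actually sees. A minimal C++ sketch, assuming the device-registry functions from recent ggml-backend.h (these names have moved around between releases, so treat it as a sketch):

      ```cpp
      // List the compute devices (CUDA, HIP/ROCm, Vulkan, CPU, ...) that this
      // particular llama.cpp/ggml build was compiled with.
      #include <cstdio>
      #include "ggml-backend.h"

      int main() {
          const size_t n = ggml_backend_dev_count();
          for (size_t i = 0; i < n; ++i) {
              ggml_backend_dev_t dev = ggml_backend_dev_get(i);
              printf("device %zu: %s (%s)\n", i,
                     ggml_backend_dev_name(dev),
                     ggml_backend_dev_description(dev));
          }
          return 0;
      }
      ```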

      There is an old PR which attempted to bring Vulkan support to Ollama - a logical and helpful move, given that the Ollama engine is based on llama.cpp - but the Ollama maintainers weren’t interested.

      As for performance vs ROCm, Vulkan does fine. Against CUDA it also does well, unless you’re in a multi-GPU setup. Its magic trick is compatibility: pretty much everything runs Vulkan, and Vulkan is compatible across generations of cards, architectures AND vendors. That’s how I’m running a single PC with Nvidia and AMD cards together.
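
      In case anyone wants to reproduce the mixed-vendor setup: with a Vulkan build, llama.cpp simply sees two GPUs and can split layers across them. A rough C++ sketch against the llama.h C API - names drift between releases, so take it as a sketch rather than the canonical way:

      ```cpp
      // Load a model split across two GPUs. With a Vulkan build the two
      // devices can be from different vendors (e.g. one Nvidia and one AMD card).
      #include <cstdio>
      #include "llama.h"

      int main(int argc, char ** argv) {
          if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

          llama_backend_init();

          llama_model_params mparams = llama_model_default_params();
          mparams.n_gpu_layers = 999;                    // offload all layers
          mparams.split_mode   = LLAMA_SPLIT_MODE_LAYER; // split by layer, not by row
          const float split[]  = {0.5f, 0.5f};           // half the layers on each GPU
          mparams.tensor_split = split;

          llama_model * model = llama_load_model_from_file(argv[1], mparams);
          if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

          // ... create a context and run inference as usual ...

          llama_free_model(model);
          llama_backend_free();
          return 0;
      }
      ```

      The llama-cli / llama-server flags --split-mode and --tensor-split do the same thing without writing any code.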

      • hendrik@palaver.p3x.de · 5 days ago

        I think llama.cpp merged ROCm support in 2023 already. It’s called HIP in their README, but I’m not super educated on all the acronyms, compute frameworks, and instruction sets.

        • afk_strats@lemmy.world · 5 days ago

          ROCm is a software stack which includes a bunch of SDKs and APIs.

          HIP is a subset of ROCm which lets you program AMD GPUs, with a focus on portability from Nvidia’s CUDA.
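
          Concretely, HIP code reads almost line-for-line like CUDA (hipMalloc instead of cudaMalloc, the same <<<grid, block>>> launch syntax), which is what makes porting straightforward. A toy C++ example of my own, built with hipcc:

          ```cpp
          // Minimal HIP example: add two vectors on the GPU.
          // The API mirrors CUDA almost 1:1.
          #include <cstdio>
          #include <vector>
          #include <hip/hip_runtime.h>

          __global__ void vec_add(const float * a, const float * b, float * c, int n) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) c[i] = a[i] + b[i];
          }

          int main() {
              const int n = 1 << 20;
              std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

              float *da, *db, *dc;
              hipMalloc((void **) &da, n * sizeof(float));
              hipMalloc((void **) &db, n * sizeof(float));
              hipMalloc((void **) &dc, n * sizeof(float));

              hipMemcpy(da, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
              hipMemcpy(db, b.data(), n * sizeof(float), hipMemcpyHostToDevice);

              vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
              hipDeviceSynchronize();

              hipMemcpy(c.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
              printf("c[0] = %f\n", c[0]); // expect 3.0

              hipFree(da); hipFree(db); hipFree(dc);
              return 0;
          }
          ```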