• vividspecter@aussie.zone · 5 days ago

    Ollama uses ROCm whereas llama.cpp uses Vulkan compute. Which one will perform better depends on many factors, but Vulkan compute should be easier to set up.

    • afk_strats@lemmy.world · 5 days ago (edited)

      Ollama does use ROCm; however, so does llama.cpp. Vulkan happens to be another backend that llama.cpp supports.

      GitHub: llama.cpp Supported Backends
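
      To make that concrete: a single llama.cpp build can register several backends at once, and you can ask it which compute devices it actually sees. A minimal C++ sketch, assuming the device-registry functions from recent ggml-backend.h (these names have moved around between releases, so treat it as a sketch):

      ```cpp
      // List the compute devices (CUDA, HIP/ROCm, Vulkan, CPU, ...) that this
      // particular llama.cpp/ggml build was compiled with.
      #include <cstdio>
      #include "ggml-backend.h"

      int main() {
          const size_t n = ggml_backend_dev_count();
          for (size_t i = 0; i < n; ++i) {
              ggml_backend_dev_t dev = ggml_backend_dev_get(i);
              printf("device %zu: %s (%s)\n", i,
                     ggml_backend_dev_name(dev),
                     ggml_backend_dev_description(dev));
          }
          return 0;
      }
      ```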

      There is an old PR which attempted to bring Vulkan support to Ollama - a logical and helpful move, given that the Ollama engine is based on llama.cpp - but the Ollama maintainers weren’t interested.

      As for performance vs ROCm, Vulkan does fine. Against CUDA it also does well, unless you’re in a multi-GPU setup. Its magic trick is compatibility: pretty much everything runs Vulkan, and Vulkan is compatible across generations of cards, architectures AND vendors. That’s how I’m running a single PC with Nvidia and AMD cards together.
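
      In case anyone wants to reproduce the mixed-vendor setup: with a Vulkan build, llama.cpp simply sees two GPUs and can split layers across them. A rough C++ sketch against the llama.h C API - names drift between releases, so take it as a sketch rather than the canonical way:

      ```cpp
      // Load a model split across two GPUs. With a Vulkan build the two
      // devices can be from different vendors (e.g. one Nvidia and one AMD card).
      #include <cstdio>
      #include "llama.h"

      int main(int argc, char ** argv) {
          if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

          llama_backend_init();

          llama_model_params mparams = llama_model_default_params();
          mparams.n_gpu_layers = 999;                    // offload all layers
          mparams.split_mode   = LLAMA_SPLIT_MODE_LAYER; // split by layer, not by row
          const float split[]  = {0.5f, 0.5f};           // half the layers on each GPU
          mparams.tensor_split = split;

          llama_model * model = llama_load_model_from_file(argv[1], mparams);
          if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

          // ... create a context and run inference as usual ...

          llama_free_model(model);
          llama_backend_free();
          return 0;
      }
      ```

      The llama-cli / llama-server flags --split-mode and --tensor-split do the same thing without writing any code.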

      • hendrik@palaver.p3x.de · 5 days ago

        I think llama.cpp merged ROCm support in 2023 already. It’s called HIP in their README, but I’m not super educated on all the acronyms, compute frameworks, and instruction sets.

        • afk_strats@lemmy.world · 5 days ago

          ROCm is a software stack which includes a bunch of SDKs and APIs.

          HIP is a subset of ROCm which lets you program AMD GPUs, with a focus on portability from Nvidia’s CUDA.
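
          Concretely, HIP code reads almost line-for-line like CUDA (hipMalloc instead of cudaMalloc, the same <<<grid, block>>> launch syntax), which is what makes porting straightforward. A toy C++ example of my own, built with hipcc:

          ```cpp
          // Minimal HIP example: add two vectors on the GPU.
          // The API mirrors CUDA almost 1:1.
          #include <cstdio>
          #include <vector>
          #include <hip/hip_runtime.h>

          __global__ void vec_add(const float * a, const float * b, float * c, int n) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) c[i] = a[i] + b[i];
          }

          int main() {
              const int n = 1 << 20;
              std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

              float *da, *db, *dc;
              hipMalloc((void **) &da, n * sizeof(float));
              hipMalloc((void **) &db, n * sizeof(float));
              hipMalloc((void **) &dc, n * sizeof(float));

              hipMemcpy(da, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
              hipMemcpy(db, b.data(), n * sizeof(float), hipMemcpyHostToDevice);

              vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
              hipDeviceSynchronize();

              hipMemcpy(c.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
              printf("c[0] = %f\n", c[0]); // expect 3.0

              hipFree(da); hipFree(db); hipFree(dc);
              return 0;
          }
          ```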