• Toes♀@ani.social · 26 days ago

    Have you tried koboldcpp? I’m curious about the performance metrics between vulkan and opencl mode.

    • Marvin Damschen@mastodon.nu (OP) · edited · 26 days ago

      I had not tried koboldcpp before, but gave it a try now.

      I am not sure how to run the OpenCL backend. The wiki says “Vulkan is a newer option that provides a good balance of speed and utility compared to the OpenCL backend”, but the CLI arguments do not mention OpenCL.
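      For what it's worth, koboldcpp has historically exposed OpenCL through CLBlast rather than a flag literally named "opencl", which may be why it is easy to miss. Assuming the `--usevulkan` and `--useclblast` flags are still present on current builds (worth verifying with `--help`; `model.gguf` below is a placeholder), the two backends would be selected like this:

      ```shell
      # Vulkan backend (flag name assumed from koboldcpp's --help; verify on your build)
      python koboldcpp.py --model model.gguf --usevulkan

      # OpenCL backend via CLBlast, picking platform 0 and device 0
      python koboldcpp.py --model model.gguf --useclblast 0 0
      ```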

      The Vulkan backend seems marginally slower than llama.cpp’s for GLM-4.7-Flash: 19.8 tps with FlashAttention, 29.8 tps without.
      It seems koboldcpp is a fork of llama.cpp, so maybe some Vulkan optimisations have not made it there yet.
      @localllama