So my relevant hardware is:
GPU - 9070XT
CPU - 9950X3D
RAM - 64GB of DDR5

My problem is that I can’t figure out how to get a local LLM to actually use my GPU. I tried Ollama with DeepSeek R1 8B, and it kind of vaguely ran while maxing out my CPU and completely ignoring the GPU.
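In case it helps with diagnosing this, here’s the quick check I’ve been using to see whether Ollama actually put anything into VRAM. It’s just a rough sketch assuming the default local API on port 11434 and that a model is currently loaded (i.e. you just ran a prompt); the field names come from Ollama’s /api/ps endpoint, so adjust if your install differs:

```python
# Ask Ollama where the loaded model actually lives (CPU RAM vs VRAM).
# Assumes the default local API at http://localhost:11434 and a model
# that is already loaded -- run a prompt first, then run this.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    total = model.get("size", 0)
    vram = model.get("size_vram", 0)
    share = (vram / total * 100) if total else 0
    print(f"{model['name']}: {vram / 1e9:.1f} GB of {total / 1e9:.1f} GB in VRAM ({share:.0f}%)")
    # 0% in VRAM would mean Ollama fell back to CPU, which matches what I'm seeing.
```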

While I’m here, model suggestions would be good too. I’m currently looking at 2 use cases:

  • Something I can feed a document to and ask questions about that document (Nvidia used to offer this), to work as a kind of co-GM for quickly referencing more obscure rules without having to hunt through the PDF. There’s a rough sketch of what I’m picturing after this list.
  • Something more storytelling oriented that I can use to generate backgrounds for throwaway side NPCs when the players inevitably demand their life story after expertly dodging all the NPCs I actually wrote lore for.
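For the first use case, to be concrete about what I’m after, something along these lines would be enough. This is only a sketch against Ollama’s /api/generate endpoint; the model name and the file path are placeholders, not recommendations:

```python
# Rough sketch of the "co-GM" idea: stuff the extracted rules text into the
# prompt and ask a question against it. Assumes Ollama's default API on
# port 11434; "llama3.1:8b" and "grappling_rules.txt" are just placeholders.
import json
import urllib.request

def ask_rules(rules_text: str, question: str, model: str = "llama3.1:8b") -> str:
    prompt = (
        "You are a rules assistant for a tabletop RPG. Answer only from the "
        "rules text below, and say so if the answer isn't in it.\n\n"
        f"RULES:\n{rules_text}\n\nQUESTION: {question}"
    )
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Text extracted from the PDF beforehand (e.g. with a PDF-to-text tool).
print(ask_rules(open("grappling_rules.txt").read(),
                "Can a grappled creature still cast spells?"))
```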

Also, just an unrelated aside: DeepSeek R1 8B seems to go into an infinite thought loop when you ask it the strawberry question, which was kind of funny.

  • Fisch@discuss.tchncs.de · 29 days ago

    I have the same GPU and I use koboldcpp with Vulkan as the backend. Works perfectly fine. I have a 12B model and it’s extremely fast; I could probably even fit a bigger model into VRAM. Using tabbyAPI for EXL2 models didn’t work for me, it always generated gibberish (I tried 2 different models). For context, I’m on Linux, so maybe that tabbyAPI issue doesn’t happen on other operating systems.