I recently made a post about the 35B MoE. Now the dense 27B variant has been released.


  • SuspciousCarrot78@lemmy.world · 4 hours ago

    Rules of thumb

    • For a 27B: if you want it to run entirely on your GPU, you need a quantisation that fits in VRAM, plus room for the KV cache. So (for example), if your model GGUF were 10GB, I’d leave another 2GB for KV cache, meaning you’d need 12GB to run it with a reasonable context length. I haven’t looked at the quants for Qwen3.6 27B yet…I imagine the “good baseline” quant is what…12? 15GB?

    Having said that, remember that 1) you can split the model between CPU and GPU, and 2) you can use lower quants. So if you have “just” 12GB, a lower quant (I dunno…IQ3_XS?) might get you over the line.

    • You can run it however you want :) For someone brand new, the best all-in-one options are Ollama and Jan.ai.

    • Yes. Jan.ai has MCP tooling (I imagine Ollama does as well), so you can follow the how-tos to set that up. Read their docs? What do you need to do with MCP?

    • What you should know: you’ll reach a point where “more parameters = better performance” needs to be balanced against cost and smarter tooling. Don’t be tempted to drop $$$ on hardware thinking you can just throw money at the problem to make it go away.
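
    The first rule of thumb above (quant file size + KV-cache headroom) can be sketched as a quick calculator. The bits-per-weight figures below are ballpark values for common llama.cpp quant types, and the 20% KV-cache allowance just mirrors the 10GB + 2GB example — none of these are measured numbers:

```python
# Rough VRAM estimate for running a GGUF fully on GPU.
# Bits-per-weight values are approximate effective sizes for
# common llama.cpp quant types (assumption, not a spec).

BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q4_K_M": 4.8,
    "IQ3_XS": 3.3,
}

def vram_needed_gb(params_b: float, quant: str, kv_headroom: float = 0.2) -> float:
    """Weights file size plus a flat KV-cache allowance, in GB.

    kv_headroom=0.2 follows the ~20% rule of thumb from the
    10GB-model / 2GB-cache example above.
    """
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8  # params in billions -> GB
    return weights_gb * (1 + kv_headroom)

if __name__ == "__main__":
    for q in BITS_PER_WEIGHT:
        print(f"27B @ {q}: ~{vram_needed_gb(27, q):.1f} GB")
```

    For a 27B this lands roughly in the 13–19GB range depending on quant, which is why a card with “just” 12GB pushes you toward IQ3-class quants or partial CPU offload.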