I’m limited to 24GB of VRAM, and I need pretty large context for my use case (20k+ tokens). I tried “Qwen3-14B-GGUF:Q6_K_XL,” but it doesn’t seem to like calling tools more than a couple of times, no matter how I prompt it.

I also tried “SuperThoughts-CoT-14B-16k-o1-QwQ-i1-GGUF:Q6_K” and “DeepSeek-R1-Distill-Qwen-14B-GGUF:Q6_K_L,” but Ollama or LangGraph throws an error saying these models don’t support tool calling.
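
For reference, this is roughly the shape of my setup (a simplified sketch using the langchain-ollama package; the toy tool and the model tag are placeholders, not my exact config):

```python
# Simplified sketch of the tool-calling setup, assuming the langchain-ollama
# package; the tool and model tag here are placeholders.
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

@tool
def get_weather(city: str) -> str:
    """Toy tool: return a canned weather report."""
    return f"It is sunny in {city}."

llm = ChatOllama(model="qwen3:14b", num_ctx=32768)  # num_ctx raises Ollama's small default context
llm_with_tools = llm.bind_tools([get_weather])

# Models whose chat template has no tool section get rejected around here
# with a "does not support tools" style error, before any generation happens.
response = llm_with_tools.invoke("What's the weather in Paris?")
print(response.tool_calls)
```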

  • robber@lemmy.ml · 1 day ago

    Some people on another discussion platform were praising the new Mistral Small models for agentic use. I haven’t been able to try them myself yet, but at 24B params a quantized version would certainly fit in your 24GB.
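
    If it helps, a quick back-of-envelope check of the fit (bytes-per-weight and overheads are rough assumptions, not measurements):

    ```python
    # Rough sanity check that a ~4-bit quant of a 24B model leaves room in 24GB.
    # All numbers are ballpark assumptions.
    params = 24e9
    bytes_per_weight = 0.5                      # ~Q4 quantization
    weights_gib = params * bytes_per_weight / 1024**3
    print(f"weights: ~{weights_gib:.1f} GiB")   # ~11 GiB, leaving ~13 GiB for
                                                # the KV cache at 20k+ context
    ```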