The Qwen3.5 models are still the best local models I’ve used, so I’m excited to see how this updated version performs.

  • TheCornCollector@piefed.zip (OP) · 4 hours ago

    I’m running it with the UD_Q4_K_XL quant on a 24 GB 7900 XTX at ~85 tokens/s. Since it’s an MoE model, CPU inference with 32 GB of RAM should be doable, but I won’t make any promises on speed.
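    If anyone wants to try the CPU route, here’s a rough launch sketch with llama.cpp — the GGUF filename is a placeholder (not the actual release name), and flag support depends on how recent your build is:

    ```shell
    # Sketch only: the model filename below is a placeholder, not the real release.
    # -ngl sets how many layers are offloaded to the GPU; 0 means pure CPU inference.
    # Recent llama.cpp builds also offer --n-cpu-moe to keep the MoE expert weights
    # in system RAM while the dense layers stay on the GPU (check your build first).
    llama-cli -m ./model-UD-Q4_K_XL.gguf -ngl 99 -c 8192 -p "Hello"
    ```

    For pure CPU, drop `-ngl` to 0 and expect much lower throughput than the GPU numbers above.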

    • fonix232@fedia.io · 1 hour ago

      I wonder what the wombo-combo of a Ryzen AI APU can do with this.

      Time to fire up the trusty 370.

    • venusaur@lemmy.world · 4 hours ago

      Thanks! That sounds expensive. Hopefully 24 GB of VRAM gets cheaper, or models get more efficient, soon.

        • venusaur@lemmy.world · 2 hours ago

          Thanks! I’m hoping to run at least a 20B model. Idk if I can do that fast enough without 24 GB — it seems to be the sweet spot.