The Qwen3.5 models are still the best local models I’ve used, so I’m excited to see how this updated version performs.

    • TheCornCollector@piefed.zip (OP) · 4 hours ago

      I’m running it with the UD_Q4_K_XL quant on a 24 GB 7900 XTX at ~85 tokens/s. Since it’s an MoE model, CPU inference with 32 GB of RAM should be doable, but I won’t make any promises on speed.
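      For anyone who wants to reproduce this, here’s a minimal sketch using llama-cpp-python, one common way to run GGUF quants. The model filename is an assumption; point it at whichever UD_Q4_K_XL GGUF you actually downloaded.

      ```python
      # Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
      # The filename below is a placeholder; use your real UD_Q4_K_XL GGUF.
      from llama_cpp import Llama

      llm = Llama(
          model_path="Qwen3.5-UD_Q4_K_XL.gguf",  # assumed filename
          n_gpu_layers=-1,  # offload every layer; a 4-bit quant fits in 24 GB
          n_ctx=8192,       # modest context to leave some VRAM headroom
      )

      out = llm("Say hello in one sentence.", max_tokens=64)
      print(out["choices"][0]["text"])
      ```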

      • fonix232@fedia.io · 1 hour ago

        Wonder what the wombo-combo of a Ryzen AI APU can do with this.

        Time to fire up the trusty 370.

      • venusaur@lemmy.world · 4 hours ago

        Thanks! That sounds expensive. Hopefully 24 GB VRAM gets cheaper or models get more efficient soon.

          • venusaur@lemmy.world · 2 hours ago

            Thanks! I’m hoping to run at least a 20B model. Not sure I can do that fast enough without 24 GB, which seems to be the sweet spot.

    • Infinite@lemmy.zip · 5 hours ago

      Probably 24 GB VRAM and 32-64 GB RAM for minimum specs with 4-bit quantization. This is a beefy boi.
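
      Back-of-envelope, if you want to sanity-check those numbers: a 4-bit K-quant works out to roughly 4.5 bits per weight once the mixed-precision layers are counted, so weight memory is about params × 4.5 / 8 bytes, with KV cache and runtime overhead on top. The parameter count below is just an example, since the thread doesn’t pin down which model size this is.

      ```python
      # Rough weight-memory estimate for a 4-bit quantized model.
      def quant_weight_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
          # ~4.5 bits/weight approximates Q4_K-style mixed-precision quants
          return params_billions * 1e9 * bits_per_weight / 8 / 1e9

      # Example only: a 30B model needs ~17 GB for weights, before KV cache.
      print(f"{quant_weight_gb(30):.1f} GB")
      ```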