Qwen3.6-35B-A3B released (huggingface.co)
TheCornCollector@piefed.zip to LocalLLaMA@sh.itjust.works · 7 hours ago · 14 comments
The Qwen3.5 models are still the best local models I’ve used, so I’m excited to see how this updated version performs.
TheCornCollector@piefed.zip (OP) · 4 hours ago
I’m running it with the UD_Q4_K_XL quant on a 24 GB 7900 XTX at ~85 tokens/s. Since it’s an MoE model, CPU inference with 32 GB of RAM should be doable, but I won’t make any promises on speed.
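If anyone wants to script against it rather than use a chat UI, here’s a minimal sketch with llama-cpp-python; the GGUF filename below is my guess at the usual Unsloth naming, so swap in whatever the repo actually ships:

```python
# Minimal sketch with llama-cpp-python; the filename is a guess at
# Unsloth's usual naming, not a confirmed release name.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload all layers to the 7900 XTX (ROCm build);
                      # set to 0 for CPU-only inference on 32 GB of RAM
    n_ctx=8192,       # context window; raise it if you have VRAM to spare
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what an MoE model is."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```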
fonix232@fedia.io · 1 hour ago
Wonder what the wombo-combo of a Ryzen AI APU can do with this. Time to fire up the trusty 370.
venusaur@lemmy.world · 4 hours ago
Thanks! That sounds expensive. Hopefully 24 GB of VRAM gets cheaper or models get more efficient soon.
Jakeroxs@sh.itjust.works · 3 hours ago
You’d want to wait until smaller 3.6 models are released; I’d assume that’ll be soon.
venusaur@lemmy.world · 2 hours ago
Thanks! I’m hoping to run at least a 20B model. Idk if I can do that fast enough without 24 GB. That seems to be the sweet spot.