30B-A3B GLM-4.7-Flash Released

TheCornCollector@piefed.zip · 2 months ago

30B-A3B GLM-4.7-Flash Released

FaceDeer@fedia.io · 2 months ago

Oo. I use Qwen3-30B-A3B-Thinking-2507 as my generic “workhorse” local LLM, so this looks like it might be a nice upgrade with exactly the same basic specs. I’ll try it out.

panda_abyss@lemmy.ca · 2 months ago

Anyone get this working in llama.cpp yet?

I know flash attention and PyTorch have patchy support.

TheCornCollector@piefed.zip · 2 months ago

Seems to be a new architecture so custom support is needed.

Tracking issue

PR

panda_abyss@lemmy.ca · edit-2 2 months ago

And that PR is already shipped, the community works fast!

Edit: I tried a 4bit quant of this model and it is probably one of the worst/most benchmaxxed models I’ve seen. Reasoning is quite bad, recall of facts is bad, reading and digesting content is bad. But it is fast.

TheCornCollector@piefed.zip · edit-2 2 months ago

Unfortunately, the AI community prefers rushed buggy development over proper, tested releases, so the quants and maybe the PR weren’t fully working.

As of 3 hours ago, unsloth was still updating their quants and guide. I don’t have time to test now but I wouldn’t judge the base model performance in the first few days when the bugs are still being worked out.

They also recommend some unconventional parameters in the Unsloth guide.

It could also be that the model is truly shit of course.

Edit I just took a look at the llama.cpp repo and there are still issues with the implementation as well.

30B-A3B GLM-4.7-Flash Released

30B-A3B GLM-4.7-Flash Released

zai-org/GLM-4.7-Flash · Hugging Face