Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution

BB84@mander.xyz · 15 hours ago

tristynalxander@mander.xyz · 13 hours ago

I was just about to make a post asking for the best small model after finding out Qwen3-27B was way too slow, so Orthrus-Qwen3-8B looks like a pretty appealing option.

BB84@mander.xyz · 12 hours ago

They said they’re working on Orthrus for Qwen 3.5. It’ll be amazing!

tristynalxander@mander.xyz · 11 hours ago

Yeah, unfortunately it seems this can’t be converted to a llama.cpp-compatible format yet, and that’s a pretty big tradeoff right now. Not surprising given how new it is, but we’ll have to wait before it can be combined with other improvements. Pretty exciting for the future though.

GitHub - chiennv2000/orthrus: Fast, lossless LLM inference via dual-view diffusion decoding.