Not sure about AI lab (although I also use podman, prefer Llama-swap), but pretty much everything uses llama-cpp under the hood, which usually takes a day or three to setup for a new architecture. Although I seem to recall them being ready for Qwen3.5 day one due to collaboration.
I find giving it a week or so for the dust to settle (even if it ‘works’, best parameters, quantization bugs etc take a while to shake out) unless there’s a huge motivation.
Also benches are more like guidelines than actual rules, best to do your own on your own use cases.
Had a quick squiz at it, and if it meets your needs, I’d just wait, unless you want to get into the lower levels of things (and run linux), llama-swap is just an inference server, you’ll need something like Open WebUI for chat as well etc.
Not sure about AI lab (although I also use podman, prefer Llama-swap), but pretty much everything uses llama-cpp under the hood, which usually takes a day or three to setup for a new architecture. Although I seem to recall them being ready for Qwen3.5 day one due to collaboration.
I find giving it a week or so for the dust to settle (even if it ‘works’, best parameters, quantization bugs etc take a while to shake out) unless there’s a huge motivation.
Also benches are more like guidelines than actual rules, best to do your own on your own use cases.
Okay, thanks for the help! I’ll give llama swap a shot.
Had a quick squiz at it, and if it meets your needs, I’d just wait, unless you want to get into the lower levels of things (and run linux), llama-swap is just an inference server, you’ll need something like Open WebUI for chat as well etc.