How do you install it? I'm using AI Lab on Podman, which can import new models, but I haven't been able to load models other than the ones it comes with. The ones I tried just crash when I start the service.
I agree with the other commenters' suggestions; I just wanted to add that I personally run llama.cpp directly with the built-in llama-server. For a single-user server this works great and is almost always at the forefront of model support.
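For reference, this is roughly what running the bundled server looks like; the model path, filename, and port here are just placeholders for your own setup:

```shell
# Start llama.cpp's bundled OpenAI-compatible server.
# -m     : path to a local GGUF file (example filename, use your own)
# --port : where the HTTP API listens
# -ngl   : number of layers to offload to the GPU, if you have one
llama-server \
  -m ~/models/my-model-Q4_K_M.gguf \
  --port 8080 \
  -ngl 99
```

Once it's up, anything that speaks the OpenAI chat API can point at `http://localhost:8080`.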
Not sure about AI Lab (I also use Podman, though I prefer llama-swap), but pretty much everything uses llama.cpp under the hood, and it usually takes a day or three to set up support for a new architecture. Although I seem to recall them being ready for Qwen3.5 on day one due to collaboration.
I find it's worth giving a new model a week or so for the dust to settle (even if it 'works' on day one, the best parameters, quantization bugs, etc. take a while to shake out), unless there's a huge motivation.
Also, benches are more like guidelines than actual rules; best to run your own on your own use cases.
Had a quick squiz at it, and if it meets your needs I'd just wait, unless you want to get into the lower levels of things (and run Linux). llama-swap is just an inference server, so you'll also need something like Open WebUI on top for chat.
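If you do go the llama-swap route, its config is a YAML file mapping model names to the llama-server command that serves them; this is a rough sketch based on the format in its README, with paths and names that are purely illustrative:

```yaml
# llama-swap config sketch (paths and model names are made up).
# llama-swap substitutes ${PORT} and starts/stops the backing
# llama-server process on demand when a model is requested.
models:
  "my-model":
    cmd: >
      llama-server
      --model /models/my-model-Q4_K_M.gguf
      --port ${PORT}
```

Check the llama-swap README for the exact keys it supports, since the format may differ from this sketch.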
I have no experience with AI Lab or the formats it supports, but you can download raw model files from Hugging Face, or use a tool like Ollama that can pull them (either from its own model repos or from Hugging Face directly; there's an Ollama copy-paste command on Hugging Face model pages). There are lots of formats, but GGUF is usually the easiest and is a single file.
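Grabbing a single GGUF file from Hugging Face can be done with the official CLI; the repo and filename below are just examples, so substitute the model and quant you actually want:

```shell
# Install the Hugging Face CLI, then download one GGUF file
# from a model repo into a local directory.
pip install -U huggingface_hub
huggingface-cli download some-user/some-model-GGUF \
  some-model-Q4_K_M.gguf \
  --local-dir ~/models
```

The downloaded `.gguf` file can then be pointed at directly by llama-server or whatever frontend you end up using.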
Okay, thanks for the help! I’ll give llama swap a shot.