All the runtimes except the Intel ones are llama.cpp Q4_K_M quants, so the Ampere ones aren’t anything special.
…The Intel ones kinda are though. They actually have runtimes for CPU/GPU and NPU, and AFAIK the CPU ones may be able to use AMX if you are on a server CPU.
It’s still not great for a lot of reasons, but one could do worse.