It depends on what you mean.
To me, Ollama feels like it's designed as a developer-first, local LLM server with just enough functionality to get you to a POC, after which you're expected to move onto someone else's compute resources.
llama.cpp, by contrast, supports more backends and a wider range of models, and it sees continuous performance improvements.