run a local LLM like Claude!
“Run ollama”
Ollama will almost always be slower than running vLLM or llama.cpp; nobody should be suggesting it for anything agentic. On most consumer hardware, the availability of llama.cpp’s `--cpu-moe` flag alone is absurdly good and worth the effort of familiarizing yourself with llama.cpp instead of Ollama.
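For reference, a minimal `llama-server` invocation using that flag might look like this (the model filename and layer count are placeholders for your own setup, not a recommendation):

```shell
# Keep MoE expert weights in system RAM while offloading everything else to the GPU.
# -ngl 99 offloads all offloadable layers; swap in your own GGUF model path.
llama-server -m ./my-moe-model-q4_k_m.gguf --cpu-moe -ngl 99 --port 8080
```

This trades some speed for fitting a much larger MoE model than VRAM alone would allow.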
Dear God, please don’t. FF does not want your AI slop bug reports. You people are ruining open source.
Especially from a 7B model
Pretty sure if you have to ask how to do it, you’re not qualified to do it.
LMAO
You’ll have to be more specific about how Anthropic is debugging Firefox; there are many possible setups. In general, though, you’ll need:
- an llm model file
- some OpenAI-compatible server, e.g. LM Studio, llama.cpp, Ollama.
- some sort of client for that server; there’s a myriad of options here. OpenCode is the most like Claude, but there are also more modular programmatic clients, which might suit a long-term task.
- the Firefox source code and/or an MCP server via some plugin.
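To make the “OpenAI-compatible server” part concrete, here’s a minimal sketch of what any client in that list is doing under the hood. The base URL, port, and model name are assumptions for illustration; it requires a server actually listening before `send()` will work:

```python
import json
import urllib.request

# llama.cpp's llama-server, LM Studio, and Ollama all expose
# POST /v1/chat/completions. Adjust the base URL for your server.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a chat completion payload for an OpenAI-compatible endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def send(payload: dict) -> dict:
    """POST the payload; only works with a server running at BASE_URL."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("local-model", "Explain this Firefox crash log.")
```

Every client option above is essentially a loop around this request, plus tool/file handling.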
You’ll also need to know which models your hardware can run. “Smarter” models require more RAM. Models can run on both CPUs and GPUs, but they run far faster on a GPU, provided they fit in VRAM.
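A rough back-of-the-envelope way to check whether a model fits (the function and the ~20% overhead figure are my own assumptions, ballpark only):

```python
def approx_model_ram_gb(params_billion: float, quant_bits: float,
                        overhead: float = 1.2) -> float:
    """Rule of thumb: weights take params * bits/8 bytes
    (1B params at 8-bit is roughly 1 GB), plus ~20% for the
    KV cache and runtime buffers."""
    weight_gb = params_billion * quant_bits / 8
    return weight_gb * overhead

# e.g. a 7B model at 4-bit quantization:
print(round(approx_model_ram_gb(7, 4), 1))  # prints 4.2
```

So a 7B model at 4-bit needs roughly 4-5 GB of VRAM to stay fully on the GPU; context length pushes this up further.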
Did you forget the body text, or is this some bug? It looks like a question here, but like an AI-fabricated tutorial in the original version of this cross-post.
Props for putting something together and not burying it in a 20-minute YouTube video.
My mind initially went to OpenCode. I’m not familiar with lite-cc; any reason you opted for it? Is it just kinder to smaller local models?