• Hexarei@beehaw.org
    link
    fedilink
    arrow-up
    2
    ·
    46 minutes ago

    run a local LLM like Claude!

    Look inside

    “Run ollama”

    Ollama will almost always be slower than running vllm or llama.cpp, nobody should be suggesting it for anything agentic. On most consumer hardware, the availability of llama.cpp’s --cpu-moe flag alone is absurdly good and worth the effort to familiarize yourself with llamacpp instead of ollama.

  • artyom@piefed.social
    link
    fedilink
    English
    arrow-up
    19
    ·
    edit-2
    6 hours ago

    Dear God, please don’t. FF does not want your AI slop bug reports. You people are ruining open source.

  • org@lemmy.org
    link
    fedilink
    arrow-up
    6
    ·
    5 hours ago

    Pretty sure if you have to ask how to do it, you’re not qualified to do it.

  • hendrik@palaver.p3x.de
    link
    fedilink
    English
    arrow-up
    3
    ·
    edit-2
    6 hours ago

    Did you forget the body text? Or is this some bug? Looks like a question here, and like an AI fabricated tutorial in the original version of this cross-post.

  • ZWQbpkzl [none/use name]@hexbear.net
    link
    fedilink
    English
    arrow-up
    1
    ·
    5 hours ago

    You’ll have to be more specific with how anthropic is debuging Firefox. There’s many sort of possible setups. In general though, you’ll need

    • an llm model file
    • some openai compatible server, eg lmstudio, llama.cpp, ollama.
    • some sort of client to that server there’s a myriad of options here. OpenCode is the most like Claude. But there’s also more modular programmatic clients, which might suit a long term task
    • the Firefox source code and/or an MCP server via some plugin.

    You’ll also need to know which models your hardware can run. “Smarter” models require more ram. Models can run on both CPUs and GPUs but they run way faster on the GPU, if they fit in the VRAM.

  • etchinghillside@reddthat.com
    link
    fedilink
    arrow-up
    1
    arrow-down
    3
    ·
    5 hours ago

    Props for putting someone together and not burying it in a 20 minute YouTube video.

    My mind initially went to OpenCode - I’m not familiarity lite-cc - any reason you opted for that? Is it just kinder on smaller local models?