Quick post about a change I made that’s worked out well.

I was using the OpenAI API for automations in n8n — email summaries, content drafts, that kind of thing. I was spending ~$40/month.

Switched everything to Ollama running locally. The migration was straightforward since n8n just hits an HTTP endpoint: I changed the URL from api.openai.com to localhost:11434 and updated the request format.
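For reference, Ollama also exposes an OpenAI-compatible chat-completions endpoint at `/v1/chat/completions`, which is why the swap can be this small. Here's a minimal sketch of what the redirected request looks like — the model name and prompt are placeholders, not my actual workflow:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint; local, no API key needed.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build the same chat-completions payload n8n would send to OpenAI,
    pointed at the local Ollama server instead of api.openai.com."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # get one complete response instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Actually sending it requires Ollama running locally:
# with urllib.request.urlopen(build_request("Summarize this email: ...")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the response shape matches OpenAI's (`choices[0].message.content`), downstream n8n nodes that parse the reply don't need to change either.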

For most tasks (summarization, classification, drafting) the local models are good enough. Complex reasoning is worse, but I don't need that for automation workflows.

Hardware: an i7 with 16GB RAM, running Llama 3 8B. Plenty fast for async tasks.
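As a rough sanity check on why an 8B model fits comfortably in 16GB: at 4-bit quantization (my assumption here — a common default like Q4, not something stated above), the weights alone come to about 4GB, leaving plenty of headroom for the KV cache and the rest of the system:

```python
# Back-of-envelope memory estimate for a quantized 8B model.
params = 8e9               # Llama 3 8B parameter count
bytes_per_param = 0.5      # assumption: 4-bit quantization (e.g. Q4)
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.1f} GB for weights alone")  # prints: ~4.0 GB for weights alone
```

Runtime overhead (KV cache, context buffers) adds a few more GB on top, so 16GB is workable but not roomy; an unquantized fp16 copy (2 bytes/param, ~16GB) would not fit.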

  • TheMightyCat@ani.social · 2 hours ago

    I'm running 2x4090; the 35B fits very comfortably in that.

    For large models like the 397B, there are several ways to go without spending a ton of money; I've seen posts of people using arrays of used 3090s with good results.

    The other option is CPU inference, although with current RAM prices that is less cost-effective.

    I was also looking at maybe an array of Milk-V JUPITER2 boards, since vLLM added RISC-V support, which could be very cost-effective.