A lemmy nomad. Wish there was a way to migrate posts and comments from .world to .ml to here… 😪

  • 0 Posts
  • 6 Comments
Joined 3 months ago
Cake day: March 14th, 2025


  • Sure, I run OpenWebUI in a docker container from my TrueNAS SCALE home server (it’s one of their standard packages, so basically a 1-click install). From there I’ve configured API use with OpenAI, Gemini, Anthropic and DeepSeek (part of my job involves evaluating the performance of these big models for various in-house tasks), along with pipelines for some of our specific workflows and MCP via mcpo.

    I previously had my Ollama installation in another Docker container but didn’t like having a big GPU in my NAS box, so I moved it to its own box. I am mostly interested in testing small/tiny models there. I again have Ollama running in a Docker container (just the official Docker image), but this time on a Debian bare-metal server, and I configured another OpenWebUI pipeline to point to that (OpenWebUI lets you select which LLM(s) you want to use on a conversation-by-conversation basis, so there’s no problem having a bunch of them hooked up at the same time).
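    In case it helps picture the “point it at the other box” part: anything that can reach Ollama’s HTTP API on that server can use its models, which is essentially all the OpenWebUI connection does. A minimal sketch, assuming the Debian box is reachable as ollama-box on Ollama’s default port and already has a small model pulled (both placeholders):

    ```python
    import requests

    # Placeholder host: the bare-metal Debian box running the official
    # Ollama Docker image, listening on Ollama's default port 11434.
    OLLAMA_URL = "http://ollama-box:11434"

    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "qwen2.5:3b",          # any small model you've pulled on that box
            "prompt": "Say hi in five words.",
            "stream": False,                # return one JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])
    ```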




  • will@lemm.ee to LocalLLaMA@sh.itjust.works · Specialize LLM · 1 point · 1 month ago

    Making your own embeddings is for RAG. Most base model providers have standardized on OpenAI’s embeddings scheme, but there are many ways to do it. Typically you embed a few tokens’ worth of data at a time and store that in your vector database. This lets your AI later do some vector math (usually a cosine similarity search) to see how similar (related) the embeddings are to each other and to what you asked about. There are fine-tuning schemes where you make embeddings before the tuning as well, but most people today use whatever fine-tuning service their base model provider offers, which usually comes with some layers of abstraction.
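    To make that vector math concrete, here is a tiny sketch in plain NumPy; the three-dimensional vectors are toy stand-ins for real embeddings (which usually have hundreds to thousands of dimensions) returned by whatever embedding model you use:

    ```python
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity: 1.0 = same direction, 0.0 = unrelated, -1.0 = opposite
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Pretend these came back from your embedding model for three stored chunks
    # and one query.
    chunk_vectors = {
        "chapter 1 summary": np.array([0.9, 0.1, 0.0]),
        "appendix tables":   np.array([0.1, 0.8, 0.2]),
        "author biography":  np.array([0.0, 0.2, 0.9]),
    }
    query_vector = np.array([0.8, 0.2, 0.1])

    # Rank stored chunks by similarity to the query; a vector database does
    # essentially this, just at scale and with smarter indexing.
    ranked = sorted(
        chunk_vectors.items(),
        key=lambda kv: cosine_similarity(query_vector, kv[1]),
        reverse=True,
    )
    for name, vec in ranked:
        print(f"{name}: {cosine_similarity(query_vector, vec):.3f}")
    ```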


  • will@lemm.ee to LocalLLaMA@sh.itjust.works · Specialize LLM · 7 points · 1 month ago

    The easiest option for a layperson is retrieval-augmented generation, or RAG. Basically, you encode your books, upload them into a special kind of database, and then tell a regular base model LLM to check that data when composing an answer. I know ChatGPT has a built-in UI for this (and maybe Anthropic does too), but you can also build something yourself using LangChain or OpenWebUI and the model of your choice; a rough sketch of the flow follows below.
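    Here is that flow stripped down to code, assuming the OpenAI Python SDK, an OPENAI_API_KEY in the environment, and a handful of in-memory chunks standing in for the vector database; the model names and excerpts are placeholder assumptions, not a recommendation:

    ```python
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts: list[str]) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    # "Encode your books": in practice you'd split the books into many chunks first.
    chunks = [
        "Chapter 3 describes the protagonist's move to the coast.",
        "The appendix lists every ship mentioned in the trilogy.",
        "Chapter 7 covers the storm and the loss of the first ship.",
    ]
    chunk_vecs = embed(chunks)

    question = "What happens in the storm?"
    q_vec = embed([question])[0]

    # Cosine-similarity retrieval: grab the most relevant chunk.
    sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
    best_chunk = chunks[int(np.argmax(sims))]

    # "Check the data when composing an answer": put the retrieved text in the prompt.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided excerpt."},
            {"role": "user", "content": f"Excerpt:\n{best_chunk}\n\nQuestion: {question}"},
        ],
    )
    print(answer.choices[0].message.content)
    ```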

    The next step up from there is fine-tuning, where you partially retrain a base model on your books. This is more complex and time-consuming, but it can give more nuanced answers. It’s often done in combination with RAG for particularly large bodies of information.
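    For a sense of what that retraining actually consumes, here is a minimal sketch of the training data you would prepare for an OpenAI-style fine-tuning service: a JSONL file of example conversations. The questions and answers below are made-up placeholders you would generate from your books.

    ```python
    import json

    # Each line of the JSONL file is one example conversation the model
    # should learn to reproduce.
    examples = [
        {
            "messages": [
                {"role": "system", "content": "You are an expert on the Example Trilogy."},
                {"role": "user", "content": "Who captains the first ship?"},
                {"role": "assistant", "content": "Captain Arden, introduced in Chapter 2."},
            ]
        },
        {
            "messages": [
                {"role": "system", "content": "You are an expert on the Example Trilogy."},
                {"role": "user", "content": "What causes the storm in Chapter 7?"},
                {"role": "assistant", "content": "A rivalry between the wind spirits."},
            ]
        },
    ]

    with open("training_data.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

    # This file then gets uploaded to the provider's fine-tuning service,
    # which handles the actual training run for you.
    ```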