I’ve been thinking about adding this to my “Fuck it, I’ll do it myself” / SHTF pile. I have a spare 10–15 GB for a good selection of basic articles (across sciences, history, pop culture trivia, etc.).

https://get.kiwix.org/en/solutions/hotspots/content-bundles/

https://get.kiwix.org/en/solutions/hotspots/imager-service/

There’s something inherently cool about having Wikipedia in a box (yes, you’d likely need to refresh it once a year), but I’ve never heard of anyone actually self-hosting a Kiwix instance.

  • Domi@lemmy.secnd.me
    2 hours ago

    Do you actually train the LLM or use RAG? I have been looking for a local LLM + Wikipedia RAG solution for a while now.

    For now I just have kiwix-serve + searxng doing a simple search but the Kiwix search is…questionable.

    • SuspciousCarrot78@lemmy.worldOP
      5 minutes ago

      Somewhere in my documents, I have a scoped ticket for using Kiwix as the source the LLM pulls information from directly, so it can populate its answer and respond to questions naturally instead of word-vomiting a complete wiki entry. I can dig that up for you; it’s actually why I’m looking at Kiwix (back-burner project for now).
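      The rough shape of that pipeline (kiwix-serve article → cleaned text → retrieved passages → LLM prompt) can be sketched like this. This is a minimal sketch, not anything from the ticket: it assumes kiwix-serve is running on its default port 8080, the article path and all helper names are my own, retrieval here is naive keyword overlap (a real setup would use embeddings), and the final prompt would go to whatever local LLM you run.

      ```python
      import re
      from urllib.request import urlopen  # only used when a kiwix-serve instance is reachable

      KIWIX_URL = "http://localhost:8080"  # assumption: default kiwix-serve port


      def html_to_text(html: str) -> str:
          """Crudely strip scripts/styles and tags so an article can go into a prompt."""
          html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)
          text = re.sub(r"<[^>]+>", " ", html)
          return re.sub(r"\s+", " ", text).strip()


      def chunk(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
          """Fixed-size character chunks with overlap, the usual naive RAG split."""
          chunks, step = [], size - overlap
          for start in range(0, len(text), step):
              chunks.append(text[start:start + size])
              if start + size >= len(text):
                  break
          return chunks


      def top_passages(question: str, chunks: list[str], k: int = 3) -> list[str]:
          """Rank chunks by keyword overlap with the question; embeddings would do better."""
          q_words = set(re.findall(r"\w+", question.lower()))

          def score(c: str) -> int:
              return len(q_words & set(re.findall(r"\w+", c.lower())))

          return sorted(chunks, key=score, reverse=True)[:k]


      def fetch_article(path: str) -> str:
          """Pull one article from kiwix-serve; the URL layout varies by Kiwix version."""
          with urlopen(f"{KIWIX_URL}/{path}") as resp:
              return html_to_text(resp.read().decode("utf-8", "replace"))


      def build_prompt(question: str, passages: list[str]) -> str:
          """Stuff the retrieved passages into a plain instruction prompt for the LLM."""
          context = "\n---\n".join(passages)
          return (
              "Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\n"
              f"Question: {question}"
          )
      ```

      The point of the keyword-overlap stand-in is that the whole loop stays dependency-free; swapping `top_passages` for an embedding search is the only change needed to make it a proper RAG setup.
      
      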

      PS: Are you aware of LLM-wiki? That might suit your purposes better if your corpus is bespoke and frequently updated. Works nicely.

      https://tinyurl.com/llmwiki