I’ve been thinking about adding this to my “Fuck it, I’ll do it myself” / SHTF pile. I have a spare 10-15GB for a good selection of basic articles (across sciences, history, pop culture trivia etc).
https://get.kiwix.org/en/solutions/hotspots/content-bundles/
https://get.kiwix.org/en/solutions/hotspots/imager-service/
There’s something inherently cool about having wikipedia in a box (yes, you’d likely need to refresh it once a year) but I’ve never heard of anyone actually self hosting a Kiwix instance.


Somewhere in my documents, I have a scoped ticket for how to use kiwix as the source for the LLM to pull information directly from, populate its answer organically, and naturally respond to question at hand, without word-vomiting a wiki entry complete. The last I looked, you can poll the kiwix DB directly without using the search engine.
I can dig that up for you if it still exists; it’s actually why I’m looking at kiwix (back burner project for now but the spirit moved me).
PS: You’re aware of LLM-wiki? That might suit your purposes better, if your corpus is bespoke and updating. Works nicely.
https://tinyurl.com/llmwiki