I want to host some LLMs locally and use more advanced models. Since new hardware is out of the question, I think I should be able to pull something off by buying some yesteryear equipment on eBay etc. Has anybody attempted such a project? Does it scale horizontally? (I.e., can I connect two boxes to overcome single-box slowness?)


For really lightweight models, Qwen3:8b is pretty good for an 8 GB graphics card.
Gemma 4:e4b is pretty good too. I usually run that on my 16 GB GPU.
Obviously the little ones aren’t as good as the big ones, but you can always rely on real intelligence to fill in the gaps.
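
If you want a quick sanity check on whether a model will fit on a used card before buying it, here is a rough back-of-the-envelope sketch. The numbers are assumptions, not measurements: about 4.5 bits per weight stands in for a typical 4-bit quantized build, and the flat 1.5 GB overhead for context cache and runtime buffers is a guess that grows with context length. The function names are made up for illustration.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    Weights take params * bits / 8 bytes; overhead_gb is an assumed
    flat allowance for the KV cache and runtime buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, vram_gb: float, **kwargs) -> bool:
    """True if the estimated footprint is within the card's VRAM."""
    return estimate_vram_gb(params_billion, **kwargs) <= vram_gb


if __name__ == "__main__":
    # An 8B model at ~4-bit quantization on an 8 GB card
    print(estimate_vram_gb(8), fits(8, 8))
    # The same model at 16-bit precision on the same card
    print(estimate_vram_gb(8, bits_per_weight=16), fits(8, 8, bits_per_weight=16))
```

Under these assumptions an 8B model only fits an 8 GB card when quantized, which lines up with the experience above. Treat the result as a first filter only; actual usage depends on the runtime and the context size you set.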