I want to host some LLMs locally and use more advanced models. Since new hardware is out of the question, I think I should be able to pull something off by buying some older equipment on eBay or similar. Has anybody attempted such a project? Does it scale horizontally? (I.e., can I connect two boxes to overcome the slowness of a single box?)


If you can restrict yourself to MoE-based LLMs, they generally cope better, performance-wise, with not fitting entirely in VRAM than non-MoE LLMs do, since some experts may never get loaded into VRAM at all.
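To put rough numbers on that, here's a back-of-the-envelope sketch. The parameter counts, quantization width, and VRAM size below are made-up placeholders, not figures for any real model or card, and `weight_gib` is just a helper I'm defining here. It only counts weights and ignores KV cache and runtime overhead.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


# Hypothetical MoE model and GPU, for illustration only.
TOTAL_B = 48.0    # all parameters, in billions (assumed)
ACTIVE_B = 12.0   # parameters actually used per token, in billions (assumed)
BITS = 4.0        # bits per weight after quantization (assumed)
VRAM_GIB = 16.0   # VRAM on the used card (assumed)

total = weight_gib(TOTAL_B, BITS)
active = weight_gib(ACTIVE_B, BITS)

print(f"all weights:       {total:.1f} GiB")
print(f"active per token:  {active:.1f} GiB")
print(f"fits in VRAM as a whole?     {total <= VRAM_GIB}")
print(f"active set fits in VRAM?     {active <= VRAM_GIB}")
```

With a dense model of the same total size, every weight is touched for every token, so whatever spills out of VRAM slows down every token. With an MoE, only the active slice is touched per token. Which experts are active changes from token to token, though, so treat this as a rough estimate rather than a guarantee.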