

Mistral (24B) models are really bad at long context, but that isn't true of every model in this size class: I find Qwen 32B and Gemma 27B are solid at 32K.
It looks like the Harbinger RPG model I'm using (from Latitude Games) is based on Mistral 24B, so maybe it inherits that limitation? I like it in other ways: it was trained on RPG games, which seems to help for my use case. I did try some general-purpose / vanilla models and felt they were not as good at D&D-type scenarios.
It looks like Latitude also has a 70B Wayfarer model; maybe it would do better at larger contexts. I have several networked machines with 40GB of VRAM between them, and I can just squeak an IQ4_XS quant of a 70B into that unholy assembly if I run a 24000-token context (that was before the SWA patch, so maybe more now). I will try it! The drawback is speed: 70B models are slow on my setup, about 8 t/s at the start of a session.
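Roughly what that kind of split looks like with llama.cpp's RPC backend, in case anyone wants to try the same thing. The host addresses, port, and model filename are placeholders for my setup, and this is a sketch rather than my exact launch commands, so double-check the flags against your build:

    # on each worker machine, start the RPC backend (default port is 50052)
    rpc-server --host 0.0.0.0 --port 50052

    # on the main machine, point the client at the workers so the 70B's layers
    # get spread across their combined VRAM
    llama-server -m Wayfarer-70B-IQ4_XS.gguf \
        --rpc 192.168.1.11:50052,192.168.1.12:50052 \
        -c 24000 -ngl 99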
I have to correct myself: it appears newer versions of rpc-server have a cache option, and you can point them at a locally stored copy of the model so the weights don't have to be shipped over the network on every load.
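If I'm reading it right, the cache side looks something like the line below. I haven't verified the exact flag name on every build, so treat this as a guess and confirm with rpc-server --help:

    # assumed flag: -c / --cache keeps tensors the server has already received
    # in a local cache, so repeat loads of the same model skip the network copy
    rpc-server --host 0.0.0.0 --port 50052 -c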