- Does it really “whip the llama’s ass”? - Yeah. Give me some skins and crazy visualizations that react to its inner workings. (Edit: Sorry, I’m late to the party.)
 
- It’s a good model, but it still requires 24 GB of VRAM. - I’m waiting until something like llama.cpp is made for this. - Not true. See (or rather, there’s nothing to see here, since “it just works”): https://github.com/ggerganov/llama.cpp/discussions/3368 and https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF and https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF - And here is someone describing how to do the quantization yourself: https://advanced-stack.com/resources/running-inference-using-mistral-ai-first-released-model-with-llama-cpp.html - Ooh, thanks. 🤗
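For a rough sense of why the quantized GGUF files get around the 24 GB VRAM worry, here is a back-of-the-envelope sketch. It assumes about 7 billion parameters (the figure quoted in the article) and counts the weights only. It ignores the KV cache, activations, runtime overhead, and the extra per-block scale data that real quantization formats store, so actual file sizes and memory use will be somewhat higher.

```python
# Back-of-the-envelope weight-memory estimate (weights only).
# Assumption: ~7 billion parameters, as quoted in the thread.
# Ignores KV cache, activations, runtime overhead and quantization scales.

def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the model weights in GiB."""
    total_bytes = n_params * bits_per_weight / 8  # bits -> bytes
    return total_bytes / 2**30                     # bytes -> GiB

N_PARAMS = 7e9  # assumed parameter count

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>6}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Going from 16 bits to roughly 4 bits per weight cuts the weight footprint by about a factor of four, which is why a quantized 7B model can run on hardware well short of 24 GB.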
 
- AFAIK Mistral already works in llama.cpp, or am I misunderstanding something? I haven’t tried it yet.
 
- That’s a great article on a good website: no paywall and no advertising. - What it says about this model is that it outperforms other comparable large language models, and that this is thanks to a strong team of researchers (formerly at Google and Meta) working on it.
 They say it is comparatively small, at 7 billion parameters. Open source, free to download, free to use, and free to tweak yourself.





