Tencent recently released a new MoE model with ~80B total parameters, ~13B of which are active per token at inference. Seems very promising for people with access to 64 GB of VRAM.
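To see why 64 GB is the ballpark, here's a back-of-envelope memory estimate. The ~4.5 bits-per-weight figure is an assumption (roughly a Q4-class GGUF quant), not anything from the release:

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB at a given quantization level."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# ~80B total parameters, ~13B active per token; 4.5 bpw is an assumed quant.
total = weight_gib(80, 4.5)    # all expert weights must be resident somewhere
active = weight_gib(13, 4.5)   # weights actually read per token

print(f"total weights ~ {total:.0f} GiB, active per token ~ {active:.0f} GiB")
```

So the full quantized model is roughly 42 GiB of weights, which fits in 64 GB with room for KV cache, while the per-token compute only touches ~7 GiB of those weights. That asymmetry is what makes MoE offloading strategies attractive.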
Been trying to play with this in ik_llama.cpp, and it's a temperamental model. It feels deep-fried: like it wants to be smart, if only it would stop looping or mangling its own think template.
It works great in 24GB VRAM though.
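One way a ~80B MoE can run in 24 GB is to keep the routed-expert FFN tensors in system RAM and put only attention and shared weights on the GPU. A sketch using the `--override-tensor` (`-ot`) tensor-placement flag from ik_llama.cpp; the model path, quant, tensor-name regex, and context size here are placeholders, not a verified recipe:

```shell
# Sketch: offload all layers to GPU, then override the routed-expert
# FFN tensors (the bulk of an MoE's weights) back onto CPU/system RAM.
./llama-server -m ./model-Q4_K_M.gguf \
  -ngl 99 \
  -ot "\.ffn_.*_exps\.=CPU" \
  -c 16384
```

Since only ~13B parameters are active per token, the CPU-side expert reads stay manageable and the GPU handles the dense path, which is roughly how a 24 GB card copes with a model this size.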