I'm excited for dots.llm (142BA14B)!

pebbles@sh.itjust.works · 1 month ago

I'm excited for dots.llm (142BA14B)!

pebbles@sh.itjust.works · 1 month ago

Yes with llamacpp its easy to put just the experts on the CPU. Since only some of the experts are used every time, the GB moved to RAM slows things down way less than moving parts of the model that are used every time. And now parts that are used every time get to stay on the GPU. I was able to get llama4 scout running at around 15 T/s on 96GB RAM and 24GB VRAM with a large context. The whole GGUF was about 80GB.

Also they actually are a Chinese company. I am pretty sure it is the company that makes RedNote (Chinese tiktok) and thats why they had access to so much non-synthetic data. I tried the demo on huggingface and never got any Chinese characters.

I also really enjoyed it’s prose. I think this will be a winner for creative writing.

I'm excited for dots.llm (142BA14B)!

I'm excited for dots.llm (142BA14B)!

GitHub - rednote-hilab/dots.llm1