GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a “thinking mode” for advanced reasoning and tool use, and a “non-thinking mode” for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. Learn more in our docs.
Blog post: https://z.ai/blog/glm-4.5
Hugging Face:
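For illustration, a minimal sketch of flipping that boolean through an OpenAI-compatible endpoint. The base URL, model id, and passing the flag via `extra_body` are assumptions here; check the provider's docs for the exact field name and shape:

```python
# Hedged sketch: toggling GLM-4.5-Air's thinking mode over an
# OpenAI-compatible API. The base_url, model id, and the exact shape of
# the reasoning field are assumptions -- consult the provider's docs.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

# Thinking mode on: the model reasons before answering (slower, stronger).
resp = client.chat.completions.create(
    model="glm-4.5-air",
    messages=[{"role": "user", "content": "Plan a 3-step web-scraping agent."}],
    extra_body={"reasoning": {"enabled": True}},  # assumed field name
)
print(resp.choices[0].message.content)
```

Setting `"enabled": False` would request the non-thinking mode for lower-latency, real-time use.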
I’m just gonna try vLLM; it seems like ik_llama.cpp doesn’t have a quick Docker method.
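A rough sketch of what the vLLM route might look like with its offline generation API. The Hugging Face id (zai-org/GLM-4.5-Air) and the GPU count are assumptions to adapt to your hardware; a 100B-class MoE needs several GPUs or a heavy quant:

```python
# Hedged sketch: running GLM-4.5-Air with vLLM's offline API.
# Assumptions: the HF repo id and 8-way tensor parallelism -- adjust both.
from vllm import LLM, SamplingParams

llm = LLM(model="zai-org/GLM-4.5-Air", tensor_parallel_size=8)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain MoE routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```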
It should work in any generic CUDA container, but yeah, it’s more of a hobbyist engine. Honestly I just run it raw since it’s dependency-free, except for system CUDA.
vLLM absolutely cannot CPU offload AFAIK, but small models will fit in your VRAM with room to spare.
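A quick back-of-the-envelope for that "fits in VRAM" judgment. The 106B total-parameter figure is GLM-4.5-Air's published size; the rest is arithmetic on illustrative quantization widths:

```python
# Weights-only VRAM estimate at a given quantization width. KV cache,
# activations, and framework overhead come on top, so leave headroom.
def weight_footprint_gib(params_billion: float, bits_per_weight: float) -> float:
    """GiB needed just to hold the weights."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

print(f"{weight_footprint_gib(106, 16):.0f} GiB")  # GLM-4.5-Air @ bf16: ~197 GiB
print(f"{weight_footprint_gib(106, 4):.0f} GiB")   # @ 4-bit quant:     ~49 GiB
print(f"{weight_footprint_gib(7, 4):.1f} GiB")     # a 7B model @ 4-bit: ~3.3 GiB
```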