• brucethemoose@lemmy.world
    17 hours ago

    All the runtimes except the Intel ones are llama.cpp Q4_K_M quants, so the Ampere ones aren’t anything special.

    …The Intel ones kinda are, though. They actually have runtimes for CPU/GPU and NPU, and AFAIK the CPU ones may be able to use AMX if you are on a server CPU.

    It’s still not great for a lot of reasons, but one could do worse.