• brucethemoose@lemmy.world · edited · 14 hours ago

    This is why this is a bad idea.

    If you look at the actual source, it is not DeepSeek R1:

    …It is the Qwen 2.5 7B DeepSeek R1 distill.

    This is a terrible model. It is:

    • Old and obsolete.
    • Not the “real” DeepSeek MoE, just misleadingly labeled as if it were.
    • Dumb even when it was released.
    • Not optimized for runtime CPU repacking.
    • A “flat” Q4_K_M with no extra weight given to sensitive layers, because it’s an ancient bartowski quantization.
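    If you want to check what a GGUF actually contains rather than trusting the front end’s label, the file format itself tells you. Here is a minimal sketch (assuming the GGUF v2/v3 layout: a “GGUF” magic, a uint32 version, then uint64 tensor and metadata-KV counts) that reads just the fixed header; the actual quantization type lives deeper, in the general.file_type metadata key and the per-tensor type fields:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))        # uint32, little-endian
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))  # two uint64s
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}
```

    Parsing the key/value section after the header (to pull out general.file_type and the per-tensor quantization types) takes more code, but this is enough to sanity-check that a download is a real GGUF and which format version it uses.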

    This is extremely frustrating. Llama.cpp has about a bajillion different corporate entities developing “easy” front ends for it, all doing it wrong anyway, and most don’t even acknowledge the original project, much less contribute to it.

    What’s more, advertising it as “easy AI” is going to give users a bad impression when they run it and find it slow as molasses and dumb as dirt.

    The only redeeming quality I can find in this project is that they package the Intel OpenVINO runtimes for CPU/GPU/NPU much more intelligently than most (Intel must have helped them). That part is genuinely awesome, because OpenVINO is very hard to set up, but that’s about it.


    Hence the reason basically every ML enthusiast recommends “figure out base llama.cpp”: nobody downstream gets it right, except maybe kobold.cpp/croco.cpp. They just layer abstractions on top and screw stuff up so they can tell some boss, “Look! 1-click AI!”

  • PlanterTree@discuss.tchncs.de · 2 days ago

    “Intel and ARM Ampere systems.”

    Does this mean they optimized for CPU instead of GPU? I doubt they target Intel GPUs, tbh, so they must really have optimized for CPU… interesting!

    • brucethemoose@lemmy.world · edited · 14 hours ago

      All the runtimes except the Intel ones are llama.cpp Q4_K_Ms, so the Ampere ones aren’t anything special.

      …The Intel ones kinda are, though. They actually have runtimes for CPU, GPU, and NPU, and AFAIK the CPU ones may be able to use AMX if you are on a server CPU.

      It’s still not great for a lot of reasons, but one could do worse.

    • suoko@feddit.it (OP) · 15 hours ago

      Brew for Linux? I wonder how many people are using it; I’d rather use AppImages instead.