When I first got into local LLMs nearly three years ago, in mid-2023, the frontier closed models were of course impressively capable.

I then tried my hand at running 7B-size local models, primarily one called Zephyr-7B (what happened to these models?? Dolphin, anyone??), on my gaming PC with an 8GB AMD RX 580 GPU. Fair to say it was just a curiosity exercise (in terms of model performance).

Fast forward to this month, and I'm revisiting local LLMs. (Although I no longer have the gaming PC; cost-of-living crisis, anyone? 😫)

And the 31B-size models look very sufficient. Qwen has taken the helm in this arena. That class is still quite expensive to set up locally, although within grasp.

I’m rooting for the edge-computing models now: the ~2B-size models. Due to their low footprint, they are practical for many people to run on an SBC 24/7 at home.

But these edge models are the ‘curiosity category’ now.
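For a rough sense of that low footprint, weight-only memory is roughly parameter count times bits per weight. This is a back-of-envelope sketch (my own illustration, not from the thread); real runtimes add KV cache and overhead on top:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in decimal GB.

    Ignores KV cache, activations, and runtime overhead,
    which all add to the real footprint.
    """
    return n_params * bits_per_weight / 8 / 1e9

# A ~2B model quantized to 4 bits fits comfortably in SBC RAM:
print(round(weight_memory_gb(2e9, 4), 2))   # 1.0 (GB)

# The same model at fp16 needs four times as much:
print(round(weight_memory_gb(2e9, 16), 2))  # 4.0 (GB)
```

By the same arithmetic, a 31B model at 4-bit needs ~15.5 GB for weights alone, which is why that class stays "within grasp but expensive" for home hardware.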

  • NoiseColor @lemmy.world · 21 hours ago

    For what stuff do you want to use them? I don’t think they come remotely close to today’s commercial models. Maybe for a specific purpose?

    • ntn888@lemmy.ml (OP) · 20 hours ago

      hey, thanks for your response… yeah, that’s what I meant: the 2B models aren’t usable in their current state, but they’d be more practical for everyday use if they work out…

      I actually meant that the 31B models are useful for my purpose. I don’t do full-on agentic coding, just interactive chat/prompting. For example, I make good use of them for writing Linux shell scripts (as I don’t know how to myself). Currently I use qwen3.5-flash via the cloud. It’s as good as the frontier models back then, if not better…

      • SuspciousCarrot78@lemmy.world · 16 hours ago (edited)

        There are several 3B-or-less models that are surprisingly good. If you’re talking about a general chat model, you can get a lot of bang for your buck with Qwen3-1.7B. Granite-3B is also quite good (and obedient at tool calls, IIRC).

        My everyday driver is an abliterated version of Qwen3-4B 2507 Instruct called Qwen HIVEMIND. I find it excellent… but again… black magic and clever tricks.

        I’ve actually been scoping out the possibility of using ECA.dev, with something cheap and cloud-based (say, GPT-5.4 mini) as the “brains” and SERA-8B as the “hands”.

        GPT-5.4 mini is $0.75/M input tokens and $4.50/M output tokens… and if it marries up with SERA-8B… well… that could go a long way indeed.
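        To put those rates in perspective, here's a back-of-envelope cost sketch using the per-million-token prices quoted above. The workload numbers are invented for illustration; the idea is that the cloud "brains" only sees the planning traffic while the local "hands" model handles the bulk for free:

        ```python
        def cloud_cost_usd(input_tokens: int, output_tokens: int,
                           in_rate: float = 0.75, out_rate: float = 4.50) -> float:
            """Cost in USD given per-million-token rates ($0.75 in / $4.50 out)."""
            return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

        # Hypothetical month: 5M planning tokens in, 1M plan tokens out.
        print(round(cloud_cost_usd(5_000_000, 1_000_000), 2))  # 8.25
        ```

        So a fairly heavy month of "brains" traffic lands under ten dollars, which is what makes the brains/hands split attractive.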

        Small models can be made useful as part of a swarm architecture… but that’s not an apples-to-apples comparison.

      • NoiseColor @lemmy.world · 19 hours ago

        I wanted to use smaller models, but then do more work on the “thinking” process. I didn’t get far, because it gets so slow on normal hardware and too expensive on dedicated hardware. Time-consuming (I’m also not a programmer) but a fun project; in the end I just decided to satisfy the privacy angle with Proton’s AI, Lumo.

        • inari@piefed.zip · 18 hours ago

          Proton has AI? Damn, that’s gotta be bleeding their coffers

          • SuspciousCarrot78@lemmy.world · 16 hours ago (edited)

            Probably not; the models they use all tend to be quite lightweight and inexpensive, tbh.

            EDIT:
            https://proton.me/support/lumo-privacy


            Open-source language models

            Lumo is powered by open-source large language models (LLMs) which have been optimized by Proton to give you the best answer based on the model most capable of dealing with your request. The models we’re using currently are Nemo, OpenHands 32B, OLMO 2 32B, GPT-OSS 120B, Qwen, Ernie 4.5 VL 28B, Apertus, and Kimi K2. These run exclusively on servers Proton controls so your data is never stored on a third-party platform.

            Lumo’s code is open source, meaning anyone can see it’s secure and does what it claims to. We’re constantly improving Lumo with the latest models that give the best user experience.


            Quite a lightweight swarm for a cloud service, barring Kimi K2.

            • NoiseColor @lemmy.world · 15 hours ago

              They have been working on this. Only 3 months ago it was pretty terrible. Today it’s almost on par with ChatGPT. A bit worse at RAG, slower… but good enough for normal use.

              • SuspciousCarrot78@lemmy.world · 10 hours ago (edited)

                I was playing around with it a bit earlier today (I use ProtonMail, so I figured why not).

                I can’t tell much about it yet. It comes across as very… safety theater, personality removed.

                Any idea what models they use now? I get the feeling the main brain is a 14B (based on how it responds to questions / drops nuance).