When I first got into local LLMs nearly 3 years ago, in mid 2023, the frontier closed models were ofcourse impressively capable.
I then tried my hand on running 7b size local models, primarily one called Zephyr-7b (what happened to these models?? Dolphin anyone??), on my gaming PC with 8GB AMD RX580 GPU. Fair to say it was just a curiosity exercise (in terms of model performance).
Fast forward to this month, I revisit local LLM. (Although I no longer have the gaming PC, cost-of-living-crisis anyone 😫 )
And, the 31b size models look very sufficient. #Qwen has taken the helm in this order. Which is still very expensive to setup locally, although within grasp.
I’m rooting for the edge-computing models now - the ~2b size models. Due to their low footprint, they are practical to run in a SBC 24/7 at home for many people.
But these edge models are the ‘curiosity category’ now.


Probably not; the models they use all tend to be quite lightweight and inexpensive, tbh.
EDIT:
https://proton.me/support/lumo-privacy
Open-source language models
Lumo is powered by open-source large language models (LLMs) which have been optimized by Proton to give you the best answer based on the model most capable of dealing with your request. The models we’re using currently are Nemo, OpenHands 32B, OLMO 2 32B, GPT-OSS 120B, Qwen, Ernie 4.5 VL 28B, Apertus, and Kimi K2. These run exclusively on servers Proton controls so your data is never stored on a third-party platform.
Lumo’s code is open source, meaning anyone can see it’s secure and does what it claims to. We’re constantly improving Lumo with the latest models that give the best user experience.
Quite lightweight swarm for cloud service, barring Kimi K2.
They have been working on this. Only 3 months ago it was pretty terrible. Today it’s almost on par with chatgpt. A bit worse on rag, slower,… good enough for normal use.
I was playing around with a tiny amount earlier today (I use ProtonMail, so I figured why not).
I can’t tell much about it. It seems very…safety theater / personality removed.
Any idea of what models they use now? I get a feeling that the main brain is 14B (based on how it responds to questions / drops nuance).