Last year, when Framework announced the Framework Desktop, I immediately ordered one. I’d been wanting a new gaming PC, but I’d also been kicking around the idea of running a local LLM. When it finally arrived it worked great for gaming… but from an LLM standpoint, very little would run on the AMD hardware. Over the next few months more tools became available, but progress was slow. I spent many long nights working and working, only to end up right back where I started.
So I got a Claude Code subscription and used it to help me build out my LLM setup. I made a lot of progress, but now I was comparing my local LLM to Claude, and there was no comparison.
Then I started messing with OpenClaw. First with Claude (expensive, fast), then with my local llama.cpp (cheap, frustrating). I didn’t know enough about OpenClaw yet, so I used Claude to help me build a custom app around llama.cpp instead. That was fun and I learned a lot, but I was spending most of my time chasing bugs instead of actually optimizing anything.
Around that time I heard about Qwen3-Coder-Next, dropped it into llama.cpp, and wow that was a huge step forward. Better direction-following, better tool calls, just better. I felt like my homegrown app was now holding the model back, so I converted over to OpenClaw. Some growing pains, but once things settled I was impressed again.
We built a lot of tooling along the way: a vector database memory system that cleans itself up each night, a filesystem-based context system, speech-to-text and text-to-speech, and a vision model. At this point my local LLM could see me, hear me, speak to me, and remember things about me, and all of it was built to be LLM-agnostic so Claude and my local system could share the same tools.
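To make the nightly cleanup concrete, here’s a minimal sketch of what that kind of pruning pass might look like. This is a hypothetical illustration, not my actual implementation: it assumes memories are stored as entries with an embedding vector and a timestamp, drops stale ones, and collapses near-duplicates by cosine similarity. A real setup would run this against an actual vector database.

```python
import math
import time

def cosine(a, b):
    # Plain cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def nightly_cleanup(entries, max_age_days=90, dedup_threshold=0.97, now=None):
    """Drop memories older than max_age_days, then keep only the first of
    any group of near-duplicate embeddings. Entry shape is assumed:
    {"vec": [...], "ts": unix_time, "text": "..."}."""
    now = now if now is not None else time.time()
    cutoff = now - max_age_days * 86400
    fresh = [e for e in entries if e["ts"] >= cutoff]
    kept = []
    for e in fresh:
        if all(cosine(e["vec"], k["vec"]) < dedup_threshold for k in kept):
            kept.append(e)
    return kept
```

The thresholds (90 days, 0.97 similarity) are made-up defaults; in practice you’d tune both against how chatty your memory writes are.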
I was still leaning on Claude heavily for coding, because honestly it’s amazing at it. I decided to give Qwen a small test project: build a web-based kanban board, desktop- and mobile-friendly. It built it… but it sucked. Drag between columns? Broken. Fixed that, now you can’t add items. Fixed that, dragging broke on mobile. I kept asking Claude to help troubleshoot, and it kept wanting to just rewrite the app. Finally I gave in and said “just fix it,” and Claude rewrote the whole thing, and it was great. I was disheartened. On top of that, Qwen kept getting into loops, sometimes running for hours without doing anything productive.
So about a week and a half ago I decided to rethink what I even wanted my local LLM to do. Coding was obviously out. I decided to start fresh and use it to help me journal. A few times a day it reaches out, asks what I’m doing, and if it’s relevant, adds an entry to my journal.
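The check-in loop itself is simple. Here’s a hedged sketch of the scheduling and entry-writing logic, with the model call left out entirely; the schedule, function names, and relevance flag are all assumptions for illustration, since the real version lives inside my agent setup.

```python
import datetime

# Assumed schedule: the agent reaches out a few times a day at these hours.
CHECK_IN_HOURS = (9, 13, 18)

def should_check_in(now, last_check):
    """Fire during a scheduled hour, at most once per hour per day."""
    if now.hour not in CHECK_IN_HOURS:
        return False
    return (
        last_check is None
        or last_check.date() != now.date()
        or last_check.hour != now.hour
    )

def journal_entry(reply, is_relevant, now):
    """Format a dated journal line, or skip if the model judged the
    reply not worth keeping. is_relevant would come from the model."""
    if not is_relevant:
        return None
    return f"[{now:%Y-%m-%d %H:%M}] {reply}"
```

In practice the returned line would be appended to a dated journal file, and the relevance decision is the part the model actually handles.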
I went through a couple more model swaps trying to get it stable; Qwen3.5 was better than Coder-Next for this use case, but I was still hitting loop issues. It was consistently prompting me and doing a decent job with the journal, which was at least a step in the right direction.
Then Qwen3.6 dropped. I loaded the Q6 quant the same day it was released, and I could immediately tell it was faster and that the output quality was much higher. Then I realized earlier today that since switching to Qwen3.6, I haven’t had to ask Claude to check in on Qwen even once. The looping is gone. It’s actually following the anti-loop protocols I’ve been trying to get models to follow for months.
I haven’t tried coding with it yet (I don’t have high hopes there) but I’ve given it the ability to create and modify its own skills and it’s been doing that beautifully. Scheduled tasks, multiple agents (voice assistant, primary, Home Assistant), all running smoothly.
My reliance on Claude has dropped off sharply since moving to Qwen3.6, and my system resource usage has gone down significantly too. If you’ve tried to get a local LLM setup running and gave up out of frustration… now might be a good time to jump back in, especially if you know your hardware should be able to handle it.


Sounds cool. The most recent Qwen I’ve used is 3. It writes well, but thinking takes like 20 minutes. 😭
I have mine set up to only use thinking for complex tasks. That can take significantly longer (2-8 minutes with thinking versus under 30 seconds without), but when I look at its thought process, the logical steps it goes through to solve an issue are impressive.
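For anyone curious how to do that routing: Qwen3-family models accept /think and /no_think soft switches in the user turn, so you can gate thinking per request. Here’s a rough sketch; the complexity heuristic is entirely made up for illustration, and the real decision could be anything from keyword checks to a classifier.

```python
# Hypothetical keywords that suggest a request is worth spending
# thinking time on; tune these for your own workload.
COMPLEX_HINTS = ("debug", "prove", "plan", "refactor", "why")

def looks_complex(prompt, min_words=40):
    """Crude placeholder heuristic: long prompts or prompts containing
    a hint word get routed to thinking mode."""
    text = prompt.lower()
    return len(text.split()) >= min_words or any(h in text for h in COMPLEX_HINTS)

def tag_prompt(prompt):
    """Append the Qwen3-style soft switch so only complex tasks
    pay the 2-8 minute thinking cost."""
    switch = "/think" if looks_complex(prompt) else "/no_think"
    return f"{prompt} {switch}"
```

A quick question like "What time is it?" gets /no_think appended, while something like a debugging request gets /think.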