Which open models are actually good at agentic coding?

hok@lemmy.dbzer0.com · 22 hours ago


SuspciousCarrot78@lemmy.world · edit-2 7 hours ago

If your calibration is Codex and Claude, then the answer is basically ‘none’. We’re not there yet. Qwen 3.6 27B is meant to be amazing for coding, but I can’t vouch for it beyond what I’ve seen in videos and read from others.

Outside of that, if you have the compute, you can run GLM5.1, which IS pretty good for this sort of thing. Try either or both via OpenRouter and test them yourself.

I think some of the issues surrounding small LLMs can be routed around with approaches like strict gates, checkpoints, and editing one thing at a time. You could even use a cloud model as the planner and a local model as the doer.
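A minimal sketch of that gated, one-edit-at-a-time loop, assuming a planner that emits small steps and a doer that proposes changes to an in-memory map of files. `run_pipeline`, `toy_planner`, `toy_doer`, and `one_change_gate` are hypothetical names; the toy functions only stand in for the cloud planner and the local model, and a real gate would presumably also run tests or a linter.

```python
from dataclasses import dataclass, field
from typing import Callable

Files = dict[str, str]  # hypothetical in-memory repo: path -> contents


@dataclass
class Step:
    description: str


@dataclass
class RunLog:
    accepted: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)


def run_pipeline(
    task: str,
    planner: Callable[[str], list[Step]],
    doer: Callable[[Step, Files], Files],
    gate: Callable[[Files, Files], bool],
    state: Files,
    max_retries: int = 2,
) -> tuple[Files, RunLog]:
    """Apply planned steps one at a time, keeping only changes that pass the gate."""
    log = RunLog()
    for step in planner(task):
        checkpoint = dict(state)  # last known-good state before this step
        for _ in range(max_retries + 1):
            candidate = doer(step, dict(checkpoint))  # doer works on a copy
            if gate(checkpoint, candidate):
                state = candidate
                log.accepted.append(step.description)
                break
        else:
            # Every attempt failed the gate: discard them and stay at the checkpoint.
            state = checkpoint
            log.rejected.append(step.description)
    return state, log


# --- Hypothetical stand-ins for the planner model, doer model, and checks ---

def toy_planner(task: str) -> list[Step]:
    # Stand-in for a cloud planner: one step per ';'-separated instruction.
    return [Step(s.strip()) for s in task.split(";") if s.strip()]


def toy_doer(step: Step, files: Files) -> Files:
    # Stand-in for a local coder model.
    files = dict(files)
    if step.description.startswith("rewrite all"):
        for path in files:  # deliberately touches every file
            files[path] += "# changed\n"
    else:
        files[step.description.split()[-1]] = f"# {step.description}\n"
    return files


def one_change_gate(before: Files, after: Files) -> bool:
    # "Edit one thing at a time": accept only candidates that change exactly one file.
    changed = {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
    return len(changed) == 1


if __name__ == "__main__":
    final, log = run_pipeline(
        "create a.py; create b.py; rewrite all files",
        toy_planner, toy_doer, one_change_gate, {},
    )
    print(sorted(final))  # ['a.py', 'b.py']
    print(log.rejected)   # ['rewrite all files']
```

The point of the design is that the small model never gets to make a sprawling change: anything that touches more than one file is rejected, retried a bounded number of times, and then rolled back to the last checkpoint.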

I have a theory about how to address the issues with small models as coders…but that’s probably a different discussion.

TL;DR: Qwen 3.6 27B is the new hotness…but I’d like to cast a vote for something like https://huggingface.co/allenai/SERA-8B-GA or https://huggingface.co/microsoft/FrogMini-14B-2510 as focused agents, coordinated by something else.

hok@lemmy.dbzer0.com · 7 hours ago

Thank you for your opinion & recommendations. Something I saw today related to “sub-agents” is in Kimi 2.6’s model card, which says:

Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.

So maybe Kimi 2.6 is doing the “type of thing” I am looking for, but I don’t have the means to run it practically. Maybe at 1 token per second, which would be brutal.
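For illustration only, here is a generic fan-out/fan-in sketch of the “decompose into parallel subtasks” pattern the model card describes, assuming async sub-agent calls. It is not how Kimi K2.6 implements its swarm; `swarm`, `toy_decompose`, `toy_sub_agent`, and `toy_combine` are hypothetical placeholders, and the semaphore is just one simple way to cap how many sub-agents run at once.

```python
import asyncio
from typing import Awaitable, Callable


async def swarm(
    task: str,
    decompose: Callable[[str], list[str]],
    sub_agent: Callable[[str], Awaitable[str]],
    combine: Callable[[list[str]], str],
    max_parallel: int = 8,
) -> str:
    """Fan a task out to sub-agents in parallel, then fan the results back in."""
    limit = asyncio.Semaphore(max_parallel)

    async def run_one(subtask: str) -> str:
        async with limit:  # at most max_parallel sub-agents in flight at once
            return await sub_agent(subtask)

    # gather returns results in the same order as the subtasks were submitted
    results = await asyncio.gather(*(run_one(s) for s in decompose(task)))
    return combine(list(results))


# --- Hypothetical stand-ins for a real decomposer, sub-agent model, and merger ---

def toy_decompose(task: str) -> list[str]:
    return [f"{task}: section {i}" for i in range(4)]


async def toy_sub_agent(subtask: str) -> str:
    await asyncio.sleep(0.01)  # pretend to call a model
    return subtask.upper()


def toy_combine(parts: list[str]) -> str:
    return "\n".join(parts)


if __name__ == "__main__":
    print(asyncio.run(swarm("report", toy_decompose, toy_sub_agent, toy_combine, max_parallel=2)))
```

On local hardware, `max_parallel` is the knob that matters: it bounds how many sub-agent requests hit the model server at the same time, regardless of how many subtasks the decomposer produces.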

I tried out Qwen 3.6 27B, but not yet in an agentic setting, so I can’t really judge it yet. Maybe it’s just me, but the small model size seems limiting. I thought gpt-oss-120b was good.

SuspciousCarrot78@lemmy.world · edit-2 7 hours ago

I suspect you may need to create your own orchestration to achieve the effect you’re after. As I said, I have some ideas…but it’s an engineering proposal, not a drop-in replacement.

I’m actually building my own micro swarm (literally as I type this, while waiting for Codex to finish running smoke tests). I have a feeling that if you want “Claude at home”, you’re going to have to uplift something like Qwen 3.6 + swarm + harness.

I could pass the idea on to you, and you could get Claude to chew through it and see what you two can jury-rig?

hok@lemmy.dbzer0.com · 10 minutes ago

Sure, if you have a micro swarm architecture laid out, I would love to hear what it is.