NO. #caa9cb37
Topic: AGENTS & TOOLS
Source: r/LocalLLaMA
Published: 2026-05-16 07:19:25
Importance: ★ 5/10 (radar 50)

`Qwen3.6-35B-A3B` reaches **24.6%** on `Terminal-Bench 2.0`

A smaller open-model stack beat several larger agent setups on Terminal-Bench 2.0, a hard terminal-task benchmark. It is worth testing in local coding-agent loops, but so far the evidence comes only from benchmarks.

[ KEY POINTS ]
  1. little-coder x Qwen3.6-35B-A3B scored 24.6% ±3.2, above Gemini 2.5 Pro on Gemini CLI at 19.6%.
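  2. It also edged out Qwen3-Coder-480B on Terminus 2 (23.9%). The 0.7-point gap is inside the ±3.2 margin, but a much smaller model matching a 480B one suggests scaffold choice can matter as much as raw model scale.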
  3. Qwen3.5-9B reached 9.2%; sub-10B local models now have measurable, nonzero performance on hard agentic tasks.
  4. This is still a leaderboard signal, not production proof. Try it on repo-specific tasks before replacing API-backed agents.
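The comparisons in the key points come down to simple arithmetic, and the same check applies when you trial the model on your own repo tasks. The Python sketch below compares the quoted scores against little-coder's ±3.2 margin and computes a 95% Wilson score interval for a pass rate. It assumes ±3.2 can be read as a symmetric margin in percentage points (the card does not say how it was computed), and the 12-of-40 trial at the end is a hypothetical example, not data from the post.

```python
import math

# Scores (percent) as quoted on the card. Only little-coder's run reports a
# margin (±3.2); reading it as symmetric percentage points is an assumption.
REPORTED = {
    "little-coder x Qwen3.6-35B-A3B": 24.6,
    "Qwen3-Coder-480B on Terminus 2": 23.9,
    "Gemini 2.5 Pro on Gemini CLI": 19.6,
}
MARGIN = 3.2


def gap_vs_margin(a: float, b: float, margin: float = MARGIN) -> tuple[float, bool]:
    """Return the gap a - b (rounded to 0.1 pt) and whether it exceeds the margin."""
    gap = round(a - b, 1)
    return gap, abs(gap) > margin


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for passed/total; z = 1.96 gives roughly 95%."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    p = passed / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    lead = "little-coder x Qwen3.6-35B-A3B"
    for name, score in REPORTED.items():
        if name == lead:
            continue
        gap, beyond = gap_vs_margin(REPORTED[lead], score)
        print(f"vs {name}: {gap:+.1f} pts, beyond ±{MARGIN}: {beyond}")

    # Hypothetical repo-specific trial: 12 of 40 of your own tasks passed.
    passed, total = 12, 40
    lo, hi = wilson_interval(passed, total)
    print(f"pass rate {passed / total:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

A gap smaller than the margin is not evidence of a real difference, and with only a few dozen repo tasks the interval is wide, so run enough tasks before concluding that a local agent can replace an API-backed one.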
Original: www.reddit.com/r/LocalLLaMA/comments/1temio0/qwen3635ba3b_and_9b_are_officially_on_the_public/
