`Qwen3.6-35B-A3B` reaches 24.6% on `Terminal-Bench 2.0`

A smaller open-model stack beat several larger agent setups on a hard terminal benchmark. Worth testing for local coding-agent loops, but the evidence so far is benchmark-only.

[ KEY POINTS ]

- little-coder x Qwen3.6-35B-A3B scored 24.6% ±3.2, above Gemini 2.5 Pro on Gemini CLI at 19.6%.
- It also edged Qwen3-Coder-480B on Terminus 2 at 23.9%, a 0.7-point gap that sits inside the ±3.2 margin, suggesting scaffold choice can outweigh raw model scale.
- Qwen3.5-9B reached 9.2%; sub-10B local models now have measurable, nonzero performance on hard agentic tasks.
- This is still a leaderboard signal, not production proof. Try it on repo-specific tasks before replacing API-backed agents.
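
To put the reported gaps in context, here is a minimal sketch that compares each gap against the quoted ±3.2 margin. It uses only the scores from the key points above. The function name `gap_report` is made up for illustration, and treating ±3.2 as a one-sided threshold is a simplifying assumption: it ignores any uncertainty on the baseline scores, which the source does not report.

```python
# Scores as quoted in the key points (percent of tasks solved).
SCORE = 24.6            # little-coder x Qwen3.6-35B-A3B
REPORTED_MARGIN = 3.2   # the ± figure quoted alongside that score

BASELINES = {
    "Gemini 2.5 Pro on Gemini CLI": 19.6,
    "Qwen3-Coder-480B on Terminus 2": 23.9,
}


def gap_report(score, margin, others):
    """Return {name: (gap, exceeds_margin)} for each baseline score."""
    report = {}
    for name, other in others.items():
        gap = round(score - other, 1)  # round away float noise
        report[name] = (gap, gap > margin)
    return report


for name, (gap, clear) in gap_report(SCORE, REPORTED_MARGIN, BASELINES).items():
    verdict = "outside" if clear else "within"
    print(f"{name}: +{gap} points, {verdict} the reported margin")
```

Under this rough reading, the lead over Gemini CLI is larger than the margin, while the lead over Terminus 2 is not.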

Original: www.reddit.com/r/LocalLLaMA/comments/1temio0/qwen3635ba3b_and_9b_are_officially_on_the_public/

// related

#0001 Agents & tools | GitHub Changelog | 24 hours ago
GitHub adds REST API auditing for `Copilot` cloud agent repo config
60 radar
Repo-level agent settings can now be checked by API instead of manual UI review. Useful for keeping automation permissions visible before cloud agents touch production code.
- New endpoint: Get Copilot cloud agent configuration for a repository, currently in public preview.
- Best fit is policy drift checks across repos: scan whether agent access and configuration match your expected defaults.
- This is governance plumbing, not a coding-speed feature. Worth adopting if Copilot agents run on real repos.
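
The policy drift check described above could be sketched roughly as follows. This is a hedged illustration, not the GitHub API: the configuration keys (`agent_enabled`, `allow_network`), the repository names, and the helper `find_drift` are all hypothetical, and the fetch step is left out entirely. Each per-repo dictionary stands in for whatever the new endpoint actually returns.

```python
def find_drift(expected, actual):
    """Return {key: (expected_value, actual_value)} for every mismatch.

    A key missing from `actual` is read as None.
    """
    drift = {}
    for key, want in expected.items():
        got = actual.get(key)
        if got != want:
            drift[key] = (want, got)
    return drift


# Hypothetical organization defaults; the real field names may differ.
EXPECTED = {"agent_enabled": True, "allow_network": False}

# Hypothetical per-repo configurations, as if already fetched.
REPOS = {
    "org/api": {"agent_enabled": True, "allow_network": False},
    "org/web": {"agent_enabled": True, "allow_network": True},
}

for repo, config in REPOS.items():
    drift = find_drift(EXPECTED, config)
    if drift:
        print(f"{repo} drifted: {drift}")
```

Keeping the comparison a pure function means it can run in CI against saved snapshots, independent of how the configuration is retrieved.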
Source: github.blog/changelog/2026-05-18-audit-repository-copilo
#0002 Agents & tools | GitHub Changelog | yesterday
`Copilot Spaces API` is now generally available
70 radar
Copilot Spaces: GitHub Copilot feature that manages task-specific context spaces.
Spaces can now be managed from your own apps via API. Useful for wiring repo context into internal tools or repeatable agent workflows.
- The API supports create, read, update, and delete for Spaces, so context setup no longer has to stay inside GitHub UI.
- Good fit for templates: bootstrap a project space per repo, customer, or feature branch and keep agent context consistent.
- This is more automation surface than end-user feature. Value depends on whether Copilot Spaces is already part of the coding workflow.
Source: github.blog/changelog/2026-05-18-copilot-spaces-api-now-
#0003 Agents & tools | r/ClaudeAI | yesterday
11 Claude Habits That Compound Over Daily Use
50 radar
The useful part is not prompt tricks, but persistent context: Projects, CLAUDE.md, styles, skills, and subagents. Worth turning into a default setup before long coding sessions.
- Put codebase context, style guides, and prior PRs into Projects once. Re-pasting the same background is pure context tax.
- A custom style such as "skeptical senior engineer" changes review quality by forcing pushback instead of agreeable code comments.
- In Claude Code, CLAUDE.md carries more weight than session prompts. Around 80 lines of project context can remove repeated stack explanations.
- Use cheaper/faster models by task: Sonnet as default, Opus for architecture, Haiku for batch cleanup like tickets or emails.
- Subagents fit parallel chores: run tests, inspect files, or summarize docs while the main coding thread keeps moving.
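
The model-by-task habit above can be written down as a small routing table. This is only a sketch: the tier labels mirror the names in the list, but the task categories (`architecture`, `batch_cleanup`) and the helper `pick_tier` are hypothetical, and the labels are not real API model identifiers.

```python
# Default tier for everyday work, per the habit list above.
DEFAULT_TIER = "sonnet"

# Hypothetical task categories mapped to the tiers the list suggests.
TIER_BY_TASK = {
    "architecture": "opus",    # design and architecture decisions
    "batch_cleanup": "haiku",  # bulk chores such as tickets or emails
}


def pick_tier(task_kind):
    """Return the tier label for a task, falling back to the default."""
    return TIER_BY_TASK.get(task_kind, DEFAULT_TIER)


print(pick_tier("architecture"))
print(pick_tier("code_review"))
```

An explicit table makes the default visible and keeps exceptions in one place as new task types appear.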
Source: www.reddit.com/r/ClaudeAI/comments/1tgqnsl/11_claude_thi
