`Autoharness` lets `Claude Code` tune its own agent harness

A layer above agent setup is emerging: tools that rewrite prompts, scoring, and runtime context, then keep only eval-proven wins. Concrete benchmark lifts make this more than workflow bragging, and the repo is worth skimming if you run custom agents.

[ KEY POINTS ]

On tau2-airline, the post reports a +40.7% gain from best-of-N skillbook scoring with an LLM judge, a sign that evaluator design is becoming part of the product surface.
Another +24.1% came from reflector hyperparameter changes like temperature and max subagent calls, so simple harness knobs still have large headroom.
Injecting runtime context on every step added +22.2%: exposing the step budget, recent tool calls, and recent results to the agent materially changed outcomes.
The flow is minimal: install, point Claude Code at GUIDE.md, let it propose harness edits, run evals, and keep only the changes that improve the score.
Because gains are benchmark-specific, the useful takeaway is the loop: mutate harness, eval immediately, and ship only measured improvements.
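
The loop described above can be sketched as a greedy search: change one harness knob, re-run the eval, and keep the change only if the score rises. This is a minimal illustration, not Autoharness's actual code. `HarnessConfig`, `propose_mutation`, `tune`, and `toy_eval` are hypothetical names, the knobs are only modeled on the levers the post mentions, and `toy_eval` is a made-up stand-in for a real benchmark run such as tau2-airline.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class HarnessConfig:
    # Hypothetical knobs, named after the levers mentioned in the post.
    reflector_temperature: float = 0.7
    max_subagent_calls: int = 4
    inject_runtime_context: bool = False


def propose_mutation(cfg: HarnessConfig, rng: random.Random) -> HarnessConfig:
    """Return a candidate config that differs from cfg in at most one knob."""
    knob = rng.choice(["temperature", "subagents", "context"])
    if knob == "temperature":
        step = rng.choice([-0.2, 0.2])
        new_temp = min(1.0, max(0.0, cfg.reflector_temperature + step))
        return replace(cfg, reflector_temperature=round(new_temp, 2))
    if knob == "subagents":
        step = rng.choice([-1, 1])
        return replace(cfg, max_subagent_calls=max(1, cfg.max_subagent_calls + step))
    return replace(cfg, inject_runtime_context=not cfg.inject_runtime_context)


def tune(
    baseline: HarnessConfig,
    evaluate: Callable[[HarnessConfig], float],
    rounds: int,
    seed: int = 0,
) -> Tuple[HarnessConfig, float, List[float]]:
    """Greedy loop: mutate, evaluate immediately, keep only strict improvements."""
    rng = random.Random(seed)
    best, best_score = baseline, evaluate(baseline)
    history = [best_score]
    for _ in range(rounds):
        candidate = propose_mutation(best, rng)
        score = evaluate(candidate)
        if score > best_score:  # regressions and ties are discarded
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


def toy_eval(cfg: HarnessConfig) -> float:
    """Made-up deterministic scorer standing in for a real eval suite."""
    score = 0.5
    score -= abs(cfg.reflector_temperature - 0.3) * 0.2
    score += 0.02 * min(cfg.max_subagent_calls, 6)
    score += 0.1 if cfg.inject_runtime_context else 0.0
    return score


if __name__ == "__main__":
    best, best_score, history = tune(HarnessConfig(), toy_eval, rounds=30, seed=1)
    print(best, best_score)
```

In practice `evaluate` would be a full benchmark run, which is slow and noisy, so a real setup would likely average several runs or require a minimum margin before accepting a change. The acceptance rule is the part that carries over: nothing is kept unless its measured score beats the current best.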

Original: www.reddit.com/r/ClaudeAI/comments/1t8cn9y/claude_improved_my_agent_harness_by_407_overnight/

// related

#0001 | Agents & tools | GitHub Changelog | 23 hours ago
GitHub adds REST API auditing for `Copilot` cloud agent repo config
Radar: 60
Repo-level agent settings can now be checked by API instead of manual UI review. Useful for keeping automation permissions visible before cloud agents touch production code.
- New endpoint: Get Copilot cloud agent configuration for a repository, currently in public preview.
- Best fit is policy drift checks across repos: scan whether agent access and configuration match your expected defaults.
- This is governance plumbing, not a coding-speed feature. Worth adopting if Copilot agents run on real repos.
Source: github.blog/changelog/2026-05-18-audit-repository-copilo
#0002 | Agents & tools | GitHub Changelog | 24 hours ago
`Copilot Spaces API` is now generally available
Radar: 70
Copilot Spaces: a GitHub Copilot feature that manages task-specific context spaces.
Spaces can now be managed from your own apps via API. Useful for wiring repo context into internal tools or repeatable agent workflows.
- The API supports create, read, update, and delete for Spaces, so context setup no longer has to stay inside GitHub UI.
- Good fit for templates: bootstrap a project space per repo, customer, or feature branch and keep agent context consistent.
- This is more automation surface than end-user feature. Value depends on whether Copilot Spaces is already part of the coding workflow.
Source: github.blog/changelog/2026-05-18-copilot-spaces-api-now-
#0003 | Agents & tools | r/ClaudeAI | yesterday
11 Claude Habits That Compound Over Daily Use
Radar: 50
The useful part is not prompt tricks, but persistent context: Projects, CLAUDE.md, styles, skills, and subagents. Worth turning into a default setup before long coding sessions.
- Put codebase context, style guides, and prior PRs into Projects once. Re-pasting the same background is pure context tax.
- A custom style such as "skeptical senior engineer" changes review quality by forcing pushback instead of agreeable code comments.
- In Claude Code, CLAUDE.md carries more weight than session prompts. Around 80 lines of project context can remove repeated stack explanations.
- Match the model to the task: Sonnet as the default, Opus for architecture, Haiku for batch cleanup like tickets or emails.
- Subagents fit parallel chores: run tests, inspect files, or summarize docs while the main coding thread keeps moving.
Source: www.reddit.com/r/ClaudeAI/comments/1tgqnsl/11_claude_thi