`Forge` pushes local LLM tool-calling reliability with guardrail retries

Guardrails, not model size, drive most of the gain. Useful if you want always-on agents without frontier API spend.

[ KEY POINTS ]

- Ministral 8B reached 99.3% with Forge; Claude Sonnet with the same guardrail layer hit 100%.
- Without guardrails, Claude Sonnet scored 87.2%, so orchestration beat raw model strength in this eval.
- Removing retry nudges cost 24 to 49 points; error recovery added about 10 points across the tested models.
- Backend choice swung results sharply: the same Mistral-Nemo 12B weights scored 7% on llama-server vs 83% on Llamafile.
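
To make the idea of a retry nudge concrete, here is a minimal sketch of a guardrail loop in Python. It is not Forge's API: the `TOOLS` registry, `validate`, and `call_with_guardrails` are hypothetical names, and the model is assumed to be any callable that takes the message history and returns raw text.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical tool registry: tool name -> required argument names.
TOOLS = {"get_weather": {"city"}}


def validate(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (call, None) for a valid tool call, else (None, reason)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Output was not valid JSON ({exc.msg})."
    tool = call.get("tool") if isinstance(call, dict) else None
    if not isinstance(tool, str) or tool not in TOOLS:
        return None, f"Unknown tool. Choose one of {sorted(TOOLS)}."
    args = call.get("args")
    if not isinstance(args, dict):
        return None, "Field 'args' must be a JSON object."
    missing = TOOLS[tool] - set(args)
    if missing:
        return None, f"Missing arguments: {sorted(missing)}."
    return call, None


def call_with_guardrails(
    model: Callable[[List[str]], str], prompt: str, max_retries: int = 3
) -> dict:
    """Ask the model for a tool call, nudging it after each invalid reply."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        raw = model(messages)
        call, problem = validate(raw)
        if call is not None:
            return call
        # Retry nudge: feed the concrete failure reason back to the model.
        messages += [raw, f"Your last reply was rejected. {problem} Reply with JSON only."]
    raise RuntimeError("model never produced a valid tool call")
```

The design choice worth noting is that the nudge carries the specific validation failure rather than a generic "try again", which is one plausible reason a small model can recover on the next attempt.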

Original: github.com/antoinezambelli/forge

// related

#0001 · Agents & tools · GeekNews · 4 hours ago
GitHub confirms 3,800 repositories compromised via malicious `VS Code` extension
Radar: 60
A single developer workstation became the entry point. VS Code extension trust is now part of supply-chain security, so extension audits are worth doing now.
- About 3,800 internal repositories were affected after one employee installed a trojanized VS Code extension.
- GitHub’s current assessment limits exposure to internal repositories, but compromised developer endpoints can still leak secrets and code context.
- The extension was removed from VS Code Marketplace, infected endpoints were isolated, and incident response started immediately.
- Practical takeaway: review installed IDE extensions, publisher names, and permissions, and disable unused tools before they become a build-chain risk.
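
One way to start the extension review suggested above is to list what is installed and flag publishers you have not vetted. The sketch below assumes the `code` CLI is on your PATH and relies on `code --list-extensions --show-versions`, which prints one `publisher.name@version` entry per line. The trusted-publisher set and the function names are hypothetical, and a publisher allowlist is only a first filter, not a substitute for checking permissions.

```python
import subprocess
from typing import List, Set, Tuple


def parse_extensions(output: str) -> List[Tuple[str, str, str]]:
    """Parse `publisher.name@version` lines into (publisher, name, version)."""
    parsed = []
    for line in output.splitlines():
        ident, _, version = line.strip().partition("@")
        if "." not in ident:
            continue  # skip blank or unexpected lines
        publisher, _, name = ident.partition(".")
        parsed.append((publisher.lower(), name.lower(), version))
    return parsed


def unreviewed(output: str, trusted_publishers: Set[str]) -> List[str]:
    """Return extensions whose publisher is not in the trusted set."""
    return [
        f"{publisher}.{name}@{version}"
        for publisher, name, version in parse_extensions(output)
        if publisher not in trusted_publishers
    ]


def installed_extensions() -> str:
    """Ask the VS Code CLI for installed extensions (requires `code` on PATH)."""
    result = subprocess.run(
        ["code", "--list-extensions", "--show-versions"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Typical use would be `unreviewed(installed_extensions(), {"ms-python"})`, with the set replaced by whichever publishers your team has actually reviewed.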
Source: news.hada.io/topic?id=29731
#0002 · Agents & tools · Claude Code Releases · 5 hours ago
`Claude Code` `v2.1.146` tightens code review and background sessions
Radar: 50
Claude Code: terminal coding agent that automates code edits with Claude
Small release, but it removes several annoying agent-run failures. Windows, MCP pagination, and multi-agent env handling all get more reliable.
- `/simplify` is now `/code-review`, with optional effort levels such as `high`, making review intent clearer in repeatable workflows.
- MCP `resources/list`, `resources/templates/list`, and `prompts/list` no longer drop results after page 1. Tooling backed by large MCP servers becomes safer.
- Platform fixes cover pwsh launch failures, terminal strobing, NTFS junction cleanup, and paste behavior on GNOME (a Linux desktop). Cross-platform CLI friction drops.
- `CLAUDE_CODE_SUBAGENT_MODEL` now reaches child processes in multi-agent sessions. Model routing gets less brittle for delegated coding runs.
- Auto-updater retries transient network failures, and large diff rendering is faster. Not flashy, but daily-use reliability improved.
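
For context on the pagination fix, MCP list methods are cursor-paginated: a response may carry a `nextCursor`, which the client passes back as `cursor` until none is returned. The sketch below shows that loop under stated assumptions. `send_request` is a hypothetical transport callable, not part of Claude Code or any particular SDK, and the result keys follow the MCP specification as I understand it.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Result field that holds the items for each paginated MCP list method.
RESULT_KEYS = {
    "resources/list": "resources",
    "resources/templates/list": "resourceTemplates",
    "prompts/list": "prompts",
}


def list_all(
    send_request: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    method: str,
) -> Iterator[Any]:
    """Yield every item from a paginated list method, following nextCursor."""
    key = RESULT_KEYS[method]
    cursor: Optional[str] = None
    while True:
        # The first request carries no cursor; later ones echo nextCursor.
        params = {"cursor": cursor} if cursor is not None else {}
        result = send_request(method, params)
        yield from result.get(key, [])
        cursor = result.get("nextCursor")
        if cursor is None:
            break  # no nextCursor means this was the last page
```

A client that stops after the first response silently loses everything past page 1, which matches the symptom the release note describes.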
Source: github.com/anthropics/claude-code/releases/tag/v2.1.146
#0003 · Agents & tools · GeekNews · 6 hours ago
Google Cloud revamps agent development with `Antigravity 2.0`
Radar: 70
Antigravity: agent dev tool that links local prototypes to cloud execution
Google is packaging local prototyping and cloud deployment into one agent stack. If the Managed Agents API removes hosting glue, it is worth tracking now.
- Antigravity 2.0 and Managed Agents API are framed as an integrated dev kit, not separate demos.
- The flow targets local prototyping first, then managed cloud execution. Less custom orchestration if the API is usable.
- The available text is short, so pricing, lock-in, and runtime limits remain unknown. Treat it as a watchlist item, not a migration trigger.
Source: news.hada.io/topic?id=29718
