`Forge` Pushes Local 8B Agent Reliability Near Frontier APIs

Guardrails, not bigger weights, drive the jump. The useful takeaway is architectural: retries, recovery, and serving backend choice can matter more than model size.

[ KEY POINTS ]

Ministral 8B with Forge hit 99.3%, versus Claude Sonnet with guardrails at 100% across the reported eval setup.
Without retry nudges, scores dropped 24-49 points. Reliability work belongs in the agent runtime, not only in model selection.
Serving backend changed the same Mistral-Nemo 12B weights from 7% on llama-server native function calling to 83% on Llamafile prompt mode.
Error recovery scored 0% for every tested model without retry logic. Tool agents need explicit recovery paths before production use.

Originalgithub.com/antoinezambelli/forgeRead original →

// related

#0001
#0001Agents & tools GeekNews4 hours ago
GitHub confirms 3,800 repositories compromised via malicious `VS Code` extension
60radar
A single developer workstation became the entry point. VS Code extension trust is now part of supply-chain security, so extension audits are worth doing now.
- About 3,800 internal repositories were affected after one employee installed a trojanized VS Code extension.
- GitHub’s current assessment limits exposure to internal repositories, but compromised developer endpoints can still leak secrets and code context.
- The extension was removed from VS Code Marketplace, infected endpoints were isolated, and incident response started immediately.
- Practical takeaway: review installed IDE extensions, publisher names, permissions, and disable unused tools before they become build-chain risk.
Source: news.hada.io/topic?id=29731Read original →
FIG-0011:1
60radar
FIG-0011:1
#0002
#0002Agents & tools Claude Code Releases5 hours ago
`Claude Code` `v2.1.146` tightens code review and background sessions
50radar
Claude CodeTerminal coding agent — automates code edits with Claude
Small release, but it removes several annoying agent-run failures. Windows, MCP pagination, and multi-agent env handling all get more reliable.
- /simplify is now /code-review with optional effort levels like high, making review intent clearer in repeatable workflows.
- MCP resources/list, resources/templates/list, and prompts/list no longer drop results after page 1. Tooling backed by large MCP servers becomes safer.
- Windows fixes cover pwsh launch failures, terminal strobing, NTFS junction cleanup, and GNOME paste behavior. Cross-platform CLI friction drops.
- CLAUDE_CODE_SUBAGENT_MODEL now reaches child processes in multi-agent sessions. Model routing gets less brittle for delegated coding runs.
- Auto-updater retries transient network failures, and large diff rendering is faster. Not flashy, but daily-use reliability improved.
Source: github.com/anthropics/claude-code/releases/tag/v2.1.146Read original →
FIG-0021:1
50radar
FIG-0021:1
#0003
#0003Agents & tools GeekNews6 hours ago
Google Cloud revamps agent development with `Antigravity 2.0`
70radar
AntigravityAgent dev tool — links local prototypes to cloud execution
Google is packaging local prototyping and cloud deployment into one agent stack. If Managed Agents API removes hosting glue, it is worth tracking now.
- Antigravity 2.0 and Managed Agents API are framed as an integrated dev kit, not separate demos.
- The flow targets local prototyping first, then managed cloud execution. Less custom orchestration if the API is usable.
- The available text is short, so pricing, lock-in, and runtime limits remain unknown. Treat it as watchlist, not migration trigger.
Source: news.hada.io/topic?id=29718Read original →
FIG-0031:1
70radar
FIG-0031:1

`Forge` Pushes Local 8B Agent Reliability Near Frontier APIs

// related

GitHub confirms 3,800 repositories compromised via malicious `VS Code` extension

`Claude Code` `v2.1.146` tightens code review and background sessions

Google Cloud revamps agent development with `Antigravity 2.0`