telexed ~ c / 006fbbf7-30aradar:70 · agent_toolLIVE
← back
NO.
#006fbbf7
Topic
AGENTS & TOOLS
Source
Hacker News · MRR
Published
2026-05-19 12:23:07
Importance
★ 7/10 — radar 70
`Forge` raises local 8B agent task success from 53% to 99% with guardrails
FIG-0061:1

`Forge` raises local 8B agent task success from 53% to 99% with guardrails

Reliability came from orchestration, not a bigger model. Forge makes local tool-calling viable when cloud agent costs are the bottleneck.

[ KEY POINTS ]
  1. Ministral 8B with Forge hit 99.3% across multi-step workflows; Claude Sonnet with the same guardrails reached 100%.
  2. Without retry handling, error recovery scored 0% across every tested local and frontier model. The missing layer is architectural.
  3. Backend choice changed results sharply: the same Mistral-Nemo 12B weights scored 7% on llama-server native calling and 83% on Llamafile prompt mode.
  4. Ablation points to retry nudges and error recovery as the useful parts. Rescue parsing and context compaction stayed for rare production failures.
Originalgithub.com/antoinezambelli/forgeRead original →

// related