telexed ~ c / d0221dd8-172radar:70 · agent_toolLIVE
← back
NO.
#d0221dd8
Topic
AGENTS & TOOLS
Source
Hacker News · AI Agent
Published
2026-05-19 12:23:07
Importance
★ 7/10 — radar 70
`Forge` brings local 8B agent workflows near frontier reliability
FIG-0021:1

`Forge` brings local 8B agent workflows near frontier reliability

Guardrails, not bigger weights, drive the result. Retry nudges and error recovery make local always-on agents cheaper to test now.

[ KEY POINTS ]
  1. Ministral 8B reached 99.3% with Forge; Claude Sonnet with guardrails hit 100%, leaving less than 1 point between local and frontier.
  2. Without guardrails, the same comparison flips: local 8B plus framework support beat unguarded Claude Sonnet at 87.2%.
  3. Ablations put most value in retry nudges and error recovery. Disabling retry nudges caused 24-49 point drops.
  4. Serving backend changed outcomes sharply: Mistral-Nemo 12B scored 7% on llama-server native function calling vs 83% on Llamafile prompt mode.
Originalgithub.com/antoinezambelli/forgeRead original →

// related