NO.: #153a1062
Topic: OTHER
Source: Hacker News · Show HN AI
Published: 2026-04-29 16:01:51
Importance: ★ 5/10 — radar 50
Original: interfaze.ai/blog/introducing-structured-output-benchmark

`Structured Output Benchmark` targets value-level LLM correctness

Schema-valid JSON still breaks workflows when field values drift or are hallucinated. The Structured Output Benchmark (SOB) scores schema compliance, types, and value accuracy across text, image, and audio inputs, which makes it directly useful for choosing models for extraction pipelines.

[ KEY POINTS ]

- Existing benchmarks like JSONSchemaBench mostly check schema and type compliance, so they miss wrong-yet-plausible values such as shifted dates or reordered arrays.
- SOB pairs each sample with a JSON Schema and human-verified ground truth, then grades failures at the field-value level across three modalities.
- Rankings split by modality: GLM-4.7 leads on text, Gemma-4-31B on images, and Gemini-2.5-Flash on audio, so picking one default model for everything is a weak strategy.
- Large models do not dominate value accuracy: Qwen3.5-35B, GLM-4.7, and even Phi-4 beat bigger frontier models on some structured tasks.
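
To make the schema-versus-value distinction concrete, here is a minimal sketch of field-level grading. It is not SOB's actual scoring code: the `flatten` and `score` helpers, the metric names, and the sample invoice are all hypothetical, and the summary does not say how the benchmark weights fields or handles arrays. The sketch only shows how an output can have every type right while most values are wrong.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {path: leaf_value} (hypothetical helper)."""
    if isinstance(value, dict) and value:
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list) and value:
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        # Scalars and empty containers are treated as leaves.
        return {prefix: value}
    out: dict[str, Any] = {}
    for path, child in items:
        out.update(flatten(child, path))
    return out


def score(prediction: Any, truth: Any) -> dict[str, float]:
    """Compare a prediction to ground truth field by field (assumed metrics)."""
    pred, gold = flatten(prediction), flatten(truth)
    # A field is type-correct if it exists and has exactly the expected type.
    type_ok = [p for p in gold if p in pred and type(pred[p]) is type(gold[p])]
    # A field is value-correct only if it is type-correct and equal.
    value_ok = [p for p in type_ok if pred[p] == gold[p]]
    return {
        "type_accuracy": len(type_ok) / len(gold),
        "value_accuracy": len(value_ok) / len(gold),
    }


# Hypothetical sample: a shifted date and a reordered array, both type-valid.
truth = {"invoice_date": "2026-04-29", "total": 120.5, "items": ["cable", "adapter"]}
prediction = {"invoice_date": "2026-04-30", "total": 120.5, "items": ["adapter", "cable"]}

print(score(prediction, truth))
# {'type_accuracy': 1.0, 'value_accuracy': 0.25}
```

Positional array comparison is a deliberate choice in this sketch: a reordered array counts as wrong, which matches the failure mode named above but would be too strict for fields where order carries no meaning.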


// related

#0001 · Other · GeekNews · 6 hours ago
`Bambu Studio` Faces Broad AGPLv3 Compliance Challenge
radar 40
Bambu Studio: 3D printing slicer, based on PrusaSlicer
Copyleft obligations can reach bundled dynamic libraries and install info. If you fork AGPLv3 software, partial source drops are not enough.
- AGPLv3 Corresponding Source covers code needed to generate, install, run, and modify the work, not just visible app changes.
- A tightly coupled proprietary networking library may need source disclosure if it is dynamically linked into the modified app.
- Forking strong-copyleft projects for commercial software demands release-process checks before binaries ship.
Source: news.hada.io/topic?id=29694
#0002 · Other · GeekNews · 6 hours ago
What's New in `Chrome` from Google I/O 2026
radar 50
The web is shifting from human clicks to agent-driven browsing and AI-assisted development. Worth tracking before specs turn into defaults.
- Paul Kinlan, who leads Chrome DevRel, framed the last 6 months as a fast reset for web development workflows.
- One axis is preparing sites for agents that browse on users' behalf. Structured, machine-readable UX will matter more.
- Another axis is developer tooling. Chrome is positioning DevTools around AI-assisted debugging and building, not just inspection.
- The source is short and lacks API-level details, so this is a watchlist item rather than something to implement today.
Source: news.hada.io/topic?id=29693
#0003 · Other · GeekNews · 9 hours ago
JavaScript Debloating: Complexity, Libraries, and the WASM Trade-off
radar 40
Small browser UIs can become heavy fast. WebAssembly helps, but async bridging to the JavaScript event loop keeps the payoff situational.
- Nested syntax and callbacks make JavaScript complexity grow quickly; bundle size is often a design outcome, not just tooling noise.
- Small UI surfaces can still pull in many libraries. Dependency defaults deserve review before reaching for another package.
- WebAssembly opens the browser to other languages, but Pyodide-style async event-loop integration adds real coordination cost.
Source: news.hada.io/topic?id=29675
