NO.: #28bf2b62
Topic: MODELS & API
Source: Simon Willison
Published: 2026-04-27 23:46:56
Importance: ★ 6/10 — radar 60
Original: simonwillison.net/2026/Apr/27/vibevoice/#atom-everything

Microsoft VibeVoice: Open-Source ASR with Built-In Speaker Diarization

This is a practical look at Microsoft's MIT-licensed speech-to-text model, VibeVoice, emphasizing built-in speaker diarization and local execution on Apple Silicon via MLX. For indie developers, the main appeal is owning transcription workflows without API dependency, but the hardware cost is substantial: multi-GB model downloads and very high RAM usage for long-form audio.

[ KEY POINTS ]

- Strong indie value: MIT license plus built-in diarization reduces reliance on paid transcription APIs and post-processing pipelines.
- Operational cost is the main constraint: the 4-bit MLX model is 5.71 GB, the original model is 17.3 GB, and observed RAM usage can exceed 30 GB and even spike above 60 GB.
- Long-form audio is viable but needs tuning: the default token limit only covers about 25 minutes, so longer recordings require increasing --max-tokens.
- Best fit appears to be privacy-sensitive or cost-sensitive transcription products for developers who already own high-end local hardware.
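
The --max-tokens point comes down to proportional arithmetic. The sketch below assumes the token budget scales roughly linearly with audio length, which the summary does not confirm. The function name, the 20% headroom, and the example default of 8192 tokens are illustrative placeholders, not values taken from VibeVoice or MLX, so check the tool's real default before relying on the result.

```python
import math

# From the summary above: the default token limit covers about 25 minutes.
DEFAULT_COVERAGE_MIN = 25.0


def estimate_max_tokens(audio_minutes: float,
                        default_max_tokens: int,
                        headroom: float = 1.2) -> int:
    """Estimate a --max-tokens value for a recording of the given length.

    Assumption (not confirmed by the source): token usage grows linearly
    with audio duration. `headroom` is a hypothetical safety margin.
    """
    if audio_minutes <= 0:
        raise ValueError("audio_minutes must be positive")
    if default_max_tokens <= 0:
        raise ValueError("default_max_tokens must be positive")

    # Never go below the default; scale up only for longer recordings.
    factor = max(1.0, audio_minutes / DEFAULT_COVERAGE_MIN * headroom)
    return math.ceil(default_max_tokens * factor)


if __name__ == "__main__":
    # 8192 is a placeholder default, used here only to show the scaling.
    for minutes in (20, 60, 120):
        print(minutes, "min ->", estimate_max_tokens(minutes, 8192))
```

Recordings shorter than the 25-minute coverage keep the default value. Longer ones scale up proportionally, with the margin applied on top.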

// related

#0001 | Models & API | Simon Willison | 10 hours ago
`llm-gemini` `0.32` adds `gemini-3.5-flash`
radar 50
llm-gemini: LLM CLI plugin that calls Gemini models from `llm`
Simon Willison's llm CLI can now call Google's new Flash model through the Gemini plugin. Small update, but useful if your scripts already depend on llm.
- llm-gemini now exposes gemini-3.5-flash, so existing llm CLI workflows can test the model without custom API glue.
- Scope is one model alias in a plugin release. This is practical plumbing, not a new app-building capability by itself.
- Best fit is quick model comparison for summaries, extraction, and generation jobs already wired around Simon Willison's llm ecosystem.
Source: simonwillison.net/2026/May/19/llm-gemini-2/#atom-everyth
#0002 | Models & API | Simon Willison | 11 hours ago
`Gemini 3.5 Flash` ships broadly with a 3-6x price jump
radar 90
Google is putting the new default-grade model into Search, Gemini, Android Studio, and the API. The API math changed: high-output features need a fresh margin check.
- Model ID is gemini-3.5-flash; it supports 1,048,576 input tokens and 65,536 output tokens. Strong fit for long-context document flows.
- Pricing is $1.50/M input tokens and $9/M output tokens, which is 3x the price of 3 Flash Preview and 6x that of 3.1 Flash-Lite. Flash is no longer the obvious cheap default.
- Interactions API is in beta with server-side history management, echoing the Responses API pattern. Agent backends may get simpler state handling.
- No computer-use feature in this release. If your workflow depends on browser/desktop control, this is a model upgrade, not a full agent-runtime replacement.
- 3.5 Pro is slated for next month, likely pricier. Build model routing now instead of hard-coding one Gemini tier.
Source: simonwillison.net/2026/May/19/gemini-35-flash/#atom-ever
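
The "fresh margin check" can be sketched as per-request arithmetic. The prices below are the ones quoted in this card. The class and function names are hypothetical, the token counts are invented for illustration, and the lower-priced comparison tier is only what the stated 3x multiplier implies, not a figure from a price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens."""
    input_per_million: float
    output_per_million: float


# Prices quoted in the card above.
GEMINI_35_FLASH = Pricing(input_per_million=1.50, output_per_million=9.00)


def scaled_down(pricing: Pricing, multiplier: float) -> Pricing:
    """Derive the tier implied by a 'Nx more expensive' statement."""
    return Pricing(pricing.input_per_million / multiplier,
                   pricing.output_per_million / multiplier)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the given prices."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload only: 20k tokens in, 2k tokens out.
    implied_previous = scaled_down(GEMINI_35_FLASH, 3)
    new_cost = request_cost(GEMINI_35_FLASH, 20_000, 2_000)
    old_cost = request_cost(implied_previous, 20_000, 2_000)
    print(f"3.5 Flash: ${new_cost:.4f} per request")
    print(f"Implied 3x-cheaper tier: ${old_cost:.4f} per request")
```

Because output tokens cost six times as much as input tokens at these prices, features that generate long responses move the total far more than features that only read long context.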
#0003 | Models & API | GeekNews | 12 hours ago
`Gemini 3.5 Flash` targets long-running agents and coding
radar 100
Google is pushing the fast tier into frontier-agent territory. Recheck Gemini automation stacks where latency matters and quality was the blocker.
- First Gemini 3.5 model combines frontier-level intelligence with execution ability, aimed at long-running agent and coding tasks.
- Keeps the Flash-series speed profile, so it competes for production flows where Pro-class latency was too expensive.
- Scores 76.2% on Terminal-Bench 2.1 and 1656 Elo on GDPval-AA, beating Gemini 3.1 Pro in the cited benchmarks.
Source: news.hada.io/topic?id=29670
