NO.: #a9eccb67
Topic: MODELS & API
Source: r/LocalLLaMA
Published: 2026-05-01 12:33:04
Importance: ★ 8/10 — radar 80
Original: www.reddit.com/r/LocalLLaMA/comments/1t0s4qv/gemma431bitdflash_has_been_released/

`gemma-4-31B-it-DFlash` released, but blocked on `llama.cpp` support

Weights are up on Hugging Face, but local testing is still blocked by unmerged llama.cpp PR #22105. Useful only for tracking right now; wait for merge before judging real usability.

[ KEY POINTS ]

- The weights are already published at huggingface.co/z-lab/gemma-4-31B-it-DFlash, so distribution started before runtime support landed.
- Testing is gated by ggml-org/llama.cpp PR #22105; until it merges, local inference through llama.cpp is effectively blocked.
- This is a release to bookmark, not deploy today. The next real checkpoint is the PR merge, followed by compatibility and performance checks.
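
If you would rather poll for the merge than refresh the PR page, a minimal sketch follows. It assumes the standard GitHub REST pull-request endpoint and its `state` and `merged` fields; the repository and PR number are the ones cited above, and the helper names `describe_pr` and `fetch_pr` are hypothetical.

```python
import json
import urllib.request

# Repo and PR number come from the post. The endpoint shape and the
# `state` / `merged` fields are assumed from the GitHub REST API.
PR_URL = "https://api.github.com/repos/ggml-org/llama.cpp/pulls/22105"


def describe_pr(payload: dict) -> str:
    """Map a pull-request payload to 'merged', 'open', or 'closed-unmerged'."""
    if payload.get("merged"):
        return "merged"
    if payload.get("state") == "open":
        return "open"
    return "closed-unmerged"


def fetch_pr(url: str = PR_URL) -> dict:
    """Fetch the PR JSON. Unauthenticated, so GitHub rate limits apply."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Usage (needs network access): print(describe_pr(fetch_pr()))
```

A "merged" result only clears the first checkpoint; compatibility and performance still need to be checked against an actual build.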

// related

#0001 Models & API | Simon Willison | 7 hours ago
`llm-gemini` `0.32` adds `gemini-3.5-flash`
radar 50
llm-gemini: LLM CLI plugin that calls Gemini models from `llm`
Simon Willison's llm CLI can now call Google's new Flash model through the Gemini plugin. Small update, but useful if your scripts already depend on llm.
- llm-gemini now exposes gemini-3.5-flash, so existing llm CLI workflows can test the model without custom API glue.
- Scope is one model alias in a plugin release. This is practical plumbing, not a new app-building capability by itself.
- Best fit is quick model comparison for summaries, extraction, and generation jobs already wired around Simon Willison's llm ecosystem.
Source: simonwillison.net/2026/May/19/llm-gemini-2/#atom-everyth
#0002 Models & API | Simon Willison | 8 hours ago
`Gemini 3.5 Flash` ships broadly with a 3-6x price jump
radar 90
Google is putting the new default-grade model into Search, Gemini, Android Studio, and the API. The API math changed: high-output features need a fresh margin check.
- Model ID is gemini-3.5-flash; it supports 1,048,576 input tokens and 65,536 output tokens. Strong fit for long-context document flows.
- Pricing is $1.50/M input tokens and $9/M output tokens, 3x the price of 3 Flash Preview and 6x that of 3.1 Flash-Lite. Flash is no longer the obvious cheap default.
- Interactions API is in beta with server-side history management, echoing the Responses API pattern. Agent backends may get simpler state handling.
- No computer-use feature in this release. If your workflow depends on browser/desktop control, this is a model upgrade, not a full agent-runtime replacement.
- 3.5 Pro is slated for next month, likely pricier. Build model routing now instead of hard-coding one Gemini tier.
Source: simonwillison.net/2026/May/19/gemini-35-flash/#atom-ever
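
For the margin check, a rough per-request estimate can be sketched from the figures quoted in this item. The prices and token limits are the ones listed above; the function name `estimate_cost` is hypothetical, and the sketch assumes flat per-token pricing with no caching discounts or tiering.

```python
# Figures quoted in the item above (USD per million tokens, token limits).
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 9.00
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted list prices.

    Assumes flat pricing: no caching discounts, batching, or tiers.
    """
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("input_tokens outside the quoted input limit")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens outside the quoted output limit")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 100k-token document summarized into 10k tokens of output.
example = estimate_cost(100_000, 10_000)
```

In that example the output side is about 0.09 USD against 0.15 USD for input, even though output is a tenth of the token count, which is why high-output features are the ones to recheck.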
#0003 Models & API | GeekNews | 10 hours ago
`Gemini 3.5 Flash` targets long-running agents and coding
radar 100
Google is pushing the fast tier into frontier-agent territory. Recheck Gemini automation stacks where latency matters and quality was the blocker.
- First Gemini 3.5 model combines frontier-level intelligence with execution ability, aimed at long-running agent and coding tasks.
- Keeps the Flash-series speed profile, so it competes for production flows where Pro-class latency was too expensive.
- Scores 76.2% on Terminal-Bench 2.1 and 1656 Elo on GDPval-AA, beating Gemini 3.1 Pro in the cited benchmarks.
Source: news.hada.io/topic?id=29670