telexed ~ cat / model_api★4 and up · hourly · UTC+09LIVE
All Models & API

Models & API

28 items
Today2 dispatches
  • #0028Models & APIGeekNews

    `Qwen3.7-Max`: Agent-First Proprietary Model

    70radar
    Qwen3.7-MaxProprietary LLM — built for long agent runs

    A proprietary model is being positioned for coding, office automation, and very long autonomous runs. Strong benchmark numbers make it worth testing for agent workflows, though API cost and access still decide adoption.

    • Targets coding, debugging, office automation, and hundreds to thousands of autonomous steps; this is agent runtime territory, not simple chat.
    • Scores 69.7 on Terminal Bench 2.0-Terminus and 92.4 on GPQA Diamond; useful signal for coding plus reasoning evals.
    • The reported 35-hour autonomous run matters for long workflows, but real value depends on reliability, tool use, and pricing.
    Source: news.hada.io/topic?id=29716Read original →
  • #0027Models & APIr/LocalLLaMA

    Cohere launches `Command A+`, an Apache 2.0 MoE open-weight model

    80radar
    Command A+Open-weight LLM — Apache 2.0 MoE model

    A practical open-weight model enters the agent stack. Apache 2.0 plus strong quantization makes local or self-hosted experiments cheaper to justify.

    • Command A+ is Cohere’s first MoE model; top-line performance still needs work, but speed and responsiveness are the claimed edge.
    • The model is released under Apache 2.0, so commercial use and product integration have fewer license traps.
    • Quantization is positioned as a core feature: it runs well on 1-2 GPUs, making self-hosted agent backends more realistic.
    • Cohere frames it as the kind of model behind its enterprise agents, not just a benchmark artifact.
    Source: www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_eveRead original →
Yesterday10 dispatches
  • #0026Models & APIr/LocalLLaMA

    `Qwen3.7 Max` hits 5th on Artificial Analysis; 27B/35B still pending

    60radar
    Qwen3.7 MaxLarge language model — high-end Alibaba Qwen variant

    Artificial Analysis puts it near GPT 5.4 xhigh and above Gemini 3.5 Flash. Strong benchmark signal, but migration waits on API price and smaller-model results.

    • Ranked 5th on Artificial Analysis, roughly tied with GPT 5.4 xhigh; credible enough to add to model eval lists.
    • Gemini 3.5 Flash sits one step lower in the cited ranking, so latency/price will decide the practical winner.
    • Qwen3.6 27B trails Max by 6 points; the 27B/35B Qwen3.7 results matter for local or cheaper deployment.
    Source: www.reddit.com/r/LocalLLaMA/comments/1tie6gy/qwen37_max_Read original →
  • #0025Models & APIvercel_blog

    `Grok Build 0.1` lands on `Vercel AI Gateway`

    60radar
    Grok Build 0.1Agentic coding model — powers the Grok Build CLI

    A beta agentic coding model is now callable through the AI SDK. Useful for quick experiments, but early access and fixed reasoning keep it from being a default choice.

    • Set the model to xai/grok-build-0.1 in the AI SDK; integration cost is low if the app already uses AI Gateway.
    • Reasoning effort is not configurable, and there is no non-reasoning mode. Latency and cost controls are limited for production agents.
    • AI Gateway adds usage tracking, cost reporting, retries, failover, BYOK, and provider routing around the model call.
    Source: vercel.com/changelog/grok-build-0-1-now-available-on-verRead original →
  • `llm-gemini` `0.32` adds `gemini-3.5-flash`

    50radar
    llm-geminiLLM CLI plugin — calls Gemini models from `llm`

    Simon Willison's llm CLI can now call Google's new Flash model through the Gemini plugin. Small update, but useful if your scripts already depend on llm.

    • llm-gemini now exposes gemini-3.5-flash, so existing llm CLI workflows can test the model without custom API glue.
    • Scope is one model alias in a plugin release. This is practical plumbing, not a new app-building capability by itself.
    • Best fit is quick model comparison for summaries, extraction, and generation jobs already wired around Simon Willison's llm ecosystem.
    Source: simonwillison.net/2026/May/19/llm-gemini-2/#atom-everythRead original →
  • `Gemini 3.5 Flash` ships broadly with a 3-6x price jump

    90radar

    Google is putting the new default-grade model into Search, Gemini, Android Studio, and the API. The API math changed: high-output features need a fresh margin check.

    • Model ID is gemini-3.5-flash; it supports 1,048,576 input tokens and 65,536 output tokens. Strong fit for long-context document flows.
    • Pricing is $1.50/M input and $9/M output, 3x 3 Flash Preview and 6x 3.1 Flash-Lite. Flash is no longer the obvious cheap default.
    • Interactions API is in beta with server-side history management, echoing the Responses API pattern. Agent backends may get simpler state handling.
    • No computer-use feature in this release. If your workflow depends on browser/desktop control, this is a model upgrade, not a full agent-runtime replacement.
    • 3.5 Pro is slated for next month, likely pricier. Build model routing now instead of hard-coding one Gemini tier.
    Source: simonwillison.net/2026/May/19/gemini-35-flash/#atom-everRead original →
  • #0022Models & APIGeekNews

    `Gemini 3.5 Flash` targets long-running agents and coding

    100radar

    Google is pushing the fast tier into frontier-agent territory. Recheck Gemini automation stacks where latency matters and quality was the blocker.

    • First Gemini 3.5 model combines frontier-level intelligence with execution ability, aimed at long-running agent and coding tasks.
    • Keeps the Flash-series speed profile, so it competes for production flows where Pro-class latency was too expensive.
    • Scores 76.2% on Terminal-Bench 2.1 and 1656 Elo on GDPval-AA, beating Gemini 3.1 Pro in the cited benchmarks.
    Source: news.hada.io/topic?id=29670Read original →
  • #0021Models & APIGoogle AI

    Google adds `$100 AI Ultra` tier to `Google AI` subscriptions

    60radar

    Google is expanding paid AI access beyond Plus and Pro. The new high-end tier signals stronger feature gating, but API impact is unclear.

    • AI Ultra is priced at $100; useful only if the included model or product limits beat existing Pro access.
    • Google AI Plus, Pro, and Ultra all get new features and benefits, but the excerpt gives no quota or model details.
    • This is subscription pricing, not confirmed API pricing. Treat it as a product-access signal, not a backend cost change.
    Source: blog.google/products-and-platforms/products/google-one/gRead original →
  • #0020Models & APIGoogle AI

    `Gemini 3.5` released as Google’s action-oriented frontier model line

    100radar

    Google is positioning the new line around intelligence plus action, not just chat. Missing pricing and API details limit immediate planning, but this is a model-release signal to track now.

    • Official Google channel says Gemini 3.5 is the latest model series released at Google I/O. Treat it as a primary launch, not a reaction post.
    • The only stated positioning is frontier intelligence with action. That points toward agent workflows, tool use, and app-side automation.
    • No pricing, rate limits, context window, benchmark, or API migration detail is included here. Implementation decisions need the fuller docs before switching stacks.
    Source: blog.google/innovation-and-ai/models-and-research/geminiRead original →
  • `Gemini 3.5 Flash` Model Page Appears

    70radar

    Only the official developer-doc link is available. Pricing, context, and benchmarks are missing, so wait for API details before switching workloads.

    • The URL points to ai.google.dev model docs, so this is a primary-source signal, not a reaction post.
    • No pricing, rate-limit, context-window, or benchmark data is included. Cost/performance comparison is blocked for now.
    • Treat Gemini 3.5 Flash as a watchlist API candidate. Migration only makes sense after SDK and pricing details are clear.
    Source: blog.google/innovation-and-ai/models-and-research/geminiRead original →
  • `anthropic-sdk-python` `0.103.1` fixes `SessionToolRunner` ownership handling

    40radar
    anthropic-sdk-pythonPython SDK — official client for Anthropic APIs

    A narrow agent-runtime fix prevents skipping the wrong boundary. Upgrade when your app uses SDK-managed tool sessions; otherwise low urgency.

    • SessionToolRunner now skips tool calls it does not own, reducing cross-runner handling bugs in agent workflows.
    • The changelog lists one bug fix only and no API, pricing, or model behavior changes.
    • Pin-sensitive apps should test 0.103.1; simple messages API usage can wait for the next regular bump.
    Source: github.com/anthropics/anthropic-sdk-python/releases/tag/Read original →
  • `Anthropic TypeScript SDK` `v0.97.1` Fixes `SessionToolRunner` Tool-Call Ownership

    40radar
    Anthropic TypeScript SDKLLM API SDK — Anthropic API access from TypeScript

    A narrow runner bug fix prevents SessionToolRunner from handling tool calls it does not own. Upgrade only if your agent flow uses SDK-managed tool sessions.

    • Release date is 2026-05-19. Scope is a patch update from sdk-v0.97.0 to sdk-v0.97.1, not a model or pricing change.
    • The only listed fix is runner: skip tool calls SessionToolRunner does not own; relevant to multi-tool or session-based agent loops.
    • No new API surface is mentioned. If your integration does not use SessionToolRunner, there is little urgency.
    Source: github.com/anthropics/anthropic-sdk-typescript/releases/Read original →
Tue, May 195 dispatches
  • `bedrock-sdk` `v0.29.2` fixes sub-package CI builds

    40radar
    Anthropic Bedrock SDKTypeScript SDK — calls Anthropic models via AWS Bedrock

    A Node type mismatch that broke sub-package builds is fixed. No runtime feature change; update only if your CI touches the AWS Bedrock package path.

    • @types/node is now aligned across sub-packages, removing a CI build failure source in the TypeScript SDK.
    • The changelog only lists bug fixes. No new Anthropic model, API surface, or pricing change ships here.
    • Worth a dependency bump for bedrock-sdk users on strict CI; otherwise this is routine maintenance.
    Source: github.com/anthropics/anthropic-sdk-typescript/releases/Read original →
  • `vertex-sdk` `v0.16.1` fixes TypeScript CI build issue

    40radar
    vertex-sdkAnthropic TypeScript SDK — package for Vertex AI integration

    @types/node is now aligned across sub-packages. No model or API change, but blocked CI builds should clear after upgrading.

    • The only listed fix is aligning @types/node in sub-packages, targeting CI build failures from type-version drift.
    • Release date is 2026-05-19; this is a patch upgrade from vertex-sdk-v0.16.0 to v0.16.1.
    • No pricing, model, or runtime behavior change is mentioned. Upgrade only matters if vertex-sdk builds were failing in CI.
    Source: github.com/anthropics/anthropic-sdk-typescript/releases/Read original →
  • `anthropic-sdk-typescript` `0.97.0` adds self-hosted sandbox helpers

    50radar
    anthropic-sdk-typescriptTypeScript SDK — Anthropic API client package

    Self-hosted sandbox helpers landed for CMA. Useful when you run agent sandboxes yourself; otherwise this is a small maintenance release.

    • client now supports self-hosted sandboxes in CMA. Apps that own their sandbox infra get less glue code to maintain.
    • tsc-multi was upgraded for Node 26 compatibility. New runtime CI pipelines should hit fewer TypeScript build issues.
    • The test cleanup only removes a redundant File import. Runtime impact is concentrated in the sandbox helper addition.
    Source: github.com/anthropics/anthropic-sdk-typescript/releases/Read original →
  • `anthropic-sdk-python` `v0.103.0` adds CMA self-hosted sandbox support

    40radar
    anthropic-sdk-pythonPython SDK — official Anthropic API client

    CMA can now work with self-hosted sandboxes via SDK helpers. Narrow update, but useful when sandbox control matters.

    • The only listed feature is client support for self-hosted sandboxes in CMA, shipped on 2026-05-19.
    • New sandbox helpers reduce custom glue code for teams wiring Anthropic API flows into controlled execution environments.
    • No pricing, model, or migration change is mentioned. Skip unless you already use CMA or sandboxed execution paths.
    Source: github.com/anthropics/anthropic-sdk-python/releases/tag/Read original →
  • `Gemini 3.5 Flash` lands on `Vercel AI Gateway`

    80radar
    Vercel AI GatewayLLM API gateway — routing, cost tracking, and failover

    Better coding, reasoning, and agent loops are now available through one gateway call. Good fit when uptime, cost tracking, and failover matter more than direct provider integration.

    • Use google/gemini-3.5-flash in the AI SDK; migration is mostly a model string change for existing Gateway users.
    • Default thinking level is medium, aiming for faster and cheaper generation without fully dropping reasoning quality.
    • temperature, topP, topK, and thinking_budget are unsupported, so apps relying on sampling controls need compatibility checks.
    • AI Gateway adds usage tracking, cost reporting, retries, failover, BYOK, and provider routing on top of the model call.
    Source: vercel.com/changelog/gemini-3-5-flash-on-ai-gatewayRead original →
Sat, May 162 dispatches
  • `Gemini 3.5` targets complex agentic workflows

    100radar

    Google positions the new model around action, not just chat. The practical test is whether it can reduce multi-step workflow glue now.

    • Official google_deepmind launch signal, not a reaction post; treat it as a primary model-release item.
    • The only stated capability is complex, agentic workflows; no pricing, API shape, benchmarks, or context limits are provided here.
    • Evaluation should start with real tasks: repo edits, tool calls, long plans, and handoff reliability before product migration.
    Source: deepmind.google/blog/gemini-3-5-frontier-intelligence-wiRead original →
  • `openai-python` `v2.37.0` adds `service_tier` to compact responses

    40radar

    More control landed in the Python SDK's compact responses path, plus a small auth cleanup. Useful if you already depend on responses; otherwise this is a low-priority bump.

    • service_tier now reaches the compact responses method, so latency/cost routing can stay consistent without dropping to a lower-level call.
    • Pydantic iterators get eager validation support in internal types, which should catch bad streamed data earlier in typed pipelines.
    • Workload identity auth no longer requires an unnecessary client_id; fewer auth edge cases for cloud deployments using federated identity.
    Source: github.com/openai/openai-python/releases/tag/v2.37.0Read original →
Thu, May 143 dispatches
  • #0009Models & APIGeekNews

    Claude shifts programmatic usage to monthly credits

    70radar

    Programmatic usage moves into a separate monthly credit pool on June 15. Easier to track Agent SDK automation, but the real per-task cost may change.

    • Coverage includes Claude Agent SDK, claude -p, Claude Code GitHub Actions, and third-party apps built on the SDK. Billing changes across the whole automation stack at once.
    • The date is locked to June 15, so any product with heavy CI, batch jobs, or agent runs should check current burn patterns before the switch.
    • Separating chat-style plan usage from programmatic calls makes budgeting cleaner. It also exposes whether the current paid plan still makes economic sense.
    • Third-party SDK wrappers are included too, so this does not get bypassed by using another tool layer. Hidden automation spend becomes easier to spot.
    Source: news.hada.io/topic?id=29494Read original →
  • `anthropic-sdk-python` `v0.102.0` adds managed agent search block and cache diagnostics beta

    50radar
    anthropic-sdk-pythonPython SDK — typed access to Anthropic API features

    The Python SDK now exposes a managed-agents search result block and a beta for cache diagnostics. Useful if you ship against Anthropic APIs directly; otherwise this is a low-priority dependency bump.

    • Adds BetaManagedAgentsSearchResultBlock types, so managed-agent search responses can be handled without custom typing glue.
    • Introduces cache diagnostics beta support, which helps inspect cache behavior when tuning prompt reuse and token spend.
    • pydantic iterator validation is now eager, reducing the chance of delayed type errors surfacing deep in runtime paths.
    • No pricing, model, or API-surface reset here; this is a small SDK upgrade rather than a platform-level change.
    Source: github.com/anthropics/anthropic-sdk-python/releases/tag/Read original →
  • `anthropic-sdk-typescript` `v0.96.0` adds managed-agent search types and cache diagnostics beta

    50radar
    anthropic-sdk-typescriptTypeScript SDK — typed Anthropic API support

    Type coverage moved closer to newer agent and caching surfaces, so less hand-rolled typing is needed in TS apps. Useful if you ship on Anthropic now; otherwise this is a small maintenance upgrade.

    • Adds BetaManagedAgentsSearchResultBlock types, which reduces custom typing around managed-agent search responses.
    • Introduces support for cache diagnostics beta, giving earlier access to visibility on cache behavior before broader API tooling lands.
    • zod handling now locks to zod/v4 types only, which should cut mixed-version type friction in strict TypeScript setups.
    Source: github.com/anthropics/anthropic-sdk-typescript/releases/Read original →
Tue, May 123 dispatches
  • `Claude Platform` on `AWS` Reaches General Availability

    60radar
    Claude PlatformAI platform — full Claude API via AWS billing

    AWS-native auth, billing, and commitment drawdown now cover the full Claude API. Useful if your stack already sits on AWS; otherwise Bedrock still matters more for strict residency needs.

    • The service exposes the full `Claude API` feature set through AWS accounts, including native authentication, billing, and commitment retirement.
    • Claude Managed Agents, code execution, web search, web fetch, Files API, MCP connector, prompt caching, citations, and batch processing are included from day one.
    • Anthropic runs the service and says new features ship the same day as the native Claude API, reducing lag risk versus marketplace-style integrations.
    • Amazon Bedrock remains the better path when data must stay under AWS-only processing or regional residency rules are non-negotiable.
    Source: www.reddit.com/r/ClaudeAI/comments/1ta7p4n/the_claude_plRead original →
  • `anthropic-sdk-python` `v0.101.0` adds AWS Claude Platform client

    50radar
    anthropic-sdk-pythonPython SDK — adds direct AWS path for Claude access

    Python apps can now hit Claude Platform on AWS without custom glue. Useful if you already sit on AWS auth and networking, but not a must-upgrade unless that path matters now.

    • The only product change is an AWS client addition for Claude Platform on AWS; this is a distribution/integration update, not a new model release.
    • AWS-native auth, networking, and account boundaries get simpler in Python stacks using anthropic-sdk-python, which cuts adapter code.
    • Bug fix scope is tiny: one missing f-string prefix in a file type error message, so stability impact looks small.
    • Example updates moved scripts toward claude-sonnet-4-5-20250929 and uv, hinting at the maintainer's current Python tooling baseline.
    Source: github.com/anthropics/anthropic-sdk-python/releases/tag/Read original →
  • `anthropic-sdk-typescript` `aws-sdk` `v0.3.0` adds an AWS client

    50radar
    anthropic-sdk-typescriptTypeScript SDK — adds an AWS client for Claude

    AWS became a first-class target in the TypeScript SDK, so calling Claude from AWS setups gets simpler. Useful if your stack already sits on AWS; otherwise the impact is limited.

    • Release date is 2026-05-11 with a single shipped feature: an AWS client for Claude on AWS.
    • The changelog span is aws-sdk-v0.2.5...aws-sdk-v0.3.0, so this is a focused integration update, not a broad SDK overhaul.
    • Teams already anchored on Lambda, ECS, or AWS credentials can reduce glue code; non-AWS stacks can mostly ignore it for now.
    Source: github.com/anthropics/anthropic-sdk-typescript/releases/Read original →
Sat, May 91 dispatches
  • `Grok Code Fast 1` Deprecation Hits All `GitHub Copilot` Surfaces on May 15

    60radar
    Grok Code Fast 1Code model — fast-response option used across Copilot

    One more fallback disappears inside GitHub Copilot next week. If your prompts or completion quality depended on this model, re-test now; the switch is close and broad enough to break assumptions.

    • Grok Code Fast 1 is scheduled for deprecation on May 15, leaving little buffer for teams that pinned workflow expectations to it.
    • The change spans Copilot Chat, inline edits, ask mode, agent mode, and code completion, so impact is not limited to one entry point.
    • No replacement or migration detail is included here, which means quality, latency, and behavior shifts need manual verification.
    • If GitHub Copilot is part of a production coding loop, this is a prompt-regression check rather than a news item to ignore.
    Source: github.blog/changelog/2026-05-08-upcoming-deprecation-ofRead original →
Fri, May 81 dispatches
  • `GPT-4.1` in GitHub Copilot will be deprecated on `2026-06-01`

    60radar

    A fixed sunset date is now on the calendar across chat, inline edits, agent mode, and completions. If GPT-4.1 was your default in Copilot, swap workflows early because breakage risk starts the moment the model disappears.

    • GitHub set a `2026-06-01` deprecation date, so this is not a soft warning but a scheduled removal.
    • The change spans Copilot Chat, inline edits, ask, agent mode, and code completions, which means the impact hits nearly every Copilot surface.
    • GitHub is already pointing users to a suggested alternative, so prompt behavior and output quality should be rechecked before the cutoff.
    • Teams with saved Copilot habits around GPT-4.1 should update docs and presets now; waiting until June invites avoidable workflow churn.
    Source: github.blog/changelog/2026-05-07-upcoming-deprecation-ofRead original →
Thu, May 71 dispatches
  • #0001Models & APIOpenAI

    `OpenAI API` adds new realtime voice models for reasoning, translation, and transcription

    70radar

    Voice input now spans reasoning, translation, and transcription in one realtime API flow. This is immediately useful for shipping tighter voice UX, though pricing and latency still decide adoption.

    • The new voice models handle reasoning, translation, and transcription on spoken input, reducing multi-step pipeline glue.
    • Realtime support points to more natural turn-taking and lower-friction voice interfaces for assistants, support, and language tools.
    • The practical win is API-level consolidation: fewer separate speech components to orchestrate inside OpenAI API voice stacks.
    Source: openai.com/index/advancing-voice-intelligence-with-new-mRead original →