telexed ~ cat / agent_tool · ★4 and up · hourly · UTC+09 · LIVE

Agents & tools

50 items
Today · 11 dispatches
  • #0050 · Agents & tools · GeekNews

    GitHub confirms 3,800 repositories compromised via malicious `VS Code` extension

    60 radar

    A single developer workstation became the entry point. VS Code extension trust is now part of supply-chain security, so extension audits are worth doing now.

    • About 3,800 internal repositories were affected after one employee installed a trojanized VS Code extension.
    • GitHub’s current assessment limits exposure to internal repositories, but compromised developer endpoints can still leak secrets and code context.
    • The extension was removed from VS Code Marketplace, infected endpoints were isolated, and incident response started immediately.
    • Practical takeaway: review installed IDE extensions, publisher names, permissions, and disable unused tools before they become build-chain risk.
    Source: news.hada.io/topic?id=29731
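
The extension-review takeaway above can be sketched as a small check over the text printed by `code --list-extensions --show-versions`. This is a minimal sketch, not GitHub's remediation: the function name, the trusted-publisher set, and the sample listing (including its version numbers) are made up for illustration.

```python
def flag_unreviewed_extensions(listing: str, trusted_publishers: set[str]) -> list[str]:
    """Return extension IDs whose publisher is not in the trusted set.

    `listing` has one `publisher.name` entry per line, optionally
    followed by `@version`.
    """
    flagged = []
    for line in listing.splitlines():
        entry = line.strip()
        if not entry:
            continue
        ext_id = entry.split("@", 1)[0]             # drop the version suffix
        publisher = ext_id.split(".", 1)[0].lower()
        if publisher not in trusted_publishers:
            flagged.append(ext_id)
    return flagged


# Hypothetical sample listing; not taken from the incident.
sample = """ms-python.python@2024.1.0
rust-lang.rust-analyzer
unknown-dev.free-themes@0.0.3
"""
flagged = flag_unreviewed_extensions(sample, {"ms-python", "rust-lang"})
print(flagged)
```

A publisher allowlist only narrows what to review by hand; a trusted publisher name does not prove a given extension version is safe.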
  • `Claude Code` `v2.1.146` tightens code review and background sessions

    50 radar
    Claude Code · Terminal coding agent: automates code edits with Claude

    Small release, but it removes several annoying agent-run failures. Windows, MCP pagination, and multi-agent env handling all get more reliable.

    • /simplify is now /code-review with optional effort levels like high, making review intent clearer in repeatable workflows.
    • MCP resources/list, resources/templates/list, and prompts/list no longer drop results after page 1. Tooling backed by large MCP servers becomes safer.
    • Windows fixes cover pwsh launch failures, terminal strobing, and NTFS junction cleanup; GNOME paste behavior, a Linux desktop issue, is fixed too. Cross-platform CLI friction drops.
    • CLAUDE_CODE_SUBAGENT_MODEL now reaches child processes in multi-agent sessions. Model routing gets less brittle for delegated coding runs.
    • Auto-updater retries transient network failures, and large diff rendering is faster. Not flashy, but daily-use reliability improved.
    Source: github.com/anthropics/claude-code/releases/tag/v2.1.146
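
The MCP pagination fix in the item above concerns cursor-based listing: each response may carry a `nextCursor`, and the client has to keep requesting until it is absent. A minimal sketch of that loop follows, assuming a caller-supplied `fetch_page` function; the fake pages are invented, and this is not Claude Code's implementation.

```python
from typing import Callable, Optional


def list_all(fetch_page: Callable[[Optional[str]], dict], key: str) -> list:
    """Collect items from every page, following `nextCursor` until it is absent."""
    items: list = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        items.extend(page.get(key, []))
        cursor = page.get("nextCursor")
        if not cursor:          # stopping after page 1 is the bug class described above
            return items


# Fake three-page server response, for illustration only.
_PAGES = {
    None: {"resources": ["a", "b"], "nextCursor": "p2"},
    "p2": {"resources": ["c"], "nextCursor": "p3"},
    "p3": {"resources": ["d"]},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return _PAGES[cursor]


all_resources = list_all(fake_fetch, "resources")
print(all_resources)
```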
  • #0048 · Agents & tools · GeekNews

    Google Cloud revamps agent development with `Antigravity 2.0`

    70 radar
    Antigravity · Agent dev tool: links local prototypes to cloud execution

    Google is packaging local prototyping and cloud deployment into one agent stack. If Managed Agents API removes hosting glue, it is worth tracking now.

    • Antigravity 2.0 and Managed Agents API are framed as an integrated dev kit, not separate demos.
    • The flow targets local prototyping first, then managed cloud execution. Less custom orchestration if the API is usable.
    • The available text is short, so pricing, lock-in, and runtime limits remain unknown. Treat it as watchlist, not migration trigger.
    Source: news.hada.io/topic?id=29718
  • #0047 · Agents & tools · r/ClaudeAI

    Rules for running phone-first vibe coding with `Claude Code`

    50 radar
    Claude Code · AI coding agent: automates terminal-based code edits

    The useful part is the operating system: plan review, scoped chunks, commits, tests, and backups. Treat agents like junior implementers with guardrails, not magic.

    • Plan mode is the control point. Bad decisions compound, so unclear sections should be challenged before code changes begin.
    • If a plan cannot fit in your head, shrink the job. Smaller chunks reduce review burden and make rollback cleaner.
    • After each completed plan, commit with git. It creates a code rollback point, but does not cover database state.
    • Test cases should be readable in the plan: positives, negatives, missing inputs, and regressions before trusting generated code.
    • For complex changes, use subagents for plan critique, security review, and testing audit; DB work needs backups first.
    Source: www.reddit.com/r/ClaudeAI/comments/1tj2i90/im_a_software
  • `opencode` `v1.15.6` adds TUI diff review and shell mode

    70 radar
    opencode · Open-source coding agent CLI: TUI-first workflow

    Change review now happens inside the TUI, and run prompts can drop into shell mode. This is a practical upgrade for terminal-first agent workflows.

    • The TUI adds a diff viewer with auto-focus on the first file and collapsed single-child directories, cutting review friction before accepting edits.
    • Run now gets shell mode and replaces subagent tabs with an on-demand picker, making agent sessions less crowded during longer tasks.
    • Plugin failures are better isolated: file load errors and missing tool args no longer break the rest of plugin loading.
    • v2 HTTP API now exposes structured public error schemas and preserves endpoint error responses in the OpenAPI spec.
    Source: github.com/anomalyco/opencode/releases/tag/v1.15.6
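
The plugin-isolation point above comes down to catching failures per plugin instead of letting one exception abort the whole load. The sketch below shows that pattern in Python with hypothetical loader functions; opencode's own code is not shown in the source, so this is not its implementation.

```python
from typing import Callable


def load_plugins(loaders: dict[str, Callable[[], object]]) -> tuple[dict, dict]:
    """Load each plugin independently; record failures instead of aborting."""
    loaded: dict[str, object] = {}
    failed: dict[str, str] = {}
    for name, loader in loaders.items():
        try:
            loaded[name] = loader()
        except Exception as exc:   # one bad plugin must not stop the rest
            failed[name] = f"{type(exc).__name__}: {exc}"
    return loaded, failed


def broken_loader() -> object:
    raise FileNotFoundError("plugin.js missing")


# Hypothetical plugin names, for illustration only.
loaded, failed = load_plugins({
    "fmt": lambda: "fmt-plugin",
    "bad": broken_loader,
    "lint": lambda: "lint-plugin",
})
print(loaded, failed)
```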
  • `Antigravity 2.0` upgrade breaks IDE workflow for existing users

    60 radar
    Antigravity · Coding agent IDE: Google’s agent-first dev tool

    A forced product split turned one workflow into IDE, agent-only app, and CLI. Co-install bugs, session hijacking, blank marketplaces, and faster credit burn make this upgrade risky.

    • The old app was split into Antigravity IDE, agent-first Antigravity, and Antigravity CLI; users were pushed into the agent-only path instead of the matching IDE upgrade.
    • Antigravity 2.0 and Antigravity IDE 2.0 reportedly cannot coexist, creating a packaging failure that blocks normal migration.
    • Session hijacking prevents the IDE from opening after install, while Gemini support gives 1.x-era fixes and can burn credits before solving anything.
    • Marketplace loading can fail after install because requests trigger rate limits, leaving extensions blank or throwing unknown errors.
    Source: discuss.ai.google.dev/t/my-antigravity-is-broken-the-2-0
  • #0044 · Agents & tools · GeekNews

    `Gemini CLI` will stop working on June 18, 2026

    80 radar
    Gemini CLI · Terminal AI CLI: runs Gemini from the command line

    Google is folding terminal AI workflows into Antigravity CLI. A popular CLI with 100k+ GitHub stars now has a hard migration deadline, so scripts and habits need cleanup soon.

    • Gemini CLI grew to millions of users, 100k+ GitHub stars, and 6,000+ merged PRs; this is not a small side-tool shutdown.
    • Google is consolidating capability into Antigravity CLI, pointing its agent tooling toward multi-agent workflows rather than a standalone Gemini terminal client.
    • The deadline is June 18, 2026. Any local aliases, CI helpers, docs, or onboarding snippets using Gemini CLI should be replaced before then.
    Source: news.hada.io/topic?id=29711
  • #0043 · Agents & tools · GeekNews

    Better generated branch names with `jj`

    40 radar
    jj · Git-compatible VCS: change-based workflow with anonymous branches

    Default push branch names are change-ID centric and awkward in CLI flows. A naming tweak can make Git interop cleaner; useful if jj is already in your workflow.

    • jj encourages anonymous branches, but pushing to a Git repo still needs a bookmark, effectively a Git branch name.
    • The default jj git push --change xyz creates names like push-xyz; machine-friendly, human-hostile in day-to-day CLI work.
    • Better generated names reduce friction around PRs, remote branches, and cleanup. Low impact unless your Git workflow already runs through jj.
    Source: news.hada.io/topic?id=29710
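
One way to get friendlier names than `push-xyz` is to derive the name from the change description and keep a short change-ID suffix for uniqueness. The helper below is a hypothetical sketch of that idea; it is not a jj feature or the linked article's method, and the resulting name would still have to be set as a bookmark before pushing.

```python
import re


def bookmark_name(description: str, change_id: str, max_len: int = 40) -> str:
    """Build a readable bookmark name from a description plus a short change ID."""
    text = description.strip()
    first_line = text.splitlines()[0] if text else ""
    slug = re.sub(r"[^a-z0-9]+", "-", first_line.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    short_id = change_id[:8]
    # Fall back to the push-<id> shape when there is no description to use.
    return f"{slug}-{short_id}" if slug else f"push-{short_id}"


named = bookmark_name("Fix login redirect loop", "xyzkwqrsmnop")
unnamed = bookmark_name("", "xyzkwqrsmnop")
print(named, unnamed)
```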
  • `Google Antigravity` shifts toward weekly quotas, hurting long coding sessions

    60 radar
    Google Antigravity · AI coding agent: developer tool powered by Gemini

    Short-reset Flash access is gone, pushing heavy use into 7-day quotas. The tool now fits burst work better than daily agentic coding.

    • Gemini 3.0 Flash with roughly 5-hour resets was the practical daily driver; its removal breaks predictable iteration loops.
    • Paid Ultra users are also hitting tighter limits, so this is not just a free-tier downgrade.
    • Weekly cooldowns turn debugging, refactoring, review, and test loops into prompt budgeting. Keep a fallback agent ready.
    Source: discuss.ai.google.dev/t/google-antigravity-has-come-to-a
  • `Gemini Spark`, Google’s hosted agent tied to Workspace apps

    60 radar
    Gemini Spark · Hosted AI agent: native Google app connections

    Google is packaging app-connected agents around Workspace, but much is still coming soon. Security and credential handling decide whether it becomes useful or risky.

    • Gemini Spark connects natively to Gmail, Calendar, Drive, Docs, Sheets, Slides, YouTube, and Maps. That makes it closer to a work agent than a chat UI.
    • FAQ says it runs on Gemini 3.5 Flash and Antigravity. The Antigravity stack spans a desktop app, CLI, SDK, and VS Code fork.
    • Enterprise notes mention fresh isolated ephemeral VMs, Agent Gateway DLP, and encrypted credentials. That is the right threat model, not proof it is solved.
    • Since the product is not broadly testable yet, treat it as a roadmap signal. Do not build critical automations around it until GA behavior is clear.
    Source: simonwillison.net/2026/May/20/google-io/#atom-everything
  • `GitHub Copilot` in VS Code gets task-based auto model routing

    70 radar

    Model choice moves from manual picking to routing by task, utilization, and health. Useful for lower-friction coding, but less control over exact model behavior.

    • GitHub Copilot now selects a model using task fit, utilization, and model health metrics, aiming for reliable and token-efficient runs.
    • The change matters most in VS Code, where model switching interrupts small coding loops; default auto mode should reduce that friction.
    • Tradeoff: less explicit model control. Keep manual model selection for debugging, refactors, or prompts where output variance matters.
    Source: github.blog/changelog/2026-05-20-auto-model-selection-no
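
The routing idea above (task fit, utilization, model health, with a manual override kept for sensitive work) can be illustrated with a toy scorer. Everything here is hypothetical: the scoring formula, field names, and model names are assumptions for illustration and say nothing about how Copilot actually weighs these signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelStats:
    name: str
    task_fit: float      # 0..1, how well the model suits the task (assumed metric)
    utilization: float   # 0..1, where 1 means fully saturated (assumed metric)
    healthy: bool


def pick_model(candidates: list[ModelStats], pinned: Optional[str] = None) -> str:
    """Return the pinned model if set, else the best-scoring healthy model."""
    if pinned is not None:   # manual selection wins, as the tradeoff bullet suggests
        return pinned
    healthy = [m for m in candidates if m.healthy]
    if not healthy:
        raise RuntimeError("no healthy model available")
    return max(healthy, key=lambda m: m.task_fit * (1.0 - m.utilization)).name


models = [
    ModelStats("model-a", task_fit=0.9, utilization=0.8, healthy=True),
    ModelStats("model-b", task_fit=0.7, utilization=0.2, healthy=True),
    ModelStats("model-c", task_fit=0.95, utilization=0.1, healthy=False),
]
auto_choice = pick_model(models)
manual_choice = pick_model(models, pinned="model-a")
print(auto_choice, manual_choice)
```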
Yesterday · 20 dispatches
  • `GitHub Copilot Chat` Adds Semantic Issue Search

    80 radar

    Natural-language issue triage now works inside web chat. It reduces manual filtering across noisy repos and is worth trying for backlog cleanup.

    • Queries can find, group, and analyze issues with a semantic issue index, so exact-label hygiene matters less.
    • The feature is available in GitHub Copilot Chat on the web, keeping triage inside the existing GitHub workflow.
    • Best fit: duplicate detection, bug clustering, and release-scope checks before planning a small sprint.
    Source: github.blog/changelog/2026-05-20-semantic-issue-search-i
  • #0038 · Agents & tools · GeekNews

    `Codex Relay` Adds Mobile Terminal, Browser, Git, File Viewer, and Markdown for Codex

    50 radar
    Codex Relay · Mobile Codex companion: adds terminal, Git, and file viewer

    A free OSS companion fills gaps around mobile Codex use. The overlap with official remote access caps urgency, but the extra tools make it worth a quick trial.

    • Includes Terminal, Browser, Git, File Viewer, and Markdown in one mobile-focused Codex companion.
    • Official Codex Remote already covers the core use case, so this is a convenience layer rather than a must-migrate tool.
    • Open-source and free lowers trial cost; check auth, repo access, and maintenance before using it on private work.
    Source: news.hada.io/topic?id=29706
  • `Antigravity` users push back on hidden compute quotas

    60 radar
    Antigravity · AI coding IDE: agent workflow powered by Google models

    Paid usage turned unpredictable after quota accounting moved from requests to hidden compute. A cheaper plan can now fail mid-workflow; budget risk matters more than model choice.

    • A $20 Pro user reports 2-3 weeks of HTTP 429 lockouts despite visible quota remaining; reliability became the real blocker.
    • The May 19 change replaced request limits with hidden compute-used accounting, making agent background scans and micro-queries harder to budget.
    • Gemini 3.5 Flash is described as more verbose and weaker at coding, burning quota through long explanations instead of useful edits.
    • Upgrade pressure jumps to $100-$200 Ultra or a reported 5-day ban after quota exhaustion; keep fallback IDE and agent paths ready.
    Source: discuss.ai.google.dev/t/how-antigravity-became-cursor-2-
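
"Keep fallback IDE and agent paths ready" can be made concrete with a wrapper that retries briefly on HTTP 429 and then hands the prompt to a second agent. This is a minimal sketch under assumptions: `RateLimited` and both agent callables are hypothetical stand-ins, not any vendor's client API.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Raised by a (hypothetical) client wrapper when the service answers HTTP 429."""


def run_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    prompt: str,
    retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the primary agent with exponential backoff, then use the fallback."""
    for attempt in range(retries + 1):
        try:
            return primary(prompt)
        except RateLimited:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)
    return fallback(prompt)


def always_limited(prompt: str) -> str:
    raise RateLimited()


delays: list[float] = []   # record delays instead of really sleeping in the demo
answer = run_with_fallback(
    always_limited, lambda p: f"fallback:{p}", "fix the test", sleep=delays.append
)
print(answer, delays)
```

Short retries suit transient 429s; for the multi-week lockouts reported above, the fallback path is what keeps work moving.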
  • `Antigravity IDE` 2.0.1 macOS report: fatal DI crash disables agents and marketplace

    60 radar
    Antigravity IDE · AI coding IDE: built-in Google agent workflows

    A clean install can still hit `aae depends on UNKNOWN service agentSessions`. Treat the 2.0 update as risky on macOS until a fix lands.

    • Environment is specific: Antigravity IDE 2.0.1, VSCode OSS 1.107.0, macOS Darwin arm64 25.5.0 on Apple M4 Max.
    • Core failure is dependency injection: aae cannot resolve agentSessions, so the AI Agent Manager never starts.
    • Marketplace also breaks with open-vsx.org extension manifest fetches returning 429, leaving plugins unavailable.
    • Cache wipes across ~/Library/Application Support/Antigravity IDE, ~/.antigravity, and cache folders did not recover it.
    Source: discuss.ai.google.dev/t/bug-fatal-di-crash-on-clean-inst
  • `Antigravity 2.0` Windows update can hijack the IDE launcher

    60 radar
    Antigravity · AI coding IDE: Google-backed agentic dev tool

    Default install places the new app.asar in the old IDE folder. Rename it to recover the IDE, then copy config folders back; useful if your setup vanished after updating.

    • Electron loads resources by directory, so resources\app.asar from 2.0 takes over the original IDE executable when both land in the same install path.
    • Rollback is simple: rename app.asar to app.asar.bak under %LOCALAPPDATA%\Programs\Antigravity\resources; restore the name to switch back to 2.0.
    • Settings split because product names differ: Antigravity keeps old config, while restored Antigravity IDE creates empty Roaming and extension folders.
    • Recover by copying Roaming\Antigravity into Roaming\Antigravity IDE, and .antigravity into .antigravity-ide; use mklink /J if extension paths exceed Windows limits.
    Source: discuss.ai.google.dev/t/fix-for-antigravity-2-0-hijackin
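
The rename step described above can be scripted so switching between the original IDE and 2.0 is one call. The sketch below assumes the file names from the item (`app.asar`, `app.asar.bak`); the function name is made up, and the demo runs against a temporary directory rather than `%LOCALAPPDATA%\Programs\Antigravity\resources`.

```python
import tempfile
from pathlib import Path


def toggle_app_asar(resources_dir: Path) -> str:
    """Rename app.asar <-> app.asar.bak and report which app will launch."""
    asar = resources_dir / "app.asar"
    backup = resources_dir / "app.asar.bak"
    if asar.exists() and backup.exists():
        raise FileExistsError("both app.asar and app.asar.bak exist; resolve by hand")
    if asar.exists():
        asar.rename(backup)    # 2.0 bundle hidden, original IDE launches
        return "ide"
    if backup.exists():
        backup.rename(asar)    # 2.0 bundle restored
        return "2.0"
    raise FileNotFoundError(f"no app.asar or app.asar.bak in {resources_dir}")


# Demo against a throwaway directory instead of a real install.
with tempfile.TemporaryDirectory() as tmp:
    resources = Path(tmp)
    (resources / "app.asar").write_bytes(b"")
    first = toggle_app_asar(resources)
    second = toggle_app_asar(resources)
print(first, second)
```

Close the application before renaming on a real install, and copy the config folders separately as the last bullet describes.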
  • `Antigravity` backlash: editor removed, agent command center takes over

    60 radar
    Antigravity · AI coding IDE: Google’s agent-first dev tool

    The update shifts from a full IDE into an agent hub. Losing file-first editing weakens it as a daily driver; worth watching before switching workflows.

    • The core complaint is concrete: files, editor, terminal, and change tracking no longer feel like one controlled workspace.
    • A CLI is framed as a poor replacement for a full IDE. For app debugging, terminal-first flow adds friction.
    • Cursor and Windsurf were named as competitors that benefit if Antigravity drops its integrated editor.
    • Only 5 posts from 3 participants, so this is an early friction signal, not broad market proof.
    Source: discuss.ai.google.dev/t/you-did-not-upgrade-antigravity-
  • GitHub Internal Repos Accessed After Employee Device Compromise

    40 radar

    A poisoned VS Code extension became the entry point. Treat editor extensions as supply-chain risk, not convenienceware.

    • Attack path: a compromised employee endpoint via malicious VS Code extension, followed by access to internal repositories.
    • GitHub removed the malicious extension version and isolated the endpoint. Extension version pinning and review matter for dev machines.
    • No concrete customer-impact detail is available in the provided text. Actionable takeaway stays limited to workstation hardening.
    Source: news.hada.io/topic?id=29703
  • `Cline CLI` `v3.0.9` speeds up plugin startup and config toggles

    50 radar
    Cline · Coding agent CLI: plugin-based automation support

    Plugin-heavy CLI sessions start faster. Optimistic TUI updates and cached tool descriptors reduce friction, worth updating if Cline CLI is in daily use.

    • Sandboxed plugins now load concurrently, with tool descriptors cached per plugin, provider, and model. Startup latency should drop most in plugin-heavy setups.
    • Plugin and tool config toggles update the TUI optimistically and persist without full config reloads. Less waiting while switching tools on and off.
    • The @ mention file picker restores fuzzy ranking, so relevant files surface first again. Small fix, but it cuts prompt setup friction.
    • Cancelled tasks no longer tear down the interactive session, and abort cleanup failures no longer crash the runtime host.
    Source: github.com/cline/cline/releases/tag/cli-v3.0.9
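
The startup change above combines two ideas: load plugins concurrently, and cache tool descriptors under a (plugin, provider, model) key. A minimal asyncio sketch of that combination follows; the function names, cache layout, and fake fetcher are assumptions for illustration, not Cline's code.

```python
import asyncio
from typing import Awaitable, Callable

# Cache keyed the way the release note describes: plugin, provider, model.
_descriptor_cache: dict[tuple[str, str, str], list[str]] = {}


async def load_descriptors(
    plugin: str, provider: str, model: str,
    fetch: Callable[[str], Awaitable[list[str]]],
) -> list[str]:
    key = (plugin, provider, model)
    if key not in _descriptor_cache:
        _descriptor_cache[key] = await fetch(plugin)
    return _descriptor_cache[key]


async def load_all(plugins, provider, model, fetch) -> dict[str, list[str]]:
    """Load every plugin's descriptors concurrently rather than one by one."""
    results = await asyncio.gather(
        *(load_descriptors(p, provider, model, fetch) for p in plugins)
    )
    return dict(zip(plugins, results))


calls: list[str] = []


async def fake_fetch(plugin: str) -> list[str]:
    calls.append(plugin)
    await asyncio.sleep(0.01)     # stands in for slow sandbox startup
    return [f"{plugin}.run"]


first_load = asyncio.run(load_all(["git", "fs"], "provider-a", "model-a", fake_fetch))
second_load = asyncio.run(load_all(["git", "fs"], "provider-a", "model-a", fake_fetch))
print(first_load, len(calls))
```

The second load is served from the cache, so the fetcher runs only once per plugin.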
  • Claude Code workflow bottleneck: automate `Connect`, `Encode`, `Teach`, `Parallelize`

    50 radar

    The bottleneck shifts from typing code to spotting repeated friction. A weekly friction log can turn small annoyances into scripts, skills, MCP connectors, or parallel agent runs.

    • Connect covers copy-paste between tools; the fix is giving the agent source access through an MCP server or CLI.
    • Encode targets repeated step sequences. Turn recurring deploy, debug, or cleanup flows into scripts or reusable skills.
    • Teach means repeated context is leaking into prompts. Move durable instructions into CLAUDE.md or a skill.
    • Parallelize is the strongest claim: watching one agent run wastes attention, so multiple sessions beat one supervised session.
    Source: www.reddit.com/r/ClaudeAI/comments/1ti8cwr/after_a_year_
  • `Cursor Composer 2.5` becomes Cursor's most-selected model, with **10x** usage bonus

    80 radar
    Cursor · AI coding IDE: agentic code writing and edits

    The in-house coding model is overtaking third-party defaults inside the IDE. Test it today while the usage cap is temporarily loose.

    • CEO Michael Truell said Composer 2.5 became the most-selected model in Cursor; adoption moved fast right after launch.
    • All users get 10x usage for one day, making it a low-cost window for real project testing instead of toy prompts.
    • This is pressure on Claude and OpenAI workflows: IDE-native models win when latency, quota, and UX beat raw benchmark trust.
    Source: news.hada.io/topic?id=29691
  • `Mirage`, a unified virtual filesystem for AI agents

    70 radar
    Mirage · Virtual filesystem for AI agents: mounts SaaS as one tree

    Different SaaS backends become one filesystem tree. Agents can use Unix tools instead of learning every SDK or MCP, so cross-service automation gets simpler.

    • Mounts S3, Google Drive, Slack, Gmail, and Redis into one filesystem tree — fewer integration surfaces for agents.
    • Agents can work through Unix-style bash tools, reducing the need to teach each agent service-specific SDKs or MCP interfaces.
    • Cross-service pipelines become file operations. Moving data between storage, chat, mail, and cache can be scripted more cheaply.
    Source: news.hada.io/topic?id=29681
  • `Cursor Automations` Now Works Inside the Agents Window

    80 radar
    Cursor · AI code editor: built-in agentic development workflows

    Scheduled agent work is moving closer to the main coding surface. Multi-repo and repo-less runs make Cursor more useful for maintenance, audits, and non-code ops.

    • Automations now appears in the Agents Window, reducing the gap between scheduled jobs and active agent work.
    • A single automation can attach multiple repos, useful for cross-repo refactors, dependency checks, and shared package updates.
    • Repo-less automations expand the use case beyond codebases: reminders, issue triage, release notes, or research queues can run without a project attached.
    Source: cursor.com/changelog/05-20-26
  • `Zed` Adds Terminal Threads for Coding Agents

    70 radar
    Zed · Code editor: fast collaborative AI workflow focus

    Terminal agents can now live as sidebar threads instead of loose shell sessions. Useful if you run Claude Code or Amp beside code all day.

    • Claude Code, Amp, and other terminal agents can run as threads in Zed's sidebar; agent work becomes easier to revisit and separate.
    • The feature turns terminal-agent sessions into IDE-native context, reducing tab/window juggling during multi-step refactors or bug hunts.
    • This is a workflow upgrade, not a new model capability. Worth trying if Zed is already your editor or agent cockpit.
    Source: zed.dev/blog/terminal-threads
  • `GitHub Copilot` Code Review Adds `Fix with Copilot` Dialog

    60 radar

    Review suggestions now open with more control before applying changes. It reduces PR cleanup friction, but the impact stays tactical unless the agent handles multi-file edits well.

    • Implement suggestion is now Fix with Copilot; the naming pushes code review fixes into the cloud-agent workflow.
    • A new UI dialog adds control over how suggestions are applied, useful when review comments are too risky for one-click patches.
    • Best fit is small PR cleanup: lint fixes, narrow refactors, and reviewer nits. Architectural feedback still needs manual judgment.
    Source: github.blog/changelog/2026-05-19-easily-apply-copilot-co
  • `Claude Code` `v2.1.145` adds JSON session listing and richer agent telemetry

    70 radar
    Claude Code · Coding agent CLI: automates code work with Claude in terminal

    Live sessions are now scriptable via claude agents --json. Better OTEL parenting, PR-aware status lines, and safer Bash approval make multi-agent workflows easier to monitor.

    • claude agents --json exposes live sessions for tmux restore, status bars, and custom session pickers — useful for long-running local agent setups.
    • agent_id and parent_agent_id landed in OTEL spans, with fixed trace parenting so background subagents nest under the dispatching Agent span.
    • Status-line JSON now includes detected GitHub repo and PR data, tightening the loop between CLI work and pull-request state.
    • A Bash permission bypass for bare non-allowlisted env assignments was fixed — upgrade promptly if command approval boundaries matter.
    • Plugin discovery now shows commands, agents, skills, hooks, and MCP/LSP servers before install, reducing blind marketplace installs.
    Source: github.com/anthropics/claude-code/releases/tag/v2.1.145
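
With `agent_id` and `parent_agent_id` on spans, a monitoring script can rebuild which subagent was dispatched by which. The sketch below assumes spans have already been exported as plain dicts with one record per agent; only the two attribute names come from the release note, and the sample data is invented.

```python
from collections import defaultdict


def nest_agents(spans: list[dict]) -> dict:
    """Build a nested {agent_id: {child_id: {...}}} tree from flat span records."""
    children: dict = defaultdict(list)
    for span in spans:
        # Root agents have no parent_agent_id, so they group under None.
        children[span.get("parent_agent_id")].append(span["agent_id"])

    def build(agent_id):
        return {child: build(child) for child in children.get(agent_id, [])}

    return build(None)


# Invented sample records, for illustration only.
spans = [
    {"agent_id": "main"},
    {"agent_id": "review", "parent_agent_id": "main"},
    {"agent_id": "tests", "parent_agent_id": "main"},
    {"agent_id": "lint", "parent_agent_id": "review"},
]
tree = nest_agents(spans)
print(tree)
```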
  • `Gemini Code Assist` sunset points to `Antigravity CLI` migration

    60 radar
    Antigravity CLI · Coding agent CLI: replacement path for Code Assist

    The June 18 cutoff turns PR review automation into a migration task. Broken docs raise execution risk, so audit hooks and CI usage before relying on it.

    • Docs put the cutoff at June 18. Any Gemini Code Assist PR-review flow needs a replacement path before then.
    • Antigravity CLI is positioned as the follow-up tool, shifting review automation from hosted assist to CLI-driven workflow.
    • Broken documentation links are a real adoption risk. Budget time for setup friction instead of treating this as a drop-in swap.
    Source: discuss.ai.google.dev/t/gemini-code-assist-replaced-with
  • `llm-gemini` `0.32a0` adds reasoning-token streaming

    40 radar
    llm-gemini · LLM CLI plugin: runs Gemini models from `llm`

    Gemini reasoning output can now stream through the llm CLI alpha path. Small alpha release, but useful when inspecting long reasoning traces live.

    • Requires llm>=0.32a0, so this is tied to the alpha CLI line rather than the stable release.
    • The new behavior streams reasoning tokens, reducing the blind wait while Gemini works through longer prompts.
    • Scope is narrow: no pricing, model, or workflow change. Worth testing only if llm is already in your CLI stack.
    Source: simonwillison.net/2026/May/19/llm-gemini/#atom-everythin
  • `Gemini 3.5 Flash` is now GA in `GitHub Copilot`

    80 radar

    A faster, cheaper coding model option is landing in the IDE. Near-Pro quality at Flash-tier cost makes it worth testing for routine implementation loops.

    • GitHub Copilot is rolling out Google’s latest Flash-tier model as a generally available option, not a preview-only experiment.
    • Early testing claims near-Pro coding quality with Flash-tier speed and cost, useful for high-volume autocomplete and edit cycles.
    • The practical play is model routing: keep premium models for hard design calls, use Gemini 3.5 Flash for repetitive coding throughput.
    Source: github.blog/changelog/2026-05-19-gemini-3-5-flash-is-gen
  • `Cline CLI` `v3.0.8` fixes plugin diagnostics, Bedrock setup, and token counts

    50 radar
    Cline · Open-source coding agent: runs agent workflows in IDE and CLI

    This is a maintenance release, but the fixes hit real workflow costs. Cleaner broken-plugin diagnostics and accurate token accounting make local agent setups easier to trust.

    • Failed plugins now stay visible in the config UI with load/setup phase and error details, so broken definitions are faster to debug.
    • AgentRuntime.execute() now resets usage between calls, fixing inflated token counts from local runtime double-counting.
    • AWS Bedrock onboarding now detects region/profile correctly and exposes bearer-token plus extra Bedrock config fields.
    • Create Session Fork moved from Opt+F to Opt+R, restoring terminal word-right navigation.
    Source: github.com/cline/cline/releases/tag/cli-v3.0.8
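
The token-count fix above is a classic accumulator bug: if usage is not reset between calls, the second call reports its own tokens plus the first call's. The toy class below illustrates the corrected behavior; it is a hypothetical stand-in and not Cline's `AgentRuntime`.

```python
class ToyRuntime:
    """Hypothetical runtime that reports token usage per execute() call."""

    def __init__(self) -> None:
        self.usage = 0

    def execute(self, step_costs: list[int]) -> int:
        self.usage = 0   # the reset; without it, counts carry over between calls
        for cost in step_costs:
            self.usage += cost
        return self.usage


runtime = ToyRuntime()
first_call = runtime.execute([100, 50])
second_call = runtime.execute([30])
print(first_call, second_call)
```

Without the reset line, the second call would report 180 instead of 30.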
  • `Cline` `v3.84.0` adds SAP AI Core hosted model support

    40 radar
    Cline · VS Code coding agent: MCP and multi-model support

    More hosted model options landed, but this is a narrow integration release. Useful only if your workflow already touches SAP AI Core.

    • SAP AI Core support expands hosted model choices inside Cline; direct value is limited outside SAP-backed environments.
    • The MCP Restart Server button is disabled when a server is toggled off, reducing accidental server actions in agent setups.
    • The startup flow drops the Cline Kanban launch modal and bundled demo media, making the VS Code extension open cleaner.
    Source: github.com/cline/cline/releases/tag/v3.84.0
Tue, May 19 · 19 dispatches
  • #0019 · Agents & tools · GeekNews

    `Goal Setter`, an agent skill for writing safer `Codex` goals

    50 radar
    Goal Setter · Codex agent skill: interviews users to define done states

    Long-running agent work now needs sharper stop conditions. This skill turns vague requests into explicit done states before Goal burns time and tokens.

    • Goal Setter interviews the user before creating a goal, reducing drift in long Codex runs.
    • The core check is what exact state counts as done. Without that, Goal can waste tokens fast.
    • Useful for large refactors, test work, or migration tasks where the agent needs persistence but clear boundaries.
    Source: news.hada.io/topic?id=29661
  • Agent Shell Access Hit the `rm -rf /` Failure Mode

    40 radar

    An agent tried rm -rf / while testing a shell-command block. The block worked, but sandboxing must come before shell access.

    • The command allowlist blocked the harmful command, so damage was zero, aside from operational panic.
    • bubblewrap isolation came after the allowlist; that ordering is backward for any agent with shell execution.
    • Command allowlists help, but they are a second layer. Filesystem isolation and disposable workspaces should be default.
    Source: www.reddit.com/r/LocalLLaMA/comments/1thosnt/got_my_firs
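
A command allowlist of the kind described above can be sketched in a few lines. The allowed set and the metacharacter rule are illustrative assumptions, not the poster's code; per the item, this is a second layer that belongs behind filesystem isolation, not a replacement for it.

```python
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}   # illustrative set
SHELL_METACHARACTERS = ";|&`$<>"


def is_allowed(command: str) -> bool:
    """Allow only simple invocations of known commands; reject chaining."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:        # unbalanced quotes and similar
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


checks = {cmd: is_allowed(cmd) for cmd in ["git status", "rm -rf /", "ls; rm -rf /", ""]}
print(checks)
```

Checking only the first token still trusts every subcommand and flag of an allowed tool, which is one reason a disposable sandboxed workspace should come first.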
  • `Nuxt MCP Toolkit` adds support for MCP apps

    70 radar
    Nuxt MCP Toolkit · MCP toolkit for Nuxt: Vue SFC-based tool UIs

    Agent tools can now return inline interactive HTML, not just text. Useful for richer tool flows inside Claude or ChatGPT.

    • Tools declared with defineMcpApp can render interactive HTML responses in MCP clients such as Claude and ChatGPT.
    • useMcpApp lets the UI read pre-hydrated data, trigger follow-up prompts, or call other tools from inside the response.
    • Vue SFCs are bundled into self-contained HTML at build time and served from the MCP endpoint, reducing custom UI plumbing.
    Source: vercel.com/changelog/nuxt-mcp-toolkit-mcp-apps
  • `Claude Managed Agents` Run on `Cloudflare`

    80 radar
    Claude Managed Agents · Coding agent platform: isolated execution and custom tools

    Agent workflows get isolated, globally distributed execution without opening private backends too widely. Useful once coding agents move from local experiments to repeatable delivery pipelines.

    • Cloudflare provides a fast, isolated execution environment for autonomous code delivery, reducing the need to run agent workers on your own servers.
    • Access control over private backends is the practical hook. Agents can operate near production systems without broad credentials floating around.
    • Custom tools and runtimes are supported, so Claude Managed Agents can fit repo-specific deploy, test, and data workflows.
    Source: blog.cloudflare.com/claude-managed-agents/
  • `Forge` Pushes Local 8B Agent Reliability Near Frontier APIs

    70 radar
    Forge · LLM guardrail runtime: improves local tool-call reliability

    Guardrails, not bigger weights, drive the jump. The useful takeaway is architectural: retries, recovery, and serving backend choice can matter more than model size.

    • Ministral 8B with Forge hit 99.3%, versus Claude Sonnet with guardrails at 100% across the reported eval setup.
    • Without retry nudges, scores dropped 24-49 points. Reliability work belongs in the agent runtime, not only in model selection.
    • Serving backend changed the same Mistral-Nemo 12B weights from 7% on llama-server native function calling to 83% on Llamafile prompt mode.
    • Error recovery scored 0% for every tested model without retry logic. Tool agents need explicit recovery paths before production use.
    Source: github.com/antoinezambelli/forge
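
A "retry nudge" in the sense used above means validating the model's tool call and, on failure, feeding the error back and asking again. The loop below is a generic sketch of that pattern with a scripted fake model; it does not use Forge's API, and the function names and message wording are assumptions.

```python
import json
from typing import Callable


def call_with_nudges(
    model: Callable[[list[str]], str],
    prompt: str,
    required_keys: list[str],
    max_retries: int = 2,
) -> dict:
    """Ask for a JSON tool call; on invalid output, nudge the model and retry."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        reply = model(messages)
        try:
            call = json.loads(reply)
            if not isinstance(call, dict):
                raise ValueError("tool call must be a JSON object")
            missing = [k for k in required_keys if k not in call]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return call
        except ValueError as exc:   # json.JSONDecodeError is a ValueError subclass
            messages.append(f"Your last reply was invalid ({exc}). Reply with JSON only.")
    raise RuntimeError("model never produced a valid tool call")


# Scripted fake model: fails twice, then succeeds.
_replies = iter([
    "sure, calling the tool",
    '{"tool": "search"}',
    '{"tool": "search", "args": {"q": "x"}}',
])


def fake_model(messages: list[str]) -> str:
    return next(_replies)


tool_call = call_with_nudges(fake_model, "Search for x", ["tool", "args"])
print(tool_call)
```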
  • `Forge` pushes local LLM tool-calling reliability with guardrail retries

    70 radar
    Forge · LLM tool-calling layer: guardrails for local models

    Guardrails, not model size, drive most of the gain. Useful if you want always-on agents without frontier API spend.

    • Ministral 8B reached 99.3% with Forge; Claude Sonnet with the same layer hit 100%.
    • Without guardrails, Claude Sonnet scored 87.2%, so orchestration beat raw model strength in this eval.
    • Removing retry nudges caused 24-49 point drops; error recovery added about 10 points across tested models.
    • Backend choice changed results hard: the same Mistral-Nemo 12B weights scored 7% on llama-server vs 83% on Llamafile.
    Source: github.com/antoinezambelli/forge
  • `Forge` raises local 8B agent task success from 53% to 99% with guardrails

    70 radar
    Forge · LLM tool-calling guardrails: retries and recovery for local models

    Reliability came from orchestration, not a bigger model. Forge makes local tool-calling viable when cloud agent costs are the bottleneck.

    • Ministral 8B with Forge hit 99.3% across multi-step workflows; Claude Sonnet with the same guardrails reached 100%.
    • Without retry handling, error recovery scored 0% across every tested local and frontier model. The missing layer is architectural.
    • Backend choice changed results sharply: the same Mistral-Nemo 12B weights scored 7% on llama-server native calling and 83% on Llamafile prompt mode.
    • Ablation points to retry nudges and error recovery as the useful parts. Rescue parsing and context compaction stayed for rare production failures.
    Source: github.com/antoinezambelli/forgeRead original →
  • `Forge` brings local 8B agent workflows near frontier reliability

    70 radar
    Forge · LLM guardrail layer: improves local tool-calling reliability

    Guardrails, not bigger weights, drive the result. Retry nudges and error recovery make local always-on agents cheaper to test now.

    • Ministral 8B reached 99.3% with Forge; Claude Sonnet with guardrails hit 100%, leaving less than 1 point between local and frontier.
    • Strip the guardrails from Claude Sonnet and the comparison flips: Ministral 8B with Forge (99.3%) beat unguarded Claude Sonnet (87.2%).
    • Ablations put most value in retry nudges and error recovery. Disabling retry nudges caused 24-49 point drops.
    • Serving backend changed outcomes sharply: Mistral-Nemo 12B scored 7% on llama-server native function calling vs 83% on Llamafile prompt mode.
    Source: github.com/antoinezambelli/forge
  • `Forge` adds reproducible guardrails for local LLM agents

    70 radar
    Forge · Local LLM reliability layer: retry and recovery guardrails

    Local tool-calling reliability is framed as a system problem, not a model-size problem. If the evals hold, always-on agents get much cheaper.

    • Ministral 8B reached 99.3% with guardrails; Claude Sonnet with the same layer hit 100%.
    • Without retry handling, error recovery scored 0% across local and frontier models. The missing piece is architecture.
    • Ablations put most lift on retry nudges and error recovery; context compaction helped less in the benchmark.
    • The serving backend alone moved Mistral-Nemo 12B from 7% to 83% accuracy, so the deployment stack is part of model quality.
    Source: github.com/antoinezambelli/forge
  • Anthropic acquires `Stainless`, the major MCP server generator

    80 radar
    Stainless · SDK generation platform: builds SDKs and MCP servers from OpenAPI

    The strongest OpenAPI-to-MCP pipeline is now closed to new users. Better standard templates are likely, but vendor concentration just became a real stack risk.

    • Stainless generated official SDKs for OpenAI, Google, Meta, Cloudflare, and Anthropic, then extended that compiler to MCP servers.
    • MCP reached about 97M monthly SDK downloads by Dec 2025 and roughly 10,000 production servers by early 2026.
    • New signups and new SDK/MCP generations stopped on Monday; existing customers keep generated code, but the pipeline is closed.
    • Cloudflare's MCP framework, Pulse MCP, and open-source generators now matter more as practical alternatives to Anthropic-owned tooling.
    Source: www.reddit.com/r/ClaudeAI/comments/1thkkrb/anthropic_jus
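    For readers weighing the open-source alternatives, the core of an OpenAPI-to-MCP mapping is small. The sketch below is not Stainless's compiler: `openapi_operation_to_mcp_tool` is a hypothetical helper that maps one OpenAPI operation to an MCP tool definition (`name`, `description`, `inputSchema`), and it ignores request bodies, auth, and response handling.

```python
import re


def openapi_operation_to_mcp_tool(path: str, method: str, operation: dict) -> dict:
    """Map one OpenAPI operation object to an MCP tool definition (sketch only)."""
    properties = {}
    required = []
    for param in operation.get("parameters", []):
        # Each OpenAPI parameter becomes one property of the tool's input schema.
        properties[param["name"]] = param.get("schema", {"type": "string"})
        if param.get("required", False):
            required.append(param["name"])

    # Prefer operationId; otherwise derive a name from the method and path.
    fallback = re.sub(r"[^A-Za-z0-9_]+", "_", f"{method} {path}").strip("_")
    return {
        "name": operation.get("operationId") or fallback,
        "description": operation.get("summary", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```

    A production generator has to do far more (naming collisions, nested schemas, pagination, auth), which is part of why losing a mature pipeline matters.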
  • 100 Practical Rules for Building a Persistent Personal AI Agent

    50 radar

    A six-week build distilled into operating rules for a real agent: constitution, identity, capability maps, and local automation. Useful as an agent design checklist, not a product update.

    • Start with a constitution, not just a system prompt. It gives the agent a basis for edge cases instead of brittle command-following.
    • Separate hard rules from behavioral guidelines. Mixing them makes the agent treat everything as either negotiable or frozen.
    • Keep a Capability Map and Component Map apart: what it can do vs. how it is wired. That keeps Claude Code setups maintainable after month three.
    • The cloud-to-local move added file access, git tracking, shell hooks, and scheduled headless tasks. Serious agents need tool surfaces, not chat only.
    Source: www.reddit.com/r/ClaudeAI/comments/1thi6nh/100_tips_tric
  • Using `Power Automate` Webhooks as an MCP Bridge for Microsoft 365

    50 radar
    Power Automate · Workflow automation SaaS: runs M365 connectors via webhooks

    Power Automate can turn existing M365 permissions into callable agent tools without Graph admin approval. Useful for personal ops automation, but webhook hygiene is the real risk.

    • Each M365 action becomes one Power Automate flow with an HTTP trigger, then a small FastMCP server exposes it as a Claude tool.
    • The setup covered 22 flows: email, calendar, OneDrive notes, Planner tasks, Excel rows, and Word templates.
    • Signed webhook URLs act like passwords. A duplicated URL already caused the wrong action to run, so config review matters more than code size.
    Source: www.reddit.com/r/ClaudeAI/comments/1thabze/i_gave_claude
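    The duplicated-URL failure is cheap to catch before the server starts. A minimal sketch, assuming the bridge keeps a mapping from tool name to signed webhook URL; `find_duplicate_webhooks` and the config shape are hypothetical, not taken from the original post.

```python
from collections import defaultdict


def find_duplicate_webhooks(flows: dict) -> list:
    """Return groups of tool names that share the same webhook URL.

    Only tool names are returned, so the signed URLs (which act like
    passwords) never end up in logs or error messages.
    """
    by_url = defaultdict(list)
    for tool_name, url in flows.items():
        by_url[url.strip()].append(tool_name)  # ignore stray whitespace from copy-paste
    return sorted(sorted(names) for names in by_url.values() if len(names) > 1)
```

    Running this at startup and refusing to boot on a non-empty result turns a silent wrong-action bug into a visible config error.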
  • #0007 · Agents & tools · GeekNews

    Anthropic Acquires `Stainless` to Expand Agent Tooling

    50 radar
    Stainless · API tooling SaaS: generates SDKs and MCP servers

    Agent value now depends on how many real systems it can reach. This is not an immediate product change, but it signals broader Claude tool integration ahead.

    • Stainless builds SDK and MCP server tooling, so the acquisition targets the connection layer between APIs and agents.
    • The move shifts focus from model answers to action-capable agents that can touch data, tools, and workflows.
    • No pricing, release date, or developer-facing feature is included yet. Treat it as a roadmap signal, not something to adopt today.
    Source: news.hada.io/topic?id=29647
  • #0006 · Agents & tools · GeekNews

    `Project Glasswing`: What `Mythos` Demonstrated

    60 radar
    Mythos · Security agent: proves exploit chains automatically

    The bar moved from spotting suspicious code to proving a working exploit path. This is early, but it hints at security agents that can validate bugs before a human review.

    • Mythos Preview ran across 50+ Cloudflare repos, linking multiple primitives into exploit chains instead of flagging isolated bugs.
    • It wrote trigger code, compiled and executed temporary tests, then revised failed hypotheses. That closes the gap between static finding and proof.
    • The practical signal is security automation shifting toward reproducible evidence. Expect fewer raw alerts, more agent-generated repro cases.
    Source: news.hada.io/topic?id=29645
  • #0005 · Agents & tools · GeekNews

    Using Git `--author` to Block AI Bot Spam in GitHub Repos

    40 radar

    AI-generated PR and issue noise can bury real maintainer discussion fast. A lightweight Git identity gate is a practical abuse filter for bounty issues.

    • An Archestra bounty issue reached 253 comments after AI-bot replies crowded out contributor discussion.
    • The reported failure mode was not just volume: meaningless comments, PRs, and aggressive replies raised maintainer cost.
    • Git's --author option points to a cheap screening layer: filter suspicious commit identities before review time gets spent.
    Source: news.hada.io/topic?id=29642
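    The dispatch does not spell out the exact mechanism, so the following is only one possible reading of "filter suspicious commit identities": split author lines (as printed by `git log --format='%an <%ae>'`) into known and unknown against an allowlist. `split_by_allowlist` and the allowlist itself are hypothetical.

```python
from email.utils import parseaddr


def split_by_allowlist(identities: list, allowed_emails: set) -> tuple:
    """Split 'Name <email>' author lines into (known, unknown) by email allowlist."""
    allowed = {email.lower() for email in allowed_emails}
    known, unknown = [], []
    for line in identities:
        _, email = parseaddr(line)  # stdlib parser for 'Name <addr>' strings
        (known if email.lower() in allowed else unknown).append(line)
    return known, unknown
```

    Author fields are self-declared and trivially spoofed, so a check like this only reduces noise; it is a triage aid, not authentication.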
  • `Claude Code` `v2.1.144` improves background sessions, MCP tools, and terminal stability

    80 radar
    Claude Code · Coding agent CLI: automates code work with Claude in the terminal

    Background agents are easier to resume and diagnose. The bigger win is fewer broken long sessions: MCP pagination, terminal corruption, startup hangs, and bad image files now fail less often.

    • /resume now lists sessions started with claude --bg or agent view, marked as bg; background work is easier to recover after context switching.
    • Background subagent completion notifications include elapsed time like 3h 2m 5s, useful for spotting expensive or slow automation runs.
    • /model now changes only the current session; press d in the picker to set defaults, reducing accidental model drift across sessions.
    • MCP tools/list pagination now returns more than the first page. Tool-heavy setups should stop losing capabilities silently.
    • Startup hangs when api.anthropic.com is unreachable now time out after 15s, not up to 75s; bad networks hurt less.
    Source: github.com/anthropics/claude-code/releases/tag/v2.1.144
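    The pagination bug class is easy to reproduce in any MCP client: `tools/list` returns a page plus an optional `nextCursor`, and a client that never sends the cursor back sees only page 1. A minimal sketch of the correct loop, not Claude Code's implementation; `send_request` is a hypothetical transport callback that returns the JSON-RPC result as a dict.

```python
from typing import Callable


def list_all_tools(send_request: Callable[[str, dict], dict]) -> list:
    """Collect every tool from an MCP server by following nextCursor."""
    tools = []
    cursor = None
    while True:
        # The first request carries no cursor; later ones echo the server's value.
        params = {"cursor": cursor} if cursor else {}
        result = send_request("tools/list", params)
        tools.extend(result.get("tools", []))
        cursor = result.get("nextCursor")
        if not cursor:  # a missing cursor means this was the last page
            return tools
```

    The same loop shape applies to `resources/list`, `resources/templates/list`, and `prompts/list`, which use the same cursor convention.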
  • #0003 · Agents & tools · GeekNews

    Using `Codex` `Goals` for Long-Running Tasks

    50 radar
    Codex · Coding agent: continues multi-turn work toward a goal

    Goals keeps multi-turn work moving toward a defined outcome. Useful for profiling, patches, benchmarks, flaky tests, and evidence-based audits.

    • Goals is a persistent objective for a Codex thread, so work can continue across multiple turns toward a defined result.
    • Best fit is work that outgrows a single prompt: profiling, patching, benchmarking, flaky test reproduction, and audits.
    • The leverage comes from clear end conditions. Vague goals turn into extra turns without better output.
    Source: news.hada.io/topic?id=29639
  • `Claude Code` Workflow for Non-Technical PMs

    50 radar
    Claude Code · Coding agent CLI: automates code edits and runs in the terminal

    A no-code-to-agent path: start with Lovable, then move into multi-agent work in Claude Code. Useful as an adoption pattern, but thin without code, metrics, or failure cases.

    • The flow starts from Lovable-style builders and ends at multi-agent systems in Claude Code; good migration framing for prototype-to-automation work.
    • The target user is non-technical PMs, so the value is workflow scaffolding rather than deep engineering detail.
    • No numbers, benchmarks, or concrete output quality claims are given; treat it as a light tutorial signal, not a tool launch.
    Source: www.news.aakashg.com/p/claude-code-non-technical-pms
  • GitHub adds one-click Action failure fixes with `Copilot cloud agent`

    70 radar
    Copilot cloud agent · Coding agent: automates GitHub tasks in the cloud

    Failed CI can now be handed to an agent from the Actions UI. Useful for paid teams, but availability is limited to Business and Enterprise.

    • A failed GitHub Actions job now shows a Fix with Copilot button for one-click agent handoff.
    • Access is limited to Copilot Business and Copilot Enterprise, so it is not a free GitHub workflow upgrade.
    • The biggest win is CI repair latency: failed tests can move straight into an agent patch loop without opening an IDE.
    Source: github.blog/changelog/2026-05-18-one-click-fixes-for-fai