`llama.cpp` MTP makes `Qwen 3.6 27B` far more usable for local coding agents

A custom llama.cpp build combines MTP (multi-token prediction) speculative decoding, a turbo4 KV cache, and a 262K context window on 48GB Macs. Setup is still manual, but local agentic coding has moved from hobbyist tweak to viable option.

[ KEY POINTS ]

- `--spec-type mtp --spec-draft-n-max 5` delivered 2.5x faster generation, reaching 28 tok/s on an M2 Max 96GB.
- turbo4 KV cache cuts KV memory to roughly one quarter, which is the real unlock for long-context local use.
- A 262K context window reportedly fits on 48GB Apple Silicon with Q5_K_M plus turbo4, making repo-scale sessions more realistic.
- The package also ships fixed chat templates and `llama-server` OpenAI/Anthropic-compatible endpoints, so existing agent stacks need less glue code.
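
For a rough sense of scale, the numbers above can be turned into back-of-envelope arithmetic. This is a sketch under stated assumptions, not a measurement: the layer count, KV head count, and head dimension are hypothetical placeholders (not taken from the Qwen 3.6 27B model card), the formula is the standard one for a plain attention KV cache, and "262K" is read as 256 × 1024 tokens. Only the 2.5x speedup, the 28 tok/s figure, and the "roughly one quarter" ratio come from the post.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2.0):
    """Standard attention KV cache size: one key and one value vector
    per layer, per KV head, per cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def implied_baseline_tok_s(boosted_tok_s, speedup):
    """Throughput before the speedup, given the boosted rate and the factor."""
    return boosted_tok_s / speedup


# HYPOTHETICAL architecture numbers, chosen only to illustrate the arithmetic.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
N_CTX = 262_144  # assumption: "262K" means 256 * 1024 tokens

GIB = 1024 ** 3
f16_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX)  # 2 bytes per element
turbo4_bytes = f16_bytes * 0.25  # "roughly one quarter", as reported in the post

print(f"f16 KV cache:    {f16_bytes / GIB:.1f} GiB")
print(f"turbo4 KV cache: {turbo4_bytes / GIB:.1f} GiB")
print(f"implied baseline: {implied_baseline_tok_s(28, 2.5):.1f} tok/s")
```

With these placeholder dimensions, the full-precision cache alone would exceed 48GB at full context, while a quarter-size cache leaves room for the weights. Real figures depend on the model's actual attention layout.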

Original: www.reddit.com/r/LocalLLaMA/comments/1t57xuu/25x_faster_inference_with_qwen_36_27b_using_mtp/

// related

#0001 Agents & tools | GitHub Changelog | 24 hours ago
GitHub adds REST API auditing for `Copilot` cloud agent repo config
Radar: 60
Repo-level agent settings can now be checked by API instead of manual UI review. Useful for keeping automation permissions visible before cloud agents touch production code.
- New endpoint: Get Copilot cloud agent configuration for a repository, currently in public preview.
- Best fit is policy drift checks across repos: scan whether agent access and configuration match your expected defaults.
- This is governance plumbing, not a coding-speed feature. Worth adopting if Copilot agents run on real repos.
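
A policy drift check of the kind described above could look roughly like the sketch below. The changelog summary here gives neither the endpoint path nor the response schema, so this assumes each repository's configuration has already been fetched and parsed into a dict. Every field name (`agent_enabled`, `allow_firewall_bypass`) is hypothetical and used only for illustration.

```python
def config_drift(expected, actual):
    """Return {setting: (expected_value, actual_value)} for each setting
    in `expected` whose value in `actual` is missing or different."""
    drift = {}
    for key, want in expected.items():
        got = actual.get(key)  # None when the setting is absent
        if got != want:
            drift[key] = (want, got)
    return drift


# HYPOTHETICAL expected defaults; real field names come from the API response.
EXPECTED = {"agent_enabled": True, "allow_firewall_bypass": False}

# HYPOTHETICAL per-repo configurations, as if already fetched.
repos = {
    "org/service-a": {"agent_enabled": True, "allow_firewall_bypass": False},
    "org/service-b": {"agent_enabled": True, "allow_firewall_bypass": True},
}

report = {name: config_drift(EXPECTED, cfg) for name, cfg in repos.items()}
for name, drift in report.items():
    status = "ok" if not drift else f"drift: {drift}"
    print(f"{name}: {status}")
```

Comparing only the keys listed in `EXPECTED` keeps the check stable if the API later adds fields the policy does not care about.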
Source: github.blog/changelog/2026-05-18-audit-repository-copilo
#0002 Agents & tools | GitHub Changelog | yesterday
`Copilot Spaces API` is now generally available
Radar: 70
Copilot Spaces: a GitHub Copilot feature that manages task-specific context spaces.
Spaces can now be managed from your own apps via API. Useful for wiring repo context into internal tools or repeatable agent workflows.
- The API supports create, read, update, and delete for Spaces, so context setup no longer has to stay inside the GitHub UI.
- Good fit for templates: bootstrap a project space per repo, customer, or feature branch and keep agent context consistent.
- This is more an automation surface than an end-user feature. Its value depends on whether Copilot Spaces is already part of the coding workflow.
Source: github.blog/changelog/2026-05-18-copilot-spaces-api-now-
#0003 Agents & tools | r/ClaudeAI | yesterday
11 Claude Habits That Compound Over Daily Use
Radar: 50
The useful part is not prompt tricks, but persistent context: Projects, CLAUDE.md, styles, skills, and subagents. Worth turning into a default setup before long coding sessions.
- Put codebase context, style guides, and prior PRs into Projects once. Re-pasting the same background is pure context tax.
- A custom style like "skeptical senior engineer" changes review quality by forcing pushback instead of agreeable code comments.
- In Claude Code, CLAUDE.md carries more weight than session prompts. Around 80 lines of project context can remove repeated stack explanations.
- Match the model to the task: Sonnet as default, Opus for architecture, Haiku for batch cleanup like tickets or emails.
- Subagents fit parallel chores: run tests, inspect files, or summarize docs while the main coding thread keeps moving.
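
The model-by-task habit above can be sketched as a small routing table. This is an illustration only: the task labels are hypothetical, and the values are tier names from the post (Sonnet, Opus, Haiku), not real API model identifiers.

```python
# Tier names as used in the post; NOT actual API model IDs.
DEFAULT_TIER = "sonnet"

# HYPOTHETICAL task labels mapped to the tiers the post suggests.
TASK_TIERS = {
    "architecture": "opus",    # design and architecture decisions
    "batch_cleanup": "haiku",  # bulk chores such as tickets or emails
}


def pick_tier(task_kind):
    """Return the suggested tier for a task, falling back to the default."""
    return TASK_TIERS.get(task_kind, DEFAULT_TIER)


for kind in ("architecture", "batch_cleanup", "code_review"):
    print(f"{kind}: {pick_tier(kind)}")
```

Keeping the default explicit means any task without an entry lands on the general-purpose tier rather than failing.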
Source: www.reddit.com/r/ClaudeAI/comments/1tgqnsl/11_claude_thi
