telexed ~ c / 2bccd03b-09cradar:50 · agent_toolLIVE
← back
NO.
#2bccd03b
Topic
AGENTS & TOOLS
Source
r/LocalLLaMA
Published
2026-05-06 09:35:42
Importance
★ 5/10 — radar 50

`llama.cpp` MTP makes `Qwen 3.6 27B` far more usable for local coding agents

A custom llama.cpp build stacks MTP, turbo4 KV cache, and 262K context on 48GB Macs. Still a manual setup, but local agentic coding just moved from hobbyist tweak to viable option.

[ KEY POINTS ]
  1. --spec-type mtp --spec-draft-n-max 5 delivered 2.5x faster generation, reaching 28 tok/s on an M2 Max 96GB.
  2. turbo4 KV cache cuts KV memory to roughly one quarter, which is the real unlock for long-context local use.
  3. A 262K context window reportedly fits on 48GB Apple Silicon with Q5_K_M plus turbo4, making repo-scale sessions more realistic.
  4. The package also ships fixed chat templates and llama-server OpenAI/Anthropic-compatible endpoints, so existing agent stacks need less glue code.
Originalwww.reddit.com/r/LocalLLaMA/comments/1t57xuu/25x_faster_inference_with_qwen_36_27b_using_mtp/Read original →

// related