Telexed

telexed ~ home★4 and up · hourly · UTC+09LIVE

TELEXED// solo-operator signal radar · Issue 412

AI news through a solo-operator lens — only what changes your day5 of 412

FILTER[All][Agents & tools][Models & API][Generative media][Infra & SaaS][ASO & growth][Indie business][Idea signals][Other][★6+ high-signal]

r/LocalLLaMA ✕clear filters

Mon, May 181 dispatches

#0412
#0412Agents & tools r/LocalLLaMAyesterday
`SmallCode` hits 87/100 coding-agent tasks with an active 4B model
50radar
SmallCodeLocal coding agent — compound tools for small models
Reliability comes from the harness, not raw model size. The benchmark is self-reported, but the agent patterns are immediately reusable for local-first coding tools.
- Compound tools collapse search-read-edit-verify into one call, cutting the multi-step drift that breaks small models after 3+ tool calls.
- The fix loop runs compile/lint immediately after edits and feeds errors back, so the model only needs to repair concrete failures.
- On repeated failure, tasks shrink from broad file edits to line-level fixes; that is a practical recipe for weaker local models.
- Cloud escalation is scoped to the stuck task when an OpenAI or Claude key exists, keeping most work local without hard failure.
Source: www.reddit.com/r/LocalLLaMA/comments/1tgecrq/i_built_a_cRead original →
FIG-4121:1
50radar
FIG-4121:1

Sun, May 171 dispatches

#0411
#0411Other r/LocalLLaMA2 days ago
`llama.cpp` fork enables quantized KV cache with tensor split
50radar
llama.cppLocal LLM inference engine — supports GGUF and CUDA backends
Tensor parallelism becomes usable with quantized KV cache on dual GPUs. Still a fork with MoE caveats, so it is a test-only local inference tweak.
- Benchmarked Qwen3.5 27B Q4_K_M at 30.05 tok/s with -sm tensor vs 21.22 tok/s without it for generation.
- The command uses -ctk q8_0 -ctv q8_0, removing the old tensor-split tradeoff of falling back to non-quantized KV cache.
- Author reports real use rising from about 25 tok/s to 40 tok/s on 3060 12GB + 4070 Super 12GB.
- MoE models currently break with -sm tensor; dense models like Qwen 27B/9B are the safer test target.
Source: www.reddit.com/r/LocalLLaMA/comments/1tflngz/dual_gpu_llRead original →
50radar
PHOTO
FIG-4111:1

Sat, May 161 dispatches

#0410
#0410Agents & tools r/LocalLLaMA3 days ago
`Qwen3.6-35B-A3B` reaches **24.6%** on `Terminal-Bench 2.0`
50radar
Qwen3.6Open LLM model — listed on a terminal-agent benchmark
A smaller open model stack beat several larger agent setups on a hard terminal benchmark. Worth testing for local coding-agent loops, but still benchmark-first evidence.
- little-coder x Qwen3.6-35B-A3B scored 24.6% ±3.2, above Gemini 2.5 Pro on Gemini CLI at 19.6%.
- It also edged Qwen3-Coder-480B on Terminus 2 at 23.9%, showing scaffold choice can outweigh raw model scale.
- Qwen3.5-9B reached 9.2%; sub-10B local models now have measurable, nonzero performance on hard agentic tasks.
- This is still a leaderboard signal, not production proof. Try it on repo-specific tasks before replacing API-backed agents.
Source: www.reddit.com/r/LocalLLaMA/comments/1temio0/qwen3635ba3Read original →
50radar
PHOTO
FIG-4101:1

Fri, May 151 dispatches

#0409
#0409Other r/LocalLLaMA5 days ago
Self-Training With Verifiable Rewards Pushes `Qwen 2.5` 7B to **112/164** on HumanEval
50radar
A self-generated code-and-tests loop produced a large jump without human-written training pairs. Cheap enough to replicate, but still a one-off experiment rather than a product-ready recipe.
- The loop is simple: generate problems, sample multiple solutions, keep (failed attempt, fixed attempt) pairs, and let a Python interpreter score them.
- After fixing a grading bug, Qwen 2.5 7B moved from 25 to 112/164 on HumanEval; that is a big enough jump to treat as a real benchmark signal.
- A Qwen 2.5 14B run used 100 mined pairs, took 95 minutes on an H100, and cost $3.50; the barrier here is much lower than typical RL folklore.
- Control training on fake pairs gave 25/164, identical to base, which suggests the lift came from correction data rather than format imitation.
Source: www.reddit.com/r/LocalLLaMA/comments/1tde3m1/i_let_a_smaRead original →
50radar
PHOTO
FIG-4091:1

Wed, May 61 dispatches

#0408
#0408Agents & tools r/LocalLLaMA2 weeks ago
`llama.cpp` MTP makes `Qwen 3.6 27B` far more usable for local coding agents
50radar
llama.cppInference engine — lightweight local LLM serving
A custom llama.cpp build stacks MTP, turbo4 KV cache, and 262K context on 48GB Macs. Still a manual setup, but local agentic coding just moved from hobbyist tweak to viable option.
- --spec-type mtp --spec-draft-n-max 5 delivered 2.5x faster generation, reaching 28 tok/s on an M2 Max 96GB.
- turbo4 KV cache cuts KV memory to roughly one quarter, which is the real unlock for long-context local use.
- A 262K context window reportedly fits on 48GB Apple Silicon with Q5_K_M plus turbo4, making repo-scale sessions more realistic.
- The package also ships fixed chat templates and llama-server OpenAI/Anthropic-compatible endpoints, so existing agent stacks need less glue code.
Source: www.reddit.com/r/LocalLLaMA/comments/1t57xuu/25x_faster_Read original →
50radar
PHOTO
FIG-4081:1