Telexed


TELEXED// solo-operator signal radar · Issue 412

AI news through a solo-operator lens: only what changes your day · 2 of 412



Sun, May 17 · 1 dispatch

#0412 · Other · r/LocalLLaMA · 2 days ago
`llama.cpp` fork enables quantized KV cache with tensor split
50 radar
`llama.cpp`: local LLM inference engine; supports GGUF and CUDA backends
A fork makes tensor-split parallelism usable together with a quantized KV cache on dual GPUs. It is still a fork and MoE models break, so treat it as a test-only local inference tweak.
- Generation benchmark: Qwen3.5 27B Q4_K_M ran at 30.05 tok/s with `-sm tensor` vs 21.22 tok/s without it.
- The command uses `-ctk q8_0 -ctv q8_0`, so tensor split no longer forces a fallback to a non-quantized KV cache.
- The author reports real-use throughput rising from about 25 tok/s to 40 tok/s on a 3060 12GB + 4070 Super 12GB pair.
- MoE models currently break with `-sm tensor`; dense models like Qwen 27B/9B are the safer test target.
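To put the two throughput claims above on the same scale, here is a small arithmetic sketch. The `speedup` helper is a name introduced for illustration; the numbers are the ones quoted in the bullets, and the real-use figures are the author's approximate values.

```python
def speedup(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, as a fraction (0.5 means +50%)."""
    return after / before - 1.0


# Generation benchmark from the post: without vs with `-sm tensor`.
bench = speedup(21.22, 30.05)

# Author's reported real-use numbers (approximate).
real_use = speedup(25.0, 40.0)

print(f"benchmark: +{bench:.1%}")    # benchmark: +41.6%
print(f"real use:  +{real_use:.1%}")  # real use:  +60.0%
```

The benchmark gain is roughly 42% and the reported real-use gain roughly 60%. The dispatch does not say why they differ, so treat them as separate measurements.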
Source: www.reddit.com/r/LocalLLaMA/comments/1tflngz/dual_gpu_ll

Fri, May 15 · 1 dispatch

#0411 · Other · r/LocalLLaMA · 5 days ago
Self-Training With Verifiable Rewards Pushes `Qwen 2.5` 7B to **112/164** on HumanEval
50 radar
A self-generated code-and-tests loop produced a large jump without human-written training pairs. Cheap enough to replicate, but still a one-off experiment rather than a product-ready recipe.
- The loop is simple: generate problems, sample multiple solutions, keep (failed attempt, fixed attempt) pairs, and let a Python interpreter score them.
- After fixing a grading bug, Qwen 2.5 7B moved from 25/164 to 112/164 on HumanEval (about 15% to 68%); that is a big enough jump to treat as a real benchmark signal.
- A Qwen 2.5 14B run used 100 mined pairs, took 95 minutes on an H100, and cost $3.50; the barrier is much lower than RL folklore suggests.
- Control training on fake pairs gave 25/164, identical to base, which suggests the lift came from correction data rather than format imitation.
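The verification step of the loop described above can be sketched as follows. This is a minimal illustration, not the author's code: `passes` and `mine_pair` are hypothetical helper names, the hard-coded samples stand in for model generations, and pairing the first failing sample with the first passing one is an assumption about how the (failed attempt, fixed attempt) pairs are formed. A subprocess is not a sandbox, so untrusted model output would need real isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes(solution: str, tests: str, timeout: float = 5.0) -> bool:
    """Run solution + tests in a fresh Python interpreter; exit code 0 means pass."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution + "\n\n" + tests + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def mine_pair(candidates, tests):
    """Return (failed_attempt, fixed_attempt) if the samples contain both, else None."""
    failed = None
    fixed = None
    for candidate in candidates:
        if passes(candidate, tests):
            if fixed is None:
                fixed = candidate
        elif failed is None:
            failed = candidate
        if failed is not None and fixed is not None:
            return failed, fixed
    return None


# Stand-ins for a generated problem's tests and two sampled solutions.
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
samples = [
    "def add(a, b):\n    return a - b",  # fails the asserts
    "def add(a, b):\n    return a + b",  # passes the asserts
]

pair = mine_pair(samples, tests)
print(pair is not None)  # True
```

Because the interpreter's exit code is the only reward signal, no human labels enter the loop. Problems where every sample passes, or every sample fails, yield no pair and are skipped.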
Source: www.reddit.com/r/LocalLLaMA/comments/1tde3m1/i_let_a_sma