`ik_llama.cpp` pushes `Qwen3.6 35B A3B` near 110 tok/s on 12GB VRAM

MTP plus CPU offload can make a local MoE model feel interactive on consumer hardware. Useful for private coding or batch jobs, but still a setup-specific benchmark.

[ KEY POINTS ]

Same IQ4_XS quant averaged 89.76 tok/s on regular llama.cpp; ik_llama.cpp samples reached roughly 105-110 tok/s.
Hardware was RTX 4070 Super 12GB, Ryzen 7 9700X, and 48GB DDR5. CPU offload quality matters as much as VRAM.
Benchmark used --ctx-size 131072, q8 KV cache, and draft-mtp; long-context local workflows remain memory-sensitive.
Treat it as a tuning lead, not a buying guide. Kernel, quant, and fork versions can swing results hard.
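
As a quick sanity check on the figures above, the relative gain can be computed directly. This is a minimal sketch using only the numbers quoted in the key points; the helper names `speedup_pct` and `seconds_for` are illustrative and do not come from the original post.

```python
# Throughput figures quoted in the key points (tokens per second).
BASELINE_TOK_S = 89.76            # regular llama.cpp, IQ4_XS quant (reported average)
IK_RANGE_TOK_S = (105.0, 110.0)   # ik_llama.cpp samples (approximate range)


def speedup_pct(baseline: float, candidate: float) -> float:
    """Relative throughput gain of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` at a steady decode rate (ignores prompt processing)."""
    return tokens / tok_per_s


low, high = (speedup_pct(BASELINE_TOK_S, rate) for rate in IK_RANGE_TOK_S)
print(f"gain over baseline: {low:.1f}% to {high:.1f}%")
print(f"1000 tokens: {seconds_for(1000, BASELINE_TOK_S):.1f}s baseline, "
      f"{seconds_for(1000, IK_RANGE_TOK_S[1]):.1f}s at the top of the range")
```

On these numbers the gain works out to roughly 17 to 23 percent, which is meaningful but small enough that a different kernel, quant, or fork version could plausibly erase it.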

Original: www.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_toks_with_12gb_vram_on_qwen36_35b_a3b_and_ik/

// related

#0001 · Other · GeekNews · 13 hours ago
Firefox 148 Starts Turning Off `asm.js` Optimization
40 radar
Legacy asm.js still runs, but loses its fast path in Firefox. Old web games and compute-heavy demos should move to WebAssembly; new products can ignore this.
- From Firefox 148, SpiderMonkey disables asm.js optimization by default, with removal planned later.
- Compatibility remains because asm.js is a JavaScript subset; the breakage risk is performance, not execution.
- Existing asm.js assets should be migrated to WebAssembly. For new builds, do not target asm.js.
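
To find which existing assets are affected, one rough approach is to search for the `"use asm"` directive that asm.js modules use to opt in. The sketch below is a heuristic, not a parser: it can also match the phrase inside an unrelated string literal, and the function names are illustrative.

```python
import re
from pathlib import Path

# asm.js modules opt in with a "use asm" directive at the top of the module function.
# Minified bundles may put it mid-line, so the pattern is not anchored to line starts.
USE_ASM = re.compile(r"""(['"])use asm\1""")


def has_asm_directive(source: str) -> bool:
    """Heuristic check: does this JavaScript source contain a "use asm" directive?"""
    return USE_ASM.search(source) is not None


def find_asm_js(root: str) -> list[Path]:
    """Return .js files under `root` that appear to contain asm.js modules."""
    hits = []
    for path in Path(root).rglob("*.js"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if has_asm_directive(text):
            hits.append(path)
    return sorted(hits)
```

Calling `find_asm_js("public/")` (the directory name is only an example) gives a candidate list of files to review for a WebAssembly rebuild.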
Source: news.hada.io/topic?id=29732
#0002 · Other · GeekNews · 16 hours ago
`TabPFN`, Foundation Model for Tabular Data
50 radar
Classification and regression run through a scikit-learn-style fit/predict flow. Useful for quick baselines on small structured datasets before building a full ML pipeline.
- TabPFN targets tabular data, not text or images; it fits churn, scoring, lead ranking, and internal ops data.
- The fit/predict interface lowers integration cost for Python stacks already using scikit-learn.
- TabPFN-2.6 was trained only on synthetic data, so production use still needs validation against real domain data.
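
The integration point is the fit/predict contract itself. The sketch below is a small hold-out check that works with any object exposing those two methods. `MajorityClassifier` is a stand-in written for this example so the snippet runs with the standard library alone. Swapping in `TabPFNClassifier` from the `tabpfn` package is an assumption about a typical setup, not something the summary above specifies, and the toy churn rows are invented for illustration.

```python
from collections import Counter


class MajorityClassifier:
    """Stand-in baseline exposing scikit-learn-style fit/predict methods."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the remembered label for every row.
        return [self.label_ for _ in X]


def holdout_accuracy(model, X_train, y_train, X_test, y_test) -> float:
    """Fit on the training split and return accuracy on the held-out split."""
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    correct = sum(int(p == t) for p, t in zip(preds, y_test))
    return correct / len(y_test)


# Toy, made-up rows: two numeric features per customer.
X_train = [[0.1, 1.0], [0.2, 0.9], [0.9, 0.1], [0.3, 0.8]]
y_train = ["stay", "stay", "churn", "stay"]
X_test = [[0.15, 0.95], [0.85, 0.2]]
y_test = ["stay", "churn"]

score = holdout_accuracy(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(f"majority baseline accuracy: {score:.2f}")
```

Running the same `holdout_accuracy` call on real domain data, with the actual model in place of the stand-in, is one simple form of the validation the last bullet calls for.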
Source: news.hada.io/topic?id=29719
#0003 · Other · GeekNews · 20 hours ago
Mini Shai-Hulud Returns: 314 `npm` Packages Compromised
60 radar
A short publish window still pushed hundreds of malicious versions. Lockfiles, token hygiene, and dependency review matter before the next npm install.
- The atool npm account was compromised on May 19, 2026, and malicious releases were pushed for about 22 minutes.
- Attack automation produced 637 malicious versions across roughly 317 packages. Short-lived incidents still reach CI fast.
- The payload was a 498KB obfuscated Bun script, matching scanner structure and regexes tied to Mini Shai-Hulud.
- Targets included cloud credentials such as AWS keys. Rotate exposed tokens and audit recent installs from affected packages.
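
Auditing recent installs can start from the lockfile. The sketch below assumes a `package-lock.json` in the v2/v3 layout (npm 7 and later), where a top-level `packages` map is keyed by `node_modules/...` paths. The `AFFECTED` table is a hypothetical placeholder; real package names and versions would have to come from a published advisory.

```python
import json

# Hypothetical advisory data: package name -> set of known-bad versions.
AFFECTED = {"example-pkg": {"1.2.4"}}


def load_lock(path: str) -> dict:
    """Read a package-lock.json file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_affected(lock: dict, affected: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (name, version) pairs in the lockfile that match the advisory table."""
    hits = set()
    for path, meta in lock.get("packages", {}).items():
        # Keys look like "node_modules/a/node_modules/@scope/b"; the package
        # name is whatever follows the last "node_modules/" segment.
        name = path.rpartition("node_modules/")[2]
        version = meta.get("version")
        if version in affected.get(name, set()):
            hits.add((name, version))
    return sorted(hits)
```

A typical call would be `find_affected(load_lock("package-lock.json"), AFFECTED)`. Older v1 lockfiles only have a nested `dependencies` tree and would need a different walk.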
Source: news.hada.io/topic?id=29709
