NO. #1a7d92ee
Topic: AGENTS & TOOLS
Source: r/LocalLLaMA
Published: 2026-05-02 11:21:25
Importance: ★ 6/10 (radar 60)

`LDR` pushes local deep research to **95.7%** `SimpleQA` on one `RTX 3090`

Agentic search, not closed-book recall, is doing the heavy lifting here. A fully local stack is now close to hosted deep-research scores, so private research workflows on prosumer GPUs look practical right now.

[ KEY POINTS ]
  1. The stack uses Ollama, qwen3.6:27b, and langgraph_agent with tool-calling, parallel subtopic splits, and up to 50 iterations; orchestration quality matters as much as model size.
  2. Reported scores are 95.7% on SimpleQA and 77.0% on xbench-DeepSearch, versus 91.2% / 59.0% for Qwen3.5-9B; newer Qwen gains show up strongly in tool-heavy loops.
  3. This is benchmarked with search enabled, so it competes more directly with Perplexity Deep Research and Tavily than with pure closed-book QA.
  4. Caveats are non-trivial: small sample sizes, self-grading noise, possible SimpleQA contamination, and a Chinese-language benchmark that may favor Qwen.
  5. LDR also adds practical infra: journal-quality grading via OpenAlex/DOAJ, per-user SQLCipher encryption, and zero telemetry.
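
The orchestration pattern in point 1 (split the question into subtopics, research them in parallel, and cap each tool-calling loop at 50 iterations) can be sketched roughly as follows. This is a minimal illustration, not LDR's actual code: `fake_search`, `fake_model_step`, `research_subtopic`, and `deep_research` are hypothetical names, and the model and search tool are replaced by stubs so the control flow runs without Ollama or network access.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

MAX_ITERATIONS = 50  # cap taken from the post; everything else is illustrative


@dataclass
class Notes:
    subtopic: str
    findings: list[str] = field(default_factory=list)
    iterations: int = 0


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real web-search tool call."""
    return f"result for '{query}'"


def fake_model_step(notes: Notes) -> dict:
    """Hypothetical stand-in for one LLM turn.

    A real agent would send the notes to the model and parse its
    tool-call response; this stub asks for three searches, then finishes.
    """
    if len(notes.findings) < 3:
        query = f"{notes.subtopic} #{len(notes.findings) + 1}"
        return {"action": "search", "query": query}
    return {"action": "finish"}


def research_subtopic(subtopic: str, max_iterations: int = MAX_ITERATIONS) -> Notes:
    """Run one bounded tool-calling loop for a single subtopic."""
    notes = Notes(subtopic)
    for _ in range(max_iterations):
        notes.iterations += 1
        step = fake_model_step(notes)
        if step["action"] == "finish":
            break
        notes.findings.append(fake_search(step["query"]))
    return notes


def deep_research(subtopics: list[str]) -> dict[str, Notes]:
    """Research every subtopic in parallel and collect the notes."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtopics))) as pool:
        results = list(pool.map(research_subtopic, subtopics))
    return {notes.subtopic: notes for notes in results}


if __name__ == "__main__":
    for name, notes in deep_research(["benchmark setup", "known caveats"]).items():
        print(name, notes.iterations, len(notes.findings))
```

The iteration cap is what keeps a confused model from looping forever on one subtopic. In a real stack the stubs would be replaced by a chat call with tool definitions and an actual search backend.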
Original: www.reddit.com/r/LocalLLaMA/comments/1t1n6o8/we_are_finally_there_qwen3627b_agentic_search_957/
