`LDR` pushes local deep research to **95.7%** `SimpleQA` on one `RTX 3090`
Agentic search, not closed-book recall, is doing the heavy lifting here. A fully local stack is now close to hosted deep-research scores, so private research workflows on prosumer GPUs look practical right now.
- The stack uses
Ollama,qwen3.6:27b, andlanggraph_agentwith tool-calling, parallel subtopic splits, and up to 50 iterations; orchestration quality matters as much as model size. - Reported scores are 95.7% on
SimpleQAand 77.0% onxbench-DeepSearch, versus 91.2% / 59.0% forQwen3.5-9B; newer Qwen gains show up strongly in tool-heavy loops. - This is benchmarked with search enabled, so it competes more directly with
Perplexity Deep ResearchandTavilythan with pure closed-book QA. - Caveats are non-trivial: small sample sizes, self-grading noise, possible
SimpleQAcontamination, and a Chinese-language benchmark that may favor Qwen. LDRalso adds practical infra: journal-quality grading viaOpenAlex/DOAJ, per-userSQLCipherencryption, and zero telemetry.