Multi-Source Research Agent with LangGraph & ThreadPoolExecutor

One developer got tired of waiting 5 minutes for a research tool to query one source at a time. So they parallelized it. Here's what actually changed—and what didn't work before they got it right.

I Built a Research Agent That Queries 10 Sources in 45 Seconds—Here's Why Your Sequential Approach Is Dead — theAIcatchup

Key Takeaways

  • Parallelization cuts research time from 5 minutes to 45 seconds by querying 10 sources simultaneously instead of sequentially
  • Smart planning uses LLM-guided personas to select only relevant sources upfront, avoiding wasted API calls on irrelevant data
  • Self-correction loops allow the agent to detect knowledge gaps and re-plan automatically—agentic reasoning that actually delivers results
  • Local Ollama support means building powerful research tools without cloud bills or vendor lock-in

Forty-five seconds.

That’s how long it takes to pull research data from Wikipedia, arXiv, GitHub, Stack Overflow, Reddit, YouTube, Semantic Scholar, Hacker News, web search, and local documents, all at the same time. Run sequentially, the same task took five minutes.

The difference? One developer decided that waiting for an LLM-powered research agent to query sources one after another was fundamentally broken, so they rebuilt it with parallel execution using LangGraph, ThreadPoolExecutor, and Ollama. No fancy distributed infrastructure. No cloud magic. Just smarter threading.

But here’s the thing—the technical solution is the easy part. The real lesson is in the architecture decisions, the self-correction loop, and the stuff that broke spectacularly before it worked.

The Sequential Trap (And Why It Kills Performance)

Most agent examples in the wild follow the same pattern:

Search web → process → search wiki → process → search arXiv → process → synthesize.

Each source takes 5–10 seconds minimum (network latency, LLM processing, parsing). Stack ten sources like that, and you’re looking at 50–100 seconds before synthesis even starts. Want to add error handling or retries? Now you’re creeping toward two, three, five minutes.
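
A minimal sketch of that anti-pattern, assuming hypothetical search_web, search_wiki, and search_arxiv helpers plus a synthesize step:

results = {}
# Each call blocks the next; at 5-10 seconds per source, ten sources cost 50-100 seconds.
for search_fn in [search_web, search_wiki, search_arxiv]:  # ...and seven more
    results.update(search_fn(topic))
synthesize(results)  # synthesis can't start until the last source returns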

“If each source takes 5–10 seconds, a 10-source agent takes 50–100 seconds minimum — before synthesis.”

This is the architectural sin everyone commits. They think agentic workflows are inherently slow. They’re not. The workflow is slow because someone hardcoded it to wait.

So the fix is obvious—embarrassingly obvious, really—but almost nobody does it: run everything in parallel.

The Architecture That Actually Works

The agent’s flow looks like this:

Initialize → plan which sources matter → hit all of them at once → consolidate results → evaluate for gaps → either re-plan or finish.

This is implemented in LangGraph’s StateGraph. Each node is a function that receives the full AgentState and returns a partial update. The magic happens in the parallel_search_node.
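
Before getting to that node, here's the general node shape, a minimal sketch using a hypothetical consolidate_node and merge_results helper:

# Illustrative node: it receives the whole AgentState but returns only the keys it changes.
def consolidate_node(state):
    merged = merge_results(state)    # hypothetical helper that flattens source outputs
    return {"consolidated": merged}  # partial update; LangGraph merges it into the state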

Instead of looping through sources sequentially, the code spins up a ThreadPoolExecutor with one thread per source, submits all requests at once, then collects results as they arrive:

import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)

combined = {}
futures_map = {}

# One worker per planned source; submit everything up front.
with ThreadPoolExecutor(max_workers=len(plan)) as executor:
    for source_name in plan:
        fn = source_functions.get(source_name)
        if fn:
            future = executor.submit(fn, state)
            futures_map[future] = source_name

    # Collect results in whatever order they finish.
    for future in as_completed(futures_map):
        source_name = futures_map[future]
        try:
            result = future.result()
            combined.update(result)
        except Exception as e:
            logger.error(f"Source '{source_name}' failed: {e}")

Each source function is independent and writes to different state keys. So concurrent updates just merge in—no locking, no bottlenecks, no overhead.
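
For example, a source function might look like this sketch, where arxiv_search is a hypothetical helper and arxiv_results is its dedicated state key:

def search_arxiv(state):
    papers = arxiv_search(state["topic"], max_results=5)  # hypothetical API wrapper
    return {"arxiv_results": papers}  # writes only its own key, so merges never collide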

YouTube is the exception (search has to finish before summarization can run), so it gets wrapped in a sequential function inside the parallel executor. Still fast, because it’s only one source doing the dance—the other nine are already done.
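
That wrapper is a plain function, sketched here with hypothetical youtube_search and youtube_summarize helpers:

def youtube_pipeline(state):
    # Sequential inside: summarization needs the search results first.
    videos = youtube_search(state)
    return youtube_summarize(state, videos)

# Submitted to the same executor as a single unit, alongside the other sources.
futures_map[executor.submit(youtube_pipeline, state)] = "youtube"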

Total impact: ~5 minutes down to ~45 seconds on a decent internet connection. That’s not a rounding error improvement. That’s the difference between “this tool is useful” and “this tool is useless.”

Why Your Agent Keeps Asking the Wrong Questions

Here’s where it gets smarter. Before searching anything, the agent has the LLM decide which sources are actually relevant for the topic.

Don’t want to waste API calls querying GitHub for a philosophy essay? Don’t do it. The planning node asks: “For this research question, which sources matter?” The LLM picks from ten options based on the topic and a persona.

Personas. This is the clever bit. The agent can adopt five different research styles:

  • Generalist: Balanced across everything
  • Software Architect: Heavy on GitHub, Hacker News, Stack Overflow
  • Market Analyst: Web, Reddit, Hacker News
  • Scientific Reviewer: arXiv, Semantic Scholar
  • Product Manager: Web, Reddit, YouTube

Each persona weights sources differently. Ask the Generalist about quantum computing, and they query arXiv heavily. Ask the Product Manager the same question, and they ignore arXiv entirely and focus on what regular people are saying on Reddit. Different personas → different research plans → different parallel threads.
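
A sketch of how persona-guided planning can work; the PERSONA_HINTS map, ALL_SOURCES list, and parse_source_names helper are illustrative names, not the repo's actual ones:

PERSONA_HINTS = {
    "Software Architect": "Favor GitHub, Hacker News, and Stack Overflow.",
    "Scientific Reviewer": "Favor arXiv and Semantic Scholar.",
    # ...one hint per persona
}

def plan_node(state):
    prompt = (
        f"You are a {state['persona']}. {PERSONA_HINTS.get(state['persona'], '')}\n"
        f"Research topic: {state['topic']}\n"
        f"From these sources: {ALL_SOURCES}, list only the ones worth querying."
    )
    reply = llm.invoke(prompt)
    return {"plan": parse_source_names(reply.content)}  # partial state update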

Then comes the evaluation loop. After gathering data, an evaluation node checks: “Do we have gaps?” If yes, and you haven’t hit the iteration limit, the agent re-plans and searches again. On niche topics, this second pass actually improves coverage noticeably—the LLM spots what’s missing and goes after it.
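
In LangGraph terms, that loop is an evaluation node plus a conditional edge, roughly like the sketch below; the has_gaps and iterations state keys and the MAX_ITERATIONS cap are illustrative:

from langgraph.graph import END

def evaluate_node(state):
    verdict = llm.invoke(
        f"Topic: {state['topic']}\nFindings so far:\n{state['findings']}\n"
        "Reply GAPS if important angles are missing, otherwise COMPLETE."
    )
    return {
        "has_gaps": "GAPS" in verdict.content.upper(),
        "iterations": state["iterations"] + 1,
    }

def route_after_evaluation(state):
    if state["has_gaps"] and state["iterations"] < MAX_ITERATIONS:
        return "plan"   # loop back and re-plan around the detected gaps
    return END          # coverage looks good; move on to the final synthesis

graph.add_conditional_edges("evaluate", route_after_evaluation)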

This is agentic reasoning without the hype. It’s feedback loops and conditional edges, not “AGI will read your mind.” Real utility.

What Broke Before This Worked

There’s a line in the original post that’s worth dwelling on: “nonlocal in threaded callbacks — I originally used nonlocal to capture results from threads. Race conditions appeared under load.”

That’s the sound of someone learning the hard way. Python’s nonlocal keyword can look elegant—you’re just sharing a variable across scopes. But under concurrent load, it’s a recipe for corrupted state. You’ll get race conditions that only show up sometimes, on some machines, under specific threading patterns. Debugging this is hellish.

The fix was a mutable container pattern (a dict that each thread updates). Not flashy. Not novel. Just… right.
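
The pattern itself is tiny, a sketch:

# The fix: a mutable container (a plain dict) instead of a nonlocal-captured variable.
results = {}

def on_result(source_name, result):
    # Each thread writes under its own key, so nothing gets rebound or silently
    # overwritten the way the shared `nonlocal` variable was.
    results[source_name] = result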

There’s a lesson buried in that one sentence: the cleverest architecture isn’t always the most robust one. Sometimes boring thread-safety patterns exist because they work.

The LLM Backend Flexibility Nobody Talks About

The agent’s LLM abstraction is modular: the same code runs against local Ollama, Groq, Gemini, or OpenAI by swapping environment variables. It reads from os.environ at call time (not import time), so Streamlit sidebar overrides take effect without restarting the server.
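
A sketch of that pattern; the env var names and the langchain_ollama / langchain_openai classes are assumptions about the stack, not confirmed details of the repo:

import os

def get_llm():
    # Read configuration at call time, so a Streamlit sidebar override
    # (which updates env vars) takes effect without restarting the server.
    provider = os.environ.get("LLM_PROVIDER", "ollama")
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=os.environ.get("OLLAMA_MODEL", "llama3.2"))
    from langchain_openai import ChatOpenAI
    return ChatOpenAI(model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"))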

This matters because it means you can build this tool and never sign up for an API key. Run it locally on Ollama with a 1.5B model and get reasonable results in seconds. No cloud bills, no vendor lock-in, no rate-limiting surprises.

Is local Ollama as good as GPT-4? No. Does it work well enough for a research agent that hits multiple sources and self-corrects? Yes. And that’s where the real freedom lives.

Is This Actually Useful, or Just a Neat Demo?

There’s a working demo on Hugging Face, and the code is open source. So people can poke at it, fork it, yell at it.

But let’s be honest about what this tool is: it’s a research accelerator for people who know what they’re looking for but don’t want to bounce between ten tabs. It’s not going to replace a domain expert. It’s not going to discover novel insights. What it will do is gather relevant signals from multiple sources in 45 seconds instead of 5 minutes, flag gaps, and let you re-query if you want deeper coverage.

That’s useful. Not world-changing, but genuinely useful.

The real takeaway isn’t “look at this cool agent.” It’s that someone looked at a standard problem (sequential agent execution), spotted the bottleneck immediately, and fixed it with one architectural change (parallelization). No complex library. No custom orchestration. Just ThreadPoolExecutor and conditional edges.

Most developer tools are over-engineered. This one is exactly engineered. Which is why it’s fast.



Frequently Asked Questions

Can I run this locally without paying for API access?

Yes. Use Ollama with a local model. You won’t get GPT-4 quality, but you’ll get working results in 45 seconds with zero cloud costs. The code checks for local Ollama first before falling back to OpenAI.

What happens if one source times out or fails?

The try-except block in parallel_search_node catches exceptions per source. If GitHub fails, the other nine sources still return results. The agent logs the failure and continues. One source doesn’t tank the whole research.

Does the re-planning loop actually improve results?

On niche topics, yes—noticeably. The evaluation node detects gaps and the LLM picks additional sources on the second pass. For broad topics, it usually finds everything on the first pass and skips re-planning entirely.

Written by Elena Vasquez, senior editor and generalist covering the biggest stories with a sharp, skeptical eye.



Originally reported by Dev.to
