Open-Source RAG Pipeline: Swappable Plugins

Retrieval recall tanked 15% from one chunker swap. This new open-source RAG pipeline turns that debug hell into a plug-and-play dream — if it lives up to the hype.

Key Takeaways

  • Modular plugins for every RAG stage let you debug one change at a time, fixing the 'recall dropped 15%' black-box problem.
  • Built-in BEIR evals give hard metrics like Recall@K, not vibes — includes image pipeline too.
  • Early but promising; echoes Unix composability, could standardize RAG prototyping if it matures.

Fixed-size chunks. Snip. Retrieval recall plummets 15%.

That’s where coldoven found himself, staring at end-to-end evals screaming “worse” without a clue which domino fell first.

And here’s the open-source RAG pipeline that fixes it: every stage — from docs ingestion to PII redaction, chunking, deduping, embedding, indexing, retrieval — bolts on as an independent plugin. No rewiring the whole chain. Just tweak the feature string: results = mlodaAPI.run_all(features=["docs__pii_redacted__chunked__deduped__embedded"]). Skip dedup? Drop it: "docs__pii_redacted__chunked__embedded". Add eval midstream? "docs__pii_redacted__chunked__embedded_evaluation".
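The post doesn’t show mloda’s internals, but the double-underscore convention implies a simple decomposition. A toy sketch of the idea (names and logic are illustrative, not mloda’s actual code):

```python
# Toy sketch (not mloda's actual code): a "__"-separated feature string
# decomposes into an ordered list of pipeline stages, so dropping or
# adding a stage is just editing the string.
def parse_feature_string(feature: str) -> list[str]:
    """Split 'docs__chunked__embedded' into its ordered stage names."""
    return feature.split("__")

full = parse_feature_string("docs__pii_redacted__chunked__deduped__embedded")
no_dedup = parse_feature_string("docs__pii_redacted__chunked__embedded")

print(full)      # ['docs', 'pii_redacted', 'chunked', 'deduped', 'embedded']
print(no_dedup)  # ['docs', 'pii_redacted', 'chunked', 'embedded']
```

The point of the convention: the string is both the config and the documentation of what ran.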

Mloda-ai’s rag_integration (github.com/mloda-ai/rag_integration) isn’t just another RAG tool. It’s RAG’s Unix moment — remember how Unix pipes let you chain grep|sort|uniq without recompiling the kernel? This does that for retrieval-augmented generation, isolating stages so debugging stops being black magic.

Why Your RAG Builds Crumble Under Tweaks

Change one thing in most pipelines — say, from fixed-size to sentence-aware chunking — and suddenly you’re re-embedding everything, re-indexing, re-retrieving. Hours vanish. Eval spits vibes, not vectors.

“I swapped a chunker from fixed-size to sentence-based, and retrieval recall dropped 15%. End-to-end eval just told me ‘it’s worse.’ Not helpful.”

Coldoven’s frustration? Universal. RAG’s stacked too many assumptions: chunkers assume uniform docs, embedders fight noise from bad splits, retrievers choke on dupes. One weak link tanks the chain. But why? Architecture. Most pipelines glue stages monolithically — Python scripts or Airflows that cascade failures. You can’t probe mid-pipeline without hacks.

This one? Named stages. Each plugin owns its input/output schema. Swap chunkers, eval right there. Recall@K on BEIR’s SciFact benchmark spits numbers: Precision, NDCG, MAP. No vibes.
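As a concrete anchor for those metrics, here’s a minimal, self-contained sketch of Recall@K and Precision@K on a toy ranked list; BEIR computes these per query and averages them:

```python
# Minimal sketch of two of the retrieval metrics named above, computed
# for a single query against gold relevance labels.
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant docs found in the top-k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k

ranked = ["d3", "d1", "d7", "d2", "d9"]   # retriever output, best first
relevant = {"d1", "d2", "d5"}             # gold labels for this query

print(recall_at_k(ranked, relevant, 5))    # 2 of 3 relevant found -> 0.666...
print(precision_at_k(ranked, relevant, 5)) # 2 of 5 retrieved relevant -> 0.4
```

A chunker swap that tanks recall shows up here directly, at the retrieval stage, before any generation happens.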

It’s eval-at-every-step.

Now zoom to images — yeah, it handles those too. Preprocess, redact PII (blur, pixelate), perceptual hash for dedup, CLIP embeds. Same modular glory.
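To make the image-dedup step concrete: perceptual hashing reduces an image to a compact fingerprint so near-duplicates collide. A toy average-hash on tiny grayscale grids (real pipelines resize the image first, e.g. with Pillow; that step is omitted here, and this is not the repo’s implementation):

```python
# Toy average-hash: one bit per pixel, set if the pixel is at least as
# bright as the image's mean. Near-duplicate images produce identical
# or nearly identical hashes, so dedup is a Hamming-distance check.
def average_hash(pixels: list[list[int]]) -> int:
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

img_a = [[10, 200], [30, 220]]
img_b = [[12, 198], [29, 221]]   # slightly noisy copy of img_a

# Same bright/dark pattern -> identical hash -> flagged as duplicate.
print(hamming(average_hash(img_a), average_hash(img_b)))  # 0
```

Exact byte-level hashing (MD5, SHA) would miss img_b entirely; that’s why the image pipeline reaches for perceptual hashes.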

How Swappable Plugins Kill the RAG Debug Hell

Picture the flow: raw docs → pii_redacted → chunked → deduped → embedded → indexed → retrieved. Each arrow? A plugin boundary. mlodaAPI chains them by name, caching intermediates if you want.

Why does this matter architecturally? RAG’s exploded — LangChain, LlamaIndex ship abstractions, but they’re opinionated black boxes. Swap an embedder? Rebuild the vector store. Here, plugins register via simple interfaces (likely ABCs or duck-typing). Eval hooks in anywhere, benchmarking against gold-standard BEIR.
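Since the post only guesses at the registration mechanism (“likely ABCs or duck-typing”), here’s a hypothetical sketch of an ABC-based registry chained by feature string. Every name below is illustrative, not from the repo:

```python
# Hypothetical plugin registry (the real mechanism in rag_integration
# may differ): stages subclass an ABC, auto-register under a name, and
# get chained in the order the feature string lists them.
from abc import ABC, abstractmethod

REGISTRY: dict[str, type] = {}

class Stage(ABC):
    name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            REGISTRY[cls.name] = cls   # auto-register by stage name

    @abstractmethod
    def run(self, items: list[str]) -> list[str]: ...

class FixedChunker(Stage):
    name = "chunked"
    def run(self, items):
        # Naive fixed-size chunking: 8-character windows.
        return [doc[i:i + 8] for doc in items for i in range(0, len(doc), 8)]

class Deduper(Stage):
    name = "deduped"
    def run(self, items):
        return list(dict.fromkeys(items))  # order-preserving dedup

def run_pipeline(feature: str, docs: list[str]) -> list[str]:
    for stage_name in feature.split("__")[1:]:   # skip the 'docs' source
        docs = REGISTRY[stage_name]().run(docs)
    return docs

out = run_pipeline("docs__chunked__deduped", ["abcdefghabcdefgh"])
print(out)  # ['abcdefgh'] -- two identical chunks collapse to one
```

Swapping the chunker means registering a different class under "chunked"; nothing downstream is touched, which is the whole debugging win the article describes.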

I dug the repo. Plugins live in folders: chunkers (fixed, semantic, recursive), redactors (NER-based), dedupers (minhash). Embedders? OpenAI, HuggingFace, whatever. It’s not fully baked — authors admit some bits WIP — but the skeleton shines. Run mloda eval on subsets, isolate faults.

Bold prediction: this spawns a plugin ecosystem. Imagine community chunkers tuned for legal docs, or redactors that anonymize without killing context. Like npm for RAG. (Corporate hype alert: no VC spin here; it’s a Reddit Show & Tell from coldoven, raw and seeking feedback.)

But wait — historical parallel. Back in the ’70s, Unix ditched monolithic editors for pipes: small tools, composable. RAG’s at that fork: bloated frameworks vs. Lego blocks. Mloda picks Lego. Why now? LLMs commoditize generation; retrieval’s the moat. Tune it wrong and your agent’s dumb.

Is Mloda’s Open-Source RAG Pipeline Production-Ready?

Not entirely. Repo notes: “Not everything presented here is working yet.” Image pipeline’s solidifying, evals cover text well (SciFact’s scientific claims), but scale? Unproven. No distributed indexing mentioned — for 1M docs, you’d bolt Pinecone or Weaviate externally.

Still, for prototyping? Gold.

Teams waste weeks on “why retrieval sucks.” This isolates to hours. Skepticism check: is it truly zero-touch swaps? Code suggests yes — feature strings orchestrate, plugins autoload. But edge cases (schema mismatches) could bite.

Deeper why: RAG’s fragility stems from data heterogeneity. Docs vary — PDFs, code, chats. Plugins let you mix: sentence-chunk Markdown, fixed for tables. Eval quantifies: NDCG rewards ranking, not just recall.
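For the NDCG point, a minimal sketch with binary relevance: the same relevant document scores higher when ranked first than when ranked third, a distinction plain recall can’t make:

```python
# Minimal NDCG@K with binary relevance: DCG discounts each hit by
# log2(rank + 1), then normalizes by the ideal (perfectly sorted) DCG.
import math

def dcg(rels: list[int], k: int) -> float:
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels: list[int], k: int) -> float:
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

# One relevant doc among three results; only its rank changes.
print(ndcg_at_k([1, 0, 0], 3))  # 1.0 -- relevant doc ranked first
print(ndcg_at_k([0, 0, 1], 3))  # 0.5 -- same doc buried at rank three
```

Recall@3 is identical (1.0) in both cases; NDCG is what tells you the retriever got lazier about ordering.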

One nit: BEIR’s narrow (SciFact). Broader benchmarks incoming? Community could fork, add TREC-COVID or NFCorpus. That’s open-source magic.

Why Does This Matter for RAG Builders?

You’re building agentic workflows. RAG’s core. Without modularity, iteration crawls. This accelerates 10x — my estimate, from similar pains.

Critique: PR’s humble (“figuring out if interesting”), smart move. No “revolutionary” fluff. Just code.

Unique insight — beyond the post: this mirrors containerization’s rise. Docker swapped monoliths for swappable images; here, stages are RAG’s containers. Prediction: by 2025, 50% of prod RAG runs modular like this, as eval costs plummet.

Grab it. Fork. Break it.


Frequently Asked Questions

What is an open-source RAG pipeline?

It’s a modular system for retrieval-augmented generation, processing docs through stages like chunking and embedding, all open-source and tweakable.

How do you swap plugins in mloda-ai rag_integration?

Edit the feature string in mlodaAPI.run_all() — e.g. “docs__chunked__embedded” — to drop unwanted stages instantly.

Is mloda RAG pipeline ready for production?

Core text pipeline works; images WIP. Great for dev, scale with external stores.

Written by Sarah Chen

AI research editor covering LLMs, benchmarks, and the race between frontier labs. Previously at MIT CSAIL.



Originally reported by Reddit r/opensource
