Alert blasts through Slack. P99 latency on your critical API? Through the roof. No human stirring yet — but Relvy AI’s already traversing its runbook DAG, correlating that spike to a fresh deploy from 20 minutes ago.
That’s the pitch, anyway. Relvy AI promises automated on-call runbooks for engineering teams, ditching the hallucinatory mess of generic LLMs for something brutally deterministic. And here’s the thing: in a world where on-call burnout costs teams millions (PagerDuty’s own surveys peg it at $3.6 million per large org annually), this could be the pivot we’ve needed.
But.
Relvy doesn’t reinvent the wheel; they shatter it. Current LLMs like Claude 3.5 Sonnet or GPT-4o flop hard on root cause analysis, scraping by with under 40% accuracy on benchmarks like OpenRCA. Why? Context overflow: terabytes of telemetry drown them. No enterprise smarts to flag ‘normal’ cron-induced spikes. And unguided exploration drags out the investigation, torching your time-to-mitigation window.
Why Relvy AI Crushes the OpenRCA Problem
They anchor everything in a Runbook State Machine. Forget open-ended LLM chit-chat. Alerts trigger a DAG of diagnostic nodes — each a targeted tool call, not a prose poem.
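To make that concrete, here’s a minimal sketch of the pattern; DiagnosticNode, run_dag, and the traversal are my assumptions for illustration, not Relvy’s actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative runbook-DAG skeleton: each node wraps one targeted tool
# call and names its downstream nodes. Not Relvy's real interfaces.
@dataclass
class DiagnosticNode:
    name: str
    tool: Callable[[dict], dict]          # one targeted tool call, returns structured data
    children: list[str] = field(default_factory=list)

def run_dag(nodes: dict[str, DiagnosticNode], start: str, ctx: dict) -> dict:
    """Walk the DAG from the alert's entry node, feeding each tool the
    alert context plus every upstream finding gathered so far."""
    frontier, results = [start], {}
    while frontier:
        name = frontier.pop(0)
        if name in results:               # already reached via another path
            continue
        node = nodes[name]
        results[name] = node.tool({**ctx, **results})
        frontier.extend(node.children)
    return results
```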
Take their TelemetryTool. It spits Z-score anomalies or STL decompositions, then hands the agent clean JSON: {"anomalies_detected": 3, "period": "past 30m"}. No raw logs bloating the context window. Boom: token count plummets, hallucinations evaporate.
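As a rough sketch of what such a tool might do under the hood (detect_anomalies and the 3-sigma threshold are my assumptions), the point is that raw samples go in and only a compact summary comes out:

```python
import json
import statistics

def detect_anomalies(samples: list[float], threshold: float = 3.0) -> str:
    """Hypothetical stand-in for a TelemetryTool-style summarizer: flag
    points more than `threshold` standard deviations from the mean and
    return a compact JSON summary instead of the raw series."""
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)     # assumes at least two samples
    anomalies = [x for x in samples if stdev and abs(x - mean) / stdev > threshold]
    return json.dumps({"anomalies_detected": len(anomalies), "period": "past 30m"})
```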
Then correlate_with_deployment? Grabs the last five commits. Structured truth, not vibes.
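A plausible shape for that tool, assuming a git-backed deploy history (the real thing would presumably query the deploy pipeline, not just git log):

```python
import subprocess

def correlate_with_deployment(n: int = 5) -> list[dict]:
    """Fetch the last n commits as structured records (sha, timestamp,
    subject) for the agent to line up against the anomaly window.
    Hypothetical sketch, not Relvy's actual correlate_with_deployment."""
    out = subprocess.run(
        ["git", "log", f"-{n}", "--pretty=format:%H|%cI|%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [dict(zip(("sha", "timestamp", "subject"), line.split("|", 2)))
            for line in out.splitlines()]
```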
“By using these targeted tools, we reduce the token load significantly. The agent receives a structured JSON object describing the anomaly, which acts as a ‘ground truth’ anchor, preventing the hallucination of non-existent error patterns.”
Relvy’s own words — and they’re spot on. This isn’t AI hype; it’s engineering hygiene.
Local-first too. Docker/Helm deploys in your VPC. No telemetry exfiltration to the cloud. Datadog, Prometheus, Honeycomb? All fair game, with no cloud round-trip latency.
Three threads hum in parallel: an observation loop polls for anomalies, RAG-boosted reasoning matches signatures to runbooks, and an action layer fires CLI mitigations (rollbacks, restarts, traffic shifts).
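Schematically, the split looks something like this; the queues and stubs below are mine, standing in for real Datadog/Prometheus polling, RAG matching, and actual CLI actions.

```python
import queue
import random
import threading
import time

alerts: queue.Queue = queue.Queue()    # observation -> reasoning
actions: queue.Queue = queue.Queue()   # reasoning -> action

def observation_loop():
    while True:                        # stub: real loop polls Datadog/Prometheus
        time.sleep(1)
        if random.random() < 0.2:
            alerts.put({"metric": "p99_latency", "z": 3.2})

def reasoning_loop():
    while True:                        # stub: real loop RAG-matches signatures to runbooks
        anomaly = alerts.get()
        actions.put(f"rollback (triggered by {anomaly['metric']})")

def action_loop():
    while True:                        # stub: real loop shells out to rollback/restart/shift
        print("mitigation:", actions.get())

for loop in (observation_loop, reasoning_loop, action_loop):
    threading.Thread(target=loop, daemon=True).start()
time.sleep(5)                          # keep the demo alive long enough to see output
```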
Does Relvy Beat PagerDuty’s Human Handover?
PagerDuty? Solid for routing pings, but mitigation’s still you at 3 AM, bleary-eyed in a fresh browser tab. Relvy hands off only on low-confidence ambiguity, surfacing a notebook whose cells log every step: input data, agent thoughts, visualizations.
Example cell:
{ "step": "Check Endpoint Latency", "status": "completed", "data": { "avg_latency": "450ms", "p99_latency": "1200ms", "anomaly_confidence": true }, "agent_thought": "P99 deviated 3.2 std devs from 7-day avg" }
Transparency kills the black-box fear. Engineers trust it because they can replay the tape.
Market dynamics scream opportunity. On-call tools hit $2B+ TAM, growing 15% YoY per Gartner. But pure AI plays like Microsoft’s Copilot for DevOps? Still generative fluff, prone to the same RCA pitfalls Relvy sidesteps.
My take: Relvy’s betting on determinism over dazzle, and that’s smart. Remember CI before Jenkins matured? Builds were manual hell until scripted pipelines locked them down. Relvy does that for incidents, turning SRE from firefighting into orchestration.
Skeptical? Fair. Runbooks need constant tuning, or they ossify. Relvy’s DAGs must auto-evolve via RAG, or teams bail. But early adopters (they hint at Fortune 500 pilots) report 70% time-to-mitigation cuts. If it scales, PagerDuty’s stock dips 10-15% within 18 months. Bold call, but the early data backs it.
The Hidden Gotcha in Relvy’s Stack
Enterprise context. LLMs are blind to your quirks: Endpoint_A’s cron spike? Normal. Endpoint_B’s? Catastrophe. Relvy layers RAG over your docs, runbooks, and past incidents. But onboarding is non-trivial: map your observability stack first, or it’s DOA.
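A toy version of that context layer (the exact-match table is my invention; Relvy’s RAG layer would presumably retrieve near-matches from embedded docs and incident history):

```python
# Anomaly signatures already triaged as expected behaviour. A real system
# would retrieve near-matches from past incidents; this dict just shows
# where the "is this normal here?" check sits in the flow.
KNOWN_NORMAL = {
    ("endpoint_a", "latency_spike", "02:00"): "nightly cron backfill",
}

def is_known_normal(endpoint: str, pattern: str, window: str) -> bool:
    return (endpoint, pattern, window) in KNOWN_NORMAL

print(is_known_normal("endpoint_a", "latency_spike", "02:00"))  # True: don't page anyone
print(is_known_normal("endpoint_b", "latency_spike", "02:00"))  # False: escalate
```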
Security exposure? Nil, since everything stays VPC-bound. But that notebook? Audit gold, a compliance dream for SOC 2.
And the PR spin? None here. Relvy calls out LLM limits upfront. Refreshing in AI-land.
Zoom out: On-call’s a $4B drag yearly (Atlassian stats). Relvy targets 20-30% automation capture. Makes sense — if your stack’s mature.
Bootstrappers? Skip. Raw chaos needs humans.
Prediction: By Q4 2025, 15% of SRE teams run Relvy-like agents. Determinism wins.
Why Does This Matter for On-Call Teams?
Burnout’s real. 68% of engineers dread pager duty (PagerDuty State of Incident Report). Relvy offloads the rote work: anomaly hunting, deploy correlation, low-risk fixes.
Frees you for RCA deep dives, architecture fixes. Or sleep.
Numbers: Confidence >80%? Auto-mitigate. Else, human-in-loop with primed notebook. Hybrid heaven.
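That routing policy fits in a few lines. A hedged sketch, assuming a 0.8 threshold and a per-runbook risk tag (both my inventions):

```python
def route(finding: dict) -> str:
    """Hybrid policy from above: auto-mitigate only when confidence clears
    the bar and the runbook's mitigation is tagged low-risk; everything
    else escalates with the primed notebook attached."""
    if finding["confidence"] > 0.8 and finding.get("risk", "high") == "low":
        return "auto_mitigate"
    return "escalate_with_notebook"

print(route({"confidence": 0.92, "risk": "low"}))   # auto_mitigate
print(route({"confidence": 0.55, "risk": "low"}))   # escalate_with_notebook
```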
Competition? Incident.io and FireHydrant are response-focused, light on autonomy. Blameless? Post-mortems. Relvy’s the executor.
One punchy caveat: High-cardinality data still bites. Their tools summarize, but edge cases (microservices soup) demand custom nodes.
Worth it? For scale-ups with observability hygiene, yes. Hype-free upgrade.
Frequently Asked Questions
What is Relvy AI? Relvy AI automates on-call responses using runbook DAGs and tool interfaces, integrating with Datadog/Prometheus for deterministic incident mitigation.
Does Relvy AI replace on-call engineers? No — it handles routine diagnostics and fixes, escalating ambiguities with transparent notebooks for human review.
How much does Relvy AI cost? Pricing isn’t public yet; expect per-incident or seat-based, starting around $50/engineer/month based on similar tools.