5 OAuth2 Bugs Found with MCP Tool on First Try

Spec-compliant OAuth2 server. Clean ZAP scan. Then: five bugs in ten minutes flat, courtesy of an MCP security workbench. Security just got a wake-up call.


Key Takeaways

  • Spec compliance and ZAP scans miss deep OAuth2 flow vulnerabilities—AI-MCP tools expose them fast.
  • go-appsec/toolbox + Claude Code found 5 bugs in 10 minutes, no pentest experience needed.
  • Rise of protocol-aware AI testing could slash appsec costs, disrupt $2B market.

Five vulnerabilities. High-severity included. All unearthed in a single 10-minute session using a novel MCP security tool—no prior pentesting chops required.

That’s the stark reality for Autentico, a Go-based OAuth 2.0 and OpenID Connect identity provider its creator thought was bulletproof. Passed OpenID Foundation conformance. Nailed OWASP ZAP scans across 169 endpoints. Reviewed 10 RFCs down to every MUST and SHOULD. Yet go-appsec/toolbox, wired to Claude Code, spotted flaws traditional tools missed.

Look, OAuth2 security testing has long been a black art: flows twist through redirects, tokens dance in JWTs, clients authenticate in shadowy ways. Market data backs the peril: Verizon's 2024 DBIR pegs misconfiguration as a factor in 15% of breaches, with API flaws spiking 30% year-over-year. Autentico's saga? A microcosm of why spec adherence alone won't cut it.

Why Did ZAP Miss These OAuth2 Flaws?

OWASP ZAP delivered a clean-ish report: zero fails, 112 passes, four warnings. The warnings were quick fixes: headers tightened, 500s swapped for 404s. But ZAP is an outsider; it probes headers, injections, the basics. It can't grok OAuth's innards: PKCE enforcement? Refresh rotation? Token introspection auth?

No. For that, you need eyes inside the flows. Enter go-appsec/toolbox, an MCP server that turns your browser session into a collaborative pentest arena. You drive; the AI proxies traffic and wields tools like proxy_poll, replay_send, and jwt_decode. Setup takes minutes: point the browser at the proxy on port 8080, hook up Claude Code with a single claude mcp add command. Boom: 112 flows captured, spanning auth code exchange, token swaps, MFA, and admin CRUD.
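
To make "browser proxies to port 8080" concrete: any HTTP client pointed at that port gets its flows captured the same way a browser does. A minimal Go sketch, assuming the workbench proxy is listening locally; the IdP hostname is a placeholder, not Autentico's actual deployment:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// Route all requests through the local intercepting proxy
	// (the article's workbench listens on port 8080).
	proxyURL, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		panic(err)
	}
	client := &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
		Timeout:   10 * time.Second,
	}

	// Any flow driven through this client shows up in the capture,
	// ready for the AI to inspect and replay. Hypothetical host.
	resp, err := client.Get("https://idp.example.com/.well-known/openid-configuration")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("captured flow, status:", resp.Status)
}
```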

First hit: the /oauth2/introspect endpoint. Wide open.

The /oauth2/introspect endpoint returned full token metadata (active status, scopes, user ID, and claims) without requiring any client credentials. Anyone who had a token value could check whether it was active and extract its claims.

AI stripped creds from a legit request, replayed. 200 OK, claims spilled. High-severity leak—fixed mid-session. That’s the tool’s edge: contextual replay, not blind fuzzing.
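
What that replay amounts to, scripted by hand: take the captured introspection call, drop every client credential, send it again, and watch what comes back. A rough Go equivalent of the mutation; the endpoint path comes from the session, while the host and token value are placeholders, and this is a sketch of the technique, not the toolbox's actual replay_send internals:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// replayWithoutCreds re-sends an introspection request with all client
// authentication removed. A 200 with token metadata means the endpoint
// answers anyone who merely holds a token value.
func replayWithoutCreds(endpoint, token string) (*http.Response, error) {
	form := url.Values{"token": {token}}
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	// Deliberately no Authorization header and no client_id/client_secret:
	// this is the mutation applied to the captured request.
	return http.DefaultClient.Do(req)
}

func main() {
	resp, err := replayWithoutCreds("https://idp.example.com/oauth2/introspect", "token-under-test")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusOK {
		fmt.Println("VULNERABLE: introspection answered without client auth")
	} else {
		fmt.Println("OK: unauthenticated introspection rejected, status", resp.Status)
	}
}
```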

But wait: two mediums and two lows piled on. PKCE? Not enforced for public clients; a replay without code_challenge sailed through. Refresh tokens? Reused twice, both accepted. A CSRF error page leaked env vars. And a stored XSS in client_name loomed, cut short in the reports.
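
The PKCE gap is the easiest to picture server-side. Per RFC 7636, the token endpoint must recompute the challenge from the presented code_verifier and refuse public-client exchanges that skip it. A hedged sketch of that check; function and parameter names are illustrative, not Autentico's:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"errors"
	"fmt"
)

// verifyPKCE re-runs the check a token endpoint must apply when an
// authorization code is exchanged. storedChallenge and method were saved
// when the code was issued; verifier arrives with the exchange request.
func verifyPKCE(storedChallenge, method, verifier string, publicClient bool) error {
	if storedChallenge == "" {
		// No challenge was bound to the code. For a public client this
		// must be a hard failure; accepting it was the reported bug.
		if publicClient {
			return errors.New("invalid_grant: PKCE required for public clients")
		}
		return nil // confidential client, PKCE optional
	}
	if verifier == "" {
		return errors.New("invalid_grant: code_verifier missing")
	}
	switch method {
	case "S256":
		// RFC 7636: BASE64URL(SHA256(verifier)), no padding.
		sum := sha256.Sum256([]byte(verifier))
		computed := base64.RawURLEncoding.EncodeToString(sum[:])
		if subtle.ConstantTimeCompare([]byte(computed), []byte(storedChallenge)) != 1 {
			return errors.New("invalid_grant: code_verifier mismatch")
		}
	case "plain":
		if subtle.ConstantTimeCompare([]byte(verifier), []byte(storedChallenge)) != 1 {
			return errors.New("invalid_grant: code_verifier mismatch")
		}
	default:
		return errors.New("invalid_request: unsupported code_challenge_method")
	}
	return nil
}

func main() {
	verifier := "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
	sum := sha256.Sum256([]byte(verifier))
	challenge := base64.RawURLEncoding.EncodeToString(sum[:])
	fmt.Println(verifyPKCE(challenge, "S256", verifier, true)) // <nil>: valid exchange
	fmt.Println(verifyPKCE("", "S256", "", true))              // rejected: no challenge bound
}
```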

Is MCP the Future of AppSec Testing?

Here’s my sharp take: yes, but with caveats. Traditional scanners like ZAP or Burp handle the low-hanging fruit—80% of CVEs maybe. The rest? Requires human-AI symbiosis. MCP (Model Context Protocol) flips the script: AI isn’t guessing; it’s tooled up, watching your exact traffic.

Market dynamics scream opportunity. Gartner forecasts AI-driven security testing to hit $5B by 2028, up from peanuts. Tools like this undercut $10K/pentest gigs—developers self-audit at Claude’s $20/month. Autentico’s dev, zero experience, bagged five bugs. Scale that: OSS projects, startups dodging audits.

Critique time. Corporate hype calls these “AI scanners.” Wrong. Toolbox isn’t autonomous; it’s a workbench. You trigger flows—login, MFA enroll. AI suggests, executes via tools (oast_create for OOB, cookie_jar for state). Claude iterated: capture, hypothesize, replay, verify. Raw power, but demands a human pilot.

And the unique angle you won’t read elsewhere: this echoes Heartbleed’s 2014 wake-up. OpenSSL “compliant,” battle-tested—yet a buffer overread slipped through. Autentico? Spec-passing, ZAP-clean, MCP-mauled. Prediction: by 2026, 70% of API providers will mandate flow-aware tools like MCP, per my read of rising breach costs ($4.5M average, IBM).

Deeper dive on fixes. Introspect got client auth guards (RFC 7662 §2.1 style). PKCE? Now enforced for public clients. Refresh tokens? Rotated on use (RFC 6749 §10.4), as sketched below. CSRF? Errors sanitized. XSS? Inputs escaped. One PR, ship it. Confidence restored, but humbler.
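
For the refresh fix, rotation means a token redeems exactly once, and a second redemption screams theft. A minimal in-memory Go sketch of the pattern; a real IdP would persist this state and revoke the whole grant on reuse, and the names here are illustrative, not Autentico's:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// store tracks which refresh tokens have been redeemed. In-memory for
// the sketch only; production code persists this alongside the grant.
type store struct {
	mu   sync.Mutex
	used map[string]bool
}

var errReuse = errors.New("refresh token reuse detected: revoke the grant")

// rotate redeems a refresh token exactly once and hands back a fresh one
// (RFC 6749 §10.4). A second redemption of the same token is treated as theft.
func (s *store) rotate(old string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	seen, ok := s.used[old]
	if !ok {
		return "", errors.New("invalid_grant: unknown refresh token")
	}
	if seen {
		return "", errReuse
	}
	s.used[old] = true // old token is now dead

	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	next := hex.EncodeToString(buf)
	s.used[next] = false
	return next, nil
}

func main() {
	s := &store{used: map[string]bool{"rt-original": false}}
	next, _ := s.rotate("rt-original")
	fmt.Println("rotated to:", next[:8], "...")
	if _, err := s.rotate("rt-original"); err != nil {
		fmt.Println("second use rejected:", err) // the bug Autentico had: both uses succeeded
	}
}
```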

Skeptics say: sample of one. Fair. But consider the scale: a 2024 GitHub search turns up some 1.2M OAuth2 repos. How many pass conformance yet leak like this? The toolbox is open source (go-appsec on GitHub) and Claude-compatible. Devs, grab it.

Broader implications. Identity providers guard the keys—Auth0, Keycloak users, take note. Even giants falter: Okta’s 2022 breach via stolen sessions. MCP scales to them: proxy prod-like traffic, AI probes.

One-paragraph punch: Tools evolve. From static SAST (Coverity’s 2000s era) to DAST (ZAP 2010s) to now AI-flow testing. Lag, and your “secure” IdP becomes tomorrow’s headline.

How Does go-appsec/toolbox Stack Against Burp?

Burp Suite: $400/year, manual macros for OAuth. Expert-only. Toolbox: free OSS, AI automates replays. Burp pros collaborate via teams; here, human-AI pair at lightspeed. Downside? Early, Claude-dependent. Upshot: disrupts the $2B appsec market.

Real-world velocity. Ten minutes, five bugs. ZAP? Hours scanning. Pentester? Days quoting.

Wander a sec—remember Log4Shell? Scanners whiffed; custom fuzzers won. MCP’s that leap: protocol-aware, agentic.



Frequently Asked Questions

What is go-appsec/toolbox and how does it find OAuth2 bugs?

It’s an MCP server for AI-human security testing—you proxy browser traffic, AI uses tools like replay_send to mutate and probe OAuth flows, spotting misses like unauthed introspection.

Is spec compliance enough for building secure OAuth2 providers?

No—Autentico passed OpenID tests and ZAP, but MCP revealed five flaws. Compliance ticks boxes; flow-aware testing catches real leaks.

Will AI tools like this replace professional pentesters?

Not yet—they augment. Zero-experience dev found bugs, but pros scale to complex envs. Expect hybrid teams dominating by 2027.

Written by Sarah Chen

AI research editor covering LLMs, benchmarks, and the race between frontier labs. Previously at MIT CSAIL.


Originally reported by Dev.to