GTIG AI Threat Tracker: Distillation Attacks Rise

Threat actors aren't just using AI; they're stealing it. Google's GTIG details a wave of distillation attacks and new AI-malware hybrids that could reshape cyber ops.


Key Takeaways

  • Model distillation attacks surged in 2025, mainly from private entities—not APTs yet.
  • DPRK, Iran, PRC, Russia use LLMs for faster recon and phishing; new malware like HONESTCUE calls Gemini's API directly.
  • Google disrupts via bans and hardening, but underground jailbreak services signal growing ecosystem.

Everyone figured AI would supercharge defenders first—faster threat detection, smarter anomaly spotting. But Google’s Threat Intelligence Group (GTIG) just flipped the script in their Q4 2025 AI Threat Tracker. Attackers from DPRK to Russia are distilling proprietary models, weaving LLMs into phishing lures, even baking Gemini APIs into malware. It’s not revolutionizing the game. Yet. But the productivity boost? That’s real, and it’s tilting the field.

GTIG’s report, an update to their November drop, spotlights model extraction attacks, aka distillation. Bad guys probe APIs through legitimate access, siphon the model's logic, train clones. No APTs have hit frontier models directly, Google insists. Private firms, researchers? They’re all over it, worldwide.

Here’s the thing.

What Exactly Are Distillation Attacks—and Why Do They Matter?

Model extraction attacks happen when adversaries poke a live LLM with queries, harvest the outputs, and distill them into their own model. Knowledge distillation, they call it: stealing smarts without cracking servers. Historically, IP theft meant intrusions and data exfil. Now? API access suffices for service-based AI.
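For intuition, here is the benign, textbook version of the same mechanic: a small student network trained to mimic a teacher's softened output distribution. A minimal PyTorch sketch, purely illustrative; the models, data, and temperature are placeholder assumptions, and an API-based attacker would substitute harvested prompt/response pairs for direct access to the teacher's logits.

```python
# Textbook knowledge distillation: train a small student to mimic a
# larger teacher's soft output distribution. Illustrative only -- all
# model sizes, data, and hyperparameters here are made-up placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's distribution

for step in range(1000):
    x = torch.randn(64, 32)      # stand-in for real (or harvested) inputs
    with torch.no_grad():
        t_logits = teacher(x)    # in an extraction attack, outputs like
                                 # these are the only signal the attacker sees
    s_logits = student(x)
    # KL divergence between softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The uncomfortable part: nothing in that loop requires privileged access. Outputs alone carry the signal.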

GTIG caught plenty in 2025. Disrupted them. Disabled projects, accounts. Beefed up classifiers. But the trend? Upward. And it’s not just theory.

“Google DeepMind and GTIG have identified an increase in model extraction attempts or ‘distillation attacks,’ a method of intellectual property theft that violates Google’s terms of service.”

That’s straight from the exec summary. Chilling, right? Violates ToS, sure—but enforcement lags behind creativity.

Threat actors aren’t waiting. DPRK groups lean on LLMs for tech research, targeting. Iran crews craft nuanced phishing. PRC and Russian ops streamline recon. Real-world cases: AI-augmented phishing builds rapport faster—personalized lures that dodge filters.

One punchy example: the HONESTCUE malware family. It taps Gemini's API to generate code on the fly and pull down second-stage payloads. Experimentation, yes. But effective.

And underground? Xanthorox services peddle ‘independent’ models—jailbroken commercial APIs, open-source MCP servers underneath. A shadow ecosystem blooming.

Why Haven’t APTs Broken Through Yet?

Google’s line: No fundamental shifts. No breakthrough capabilities from APTs or IO actors. Classifiers hold. Gemini’s safeguards advance—see their white paper.

But let’s cut the spin. This reeks of early denial. Remember 2010? Stuxnet creators toyed with novel exploits; no one saw nation-state cyber as ‘game-changing’ till it hit. Today, agentic AI glimmers on the horizon—threat actors eyeing autonomous malware devs. Productivity gains compound. Recon drops from weeks to hours. Phishing success? Skyrockets.

My take: Google’s downplaying masks momentum. They’re disrupting—good. Sharing best practices? Noble. But private sector distillation proves the method works. APTs will adapt. Bold prediction: By mid-2026, we’ll see cloned frontier models in wild ops, fueling untraceable campaigns.

Defenders, wake up.

How Nation-States Are Really Using AI Right Now

DPRK: LLMs for research, targeting intel. Rapid phishing generation—nuanced, culture-specific hooks.

Iran: Streamlining social engineering. Rapport-building emails that feel human.

PRC, Russia: Recon acceleration. Public data scraping, vulnerability hunting via prompts.

Not transformative. Incremental. But it stacks up. GTIG disrupted campaigns in the wild and saw the fingerprints.

Agentic AI next? Actors tinkering with tooling. Malware that self-evolves code. Early days, but interest spikes.

Google’s countermeasures: Account bans, model hardening. Proactive. They’ve mitigated private extractions globally. No APT direct hits on Gemini et al.

Still.

This changes dynamics subtly but surely. Attackers get faster, cheaper. Defenders chase. Market ripple? AI security vendors boom—expect M&A frenzy. Tools for API monitoring, distillation detection. Bloomberg-style bet: Firms like CrowdStrike, SentinelOne pivot hard here, stocks pop 20% on threat intel like this.
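What would distillation detection even look like in practice? One naive heuristic, entirely my sketch and not anything GTIG describes: flag API keys whose query volume and prompt diversity look more like systematic harvesting than normal product use.

```python
# Naive extraction-detection heuristic: flag API keys whose query volume
# and prompt diversity suggest systematic output harvesting rather than
# ordinary product use. A toy sketch -- thresholds and features are invented.
from collections import defaultdict

VOLUME_THRESHOLD = 10_000    # queries per day (placeholder)
DIVERSITY_THRESHOLD = 0.8    # unique prompts / total queries (placeholder)

def flag_suspect_keys(query_log):
    """query_log: iterable of (api_key, prompt) tuples for one day."""
    totals = defaultdict(int)
    uniques = defaultdict(set)
    for api_key, prompt in query_log:
        totals[api_key] += 1
        uniques[api_key].add(prompt)
    suspects = []
    for api_key, count in totals.items():
        diversity = len(uniques[api_key]) / count
        if count > VOLUME_THRESHOLD and diversity > DIVERSITY_THRESHOLD:
            suspects.append((api_key, count, round(diversity, 2)))
    return suspects
```

Real systems would fold in embedding-space coverage and timing patterns, but even something this crude catches the clumsy harvesters.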

Underground jailbreaks add fuel. Xanthorox claims independence—lies. It’s proxied jailbroken APIs. Scalable misuse.

GTIG arms us with IOCs, proofs-of-concept. Anticipate, they say. Smart.
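In that spirit, even a crude IOC sweep is a start: scan samples and outbound traffic for hardcoded generative-AI endpoints, the kind of string an HONESTCUE-style loader has to embed somewhere. A toy sketch; the pattern list below is my illustrative stand-in, not the report's actual indicator set.

```python
# Crude IOC sweep: look for hardcoded generative-AI API endpoints in a
# sample, the kind of string an HONESTCUE-style loader would embed.
# The endpoint list is illustrative, not an authoritative IOC set.
import re

AI_API_PATTERNS = [
    rb"generativelanguage\.googleapis\.com",   # Gemini API host
    rb"api\.openai\.com",
    rb"api\.anthropic\.com",
]

def scan_sample(path):
    """Return the AI-API indicators found in a binary or script."""
    with open(path, "rb") as f:
        data = f.read()
    return [p.decode() for p in AI_API_PATTERNS if re.search(p, data)]
```

Trivial to evade, obviously. But cheap checks raise the attacker's cost, and that is the whole game right now.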

But here’s my unique angle, absent in the report: Echoes of the Morris Worm era. 1988, first internet worm—experimentation by a researcher turned catastrophe. Today’s distillation? Same vibe. Playful probes today, weaponized swarms tomorrow. History doesn’t repeat; it distills.

Google’s Defenses: Holding or Placeholder?

They disable bad projects. Strengthen models. Share intel.

“At Google, we are committed to developing AI boldly and responsibly, which means taking proactive steps to disrupt malicious activity by disabling the projects and accounts associated with bad actors.”

Responsible, sure. But as APIs proliferate—Claude, GPT, Llama—attack surface explodes. One weak link, and distilled models spread.

Private entities lead the extraction charge. Researchers clone for ‘study.’ The line blurs. GTIG calls it out.

Reading the data: no apocalypse. Productivity edges to attackers. Defenders counter. But the tilt favors offense long-term unless models lock down harder.



Frequently Asked Questions

What are model extraction attacks in AI? Short answer: Hackers query an AI model via API, steal its reasoning patterns, train a copy. No break-ins needed.

How are threat actors like DPRK using LLMs? For recon, targeting, phishing lures. Speeds up ops without new breakthroughs.

Is Google’s Gemini safe from these AI threats? GTIG disrupted attacks; safeguards improving. But distillation attempts rose—no guarantees.

Written by Sarah Chen

AI research editor covering LLMs, benchmarks, and the race between frontier labs. Previously at MIT CSAIL.



Originally reported by Mandiant Blog
