EU AI Act Bans Untargeted Facial Scraping

What if your casual selfie becomes fodder for mass surveillance AI? The EU AI Act just banned untargeted facial scraping, echoing Clearview AI's GDPR fines—and promising more pain for data hoovers.

EU AI Act Draws Red Line on Scraping Your Face from the Web — theAIcatchup

Key Takeaways

  • EU AI Act Article 5(1)(e) bans untargeted facial scraping for recognition databases, echoing GDPR fines on Clearview AI.
  • Targeted scraping escapes the ban but still faces GDPR biometric hurdles.
  • Loopholes for deepfakes exist, but expect regulatory tightening amid privacy fears.

A Dutch startup’s servers hum, pulling millions of faces from public webcams and social feeds. No consent. No notice. Just raw data for the next big facial recognition beast.

Then — bam — EU regulators swoop in. That’s the future Article 5(1)(e) of the EU AI Act paints, banning untargeted scraping of facial images from the internet or CCTV to create or expand recognition databases. It’s not vague; it’s a red line, drawn thick.

Zoom out. This isn’t about the recognition itself — that’s Article 5(1)(h), with its own headaches. No, this targets the dirty prep work: hoovering up biometric gold without a care. The European Commission’s guidelines hammer it home, echoing Recital 43.

The untargeted scraping of facial images is a particularly intrusive practice which “adds to the feeling of mass surveillance and can lead to gross violations of fundamental rights, including the right to privacy”.

Chilling, right? And it’s no outlier. DPAs across Europe have been fining outfits like Clearview AI into oblivion under GDPR — €30.5 million from the Dutch alone in 2024.

Why Does Untargeted Scraping Freak Out Regulators?

But here’s the thing. Why single out “untargeted”? Targeted scraping — say, grabbing faces from a specific protest video for a narrow probe — slips the noose. It’s the mass, dragnet style that triggers the ban.

Think architecture. Untargeted means algorithms cast wide nets over the web or CCTV feeds, with no human curation and no limits. Four conditions must all be met: you're placing on the market, putting into service, or using an AI system; its purpose is creating or expanding a facial recognition database; it uses untargeted scraping; and the source is the internet or CCTV footage.

Miss one? You’re maybe safe. But companies love gray areas — and the Act closes them tight.
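Those four conditions can be sketched as a hypothetical compliance checklist. To be clear: the class, field, and function names below are illustrative shorthand, not language from the Act itself.

```python
from dataclasses import dataclass


@dataclass
class ScrapingActivity:
    """Illustrative model of a scraping activity (hypothetical fields)."""
    places_services_or_uses_ai: bool   # placing on the market, putting into service, or use
    builds_facial_rec_database: bool   # purpose: creating/expanding a recognition database
    scraping_is_untargeted: bool       # dragnet-style, no specific targeting
    source_is_internet_or_cctv: bool   # scraped from the internet or CCTV footage


def falls_under_article_5_1_e(a: ScrapingActivity) -> bool:
    """All four conditions must hold for the prohibition to apply."""
    return (a.places_services_or_uses_ai
            and a.builds_facial_rec_database
            and a.scraping_is_untargeted
            and a.source_is_internet_or_cctv)
```

Flip any one flag to False and the Article 5(1)(e) ban no longer applies, which is exactly the "miss one, maybe safe" logic: targeted scraping fails the third condition, yet still has to clear GDPR.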

Scraping for deepfakes? Training generative AI on public faces to spit out fictional celebs? That’s outside scope, technically. No database for recognition. But — plot twist — GDPR and copyright still lurk. Imagine Disney suing over Mickey-fied Tom Hanks faces trained on scraped stars.

How Does This Rewrite AI Data Pipelines?

Companies built empires on firehoses of public data. LAION-5B, that beast behind Stable Diffusion? Loaded with faces. Now? Rethink everything.

My unique take: this mirrors the 1995 EU Data Protection Directive's assault on early web trackers. Back then, cookies got tamed; today, face farms face extinction. Prediction? We'll see a boom in synthetic faces — GANs churning infinite, consent-free avatars. Architectural shift: from scrape-first to generate-first training. It's not hype; it's survival.
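The scrape-first to generate-first shift is, at bottom, a swap of data sources behind a stable interface. A minimal sketch, assuming a pipeline designed around interchangeable sources (all names here are hypothetical, and the synthetic generator is a random-vector stand-in for a real GAN or diffusion model):

```python
import random
from typing import Iterator, List, Protocol


class FaceSource(Protocol):
    """Anything that yields batches of face data for training."""
    def batches(self, n: int) -> Iterator[List[list]]: ...


class ScrapedFaceSource:
    """Scrape-first: real faces pulled from the web. Now a legal dead end."""
    def batches(self, n: int) -> Iterator[List[list]]:
        raise NotImplementedError(
            "untargeted facial scraping is prohibited under Art. 5(1)(e)"
        )


class SyntheticFaceSource:
    """Generate-first: stand-in for a generator of fictional, consent-free faces."""
    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def batches(self, n: int) -> Iterator[List[list]]:
        while True:
            # Each "face" is a placeholder feature vector; a real
            # generator would emit images instead.
            yield [[self.rng.random() for _ in range(4)] for _ in range(n)]
```

Because the training loop only depends on the `FaceSource` protocol, moving from scraped to synthetic data is a one-line swap at the composition root rather than a pipeline rewrite.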

Clearview’s saga proves it. Dutch, French, Greek DPAs piled on fines since 2022. Why? Scraping without a legal basis, with biometrics as special category data under GDPR Article 9. The AI Act layers on top, elevating the practice from fineable processing to an outright prohibited practice.

Law Enforcement Directive? It allows targeted biometrics in probes, but untargeted? Nope. Interplay’s messy — AI Act prohibited practices don’t touch LED exceptions, but good luck defending in court.

Will the EU AI Act Ban Kill Innovation in Facial Tech?

Short answer: no. But it’ll hurt the sloppy players.

Targeted scraping for research? OK, if you jump GDPR hoops. Synthetic data gold rush? Already happening — think Midjourney dodging real faces. And enforcement? DPAs are battle-tested; the Act’s prohibitions have applied since February 2025, so expect waves of probes through 2026.

Critique time. Tech giants spin this as overreach, but they’re the ones who normalized surveillance capitalism. EU’s not buying it — and neither should we.

Look, builders: audit your pipelines now. Is that web crawler facial-agnostic? Prove it. CCTV integrations? Document targeting. The Act’s not anti-AI; it’s anti-creep.
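What does "audit your pipeline" look like in practice? One concrete pattern is a gate that excludes any crawled image a face detector flags, and logs the exclusions for your compliance file. A minimal sketch: `contains_face` is an injected stand-in for whatever detector you actually run (an OpenCV cascade, an ML model); the function name and logging are illustrative, not a prescribed method.

```python
from typing import Callable, Iterable, List, Tuple


def gate_facial_images(
    image_paths: Iterable[str],
    contains_face: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split crawled images into (cleared, excluded) before training ingestion.

    `contains_face` is your real face detector, injected as a callable so the
    gate itself stays testable. Excluded paths are returned so the caller can
    write them to an audit log — documentation the Act's red line rewards.
    """
    cleared: List[str] = []
    excluded: List[str] = []
    for path in image_paths:
        (excluded if contains_face(path) else cleared).append(path)
    return cleared, excluded
```

In a real pipeline the excluded list would feed a retention-and-deletion log rather than just being returned, but the shape is the point: detection happens before ingestion, not after a regulator asks.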

Compliance costs skyrocket.

We’ve seen it before with GDPR — smaller firms folded, giants lawyered up. But here’s the why: facial biometrics aren’t just data; they’re identity. Scraping them untargeted erodes anonymity, fuels dystopias from China’s Uighur camps to American cop cams. The EU’s betting privacy trumps convenience.

And enforcement teeth? Fines up to €35 million or 7% of global turnover, whichever is higher. Ouch.

Shifts ahead. AI models lean synthetic. Databases shrink to consented pools. Recognition tech? Niche-ifies into medical scans, secure access — not mall cams.

EU AI Act vs. GDPR: Spot the Overlaps

Overlap city. Both hate untargeted biometrics. AI Act prohibits the system; GDPR fines the processing. Dual whammy.

Clearview redux: scraped 30 billion faces. Fined everywhere. Lesson? Don’t.



Frequently Asked Questions

What is untargeted scraping under EU AI Act?

It’s AI systems blindly pulling facial images from internet or CCTV without specific targeting, to build recognition databases. Targeted? Maybe OK.

Does EU AI Act ban all facial recognition?

No — just untargeted scraping for databases. Real-time remote recognition in public has separate rules.

Can I scrape faces for AI training?

If not for recognition databases, maybe — but GDPR/copyright still apply. Synthetic data’s safer.

Written by James Kowalski

Investigative tech reporter focused on AI ethics, regulation, and societal impact.



Originally reported by Future of Privacy Forum
