Ever stared at a Kubernetes pod status screaming ‘OOMKilled’ after your Puppeteer scraper hummed along for hours, and wondered whether Google is promising the moon with this ‘headless Chrome’ toy?
Profiling Puppeteer memory usage isn’t some academic exercise. It’s survival in Node.js land, where RSS climbs stealthily until your container implodes.
Look, I’ve chased memory gremlins since Node was a scrappy upstart, back when V8 leaks were the stuff of late-night Stack Overflow binges. Puppeteer? Same story, shinier packaging. PR spin calls it ‘browser automation made easy’, but who’s cashing in? Cloud providers, laughing all the way to the bank on your spiked bills.
Why Does Puppeteer Memory Leak – And Why Do Articles Miss It?
Most guides fixate on Node’s heapUsed. Wrong.
“Here’s the thing most articles miss: Puppeteer’s real memory usage is in Chrome’s child processes, not in Node.js. The Node.js process just holds references. A ‘small’ Node.js heap with 2GB of Chrome processes is still a 2GB problem.”
That quote nails it. Node’s process.memoryUsage() spits out rss, heapUsed, heapTotal, and external (plus arrayBuffers on recent Node versions). RSS is the Node process’s total resident footprint: code, stack, heap, buffers. But Chrome’s renderer processes? They lurk outside it, slurping RAM for pages, screenshots, fonts. Your ‘lightweight’ script? A family of memory hogs.
And here’s my hot take the other articles skip: this echoes the PhantomJS era. Remember? Everyone ditched it for Puppeteer because ‘modern Chrome DevTools.’ Cut to five years later: same bloat, now with WebAssembly excuses. Prediction: Playwright or some Rust-native browser runner supplants it by 2026, unless Google force-quarantines child RAM.
Start simple. Log memory religiously.
```javascript
// Log the Node.js process's memory counters in MB, tagged with a label.
function logMemory(label = '') {
  const mem = process.memoryUsage();
  const format = bytes => (bytes / 1024 / 1024).toFixed(1) + 'MB';
  console.log(`[${label}] rss=${format(mem.rss)} heap=${format(mem.heapUsed)}/${format(mem.heapTotal)} external=${format(mem.external)}`);
}
```
Drop that everywhere. But time-series it. One snapshot? Useless.
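To see why one snapshot misleads, reduce the time series to a trend. A minimal sketch, assuming you collect `{ t, rss }` samples (milliseconds and bytes) on a timer; `rssSlopeMBPerMin` is a hypothetical helper, not a library function. It fits a least-squares line through the samples and reports RSS growth in MB per minute.

```javascript
// Hypothetical helper: estimate RSS growth rate from a time series.
// Each sample is { t: <milliseconds>, rss: <bytes> }.
function rssSlopeMBPerMin(samples) {
  const n = samples.length;
  if (n < 2) return 0; // a single snapshot says nothing about a trend
  const xs = samples.map(s => s.t / 60000);         // minutes
  const ys = samples.map(s => s.rss / 1024 / 1024); // MB
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return den === 0 ? 0 : num / den; // least-squares slope in MB per minute
}

// Collect a sample every 5 seconds; unref() so the timer never keeps Node alive.
const samples = [];
const timer = setInterval(() => {
  samples.push({ t: Date.now(), rss: process.memoryUsage().rss });
}, 5000);
timer.unref();
```

A slope near zero under steady load is what you want; a slope that stays positive for hours is the creep that ends in OOMKilled.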
How to Set Up Puppeteer Memory Tracking That Actually Works
Build a tracker. JSONL output for graphing – spreadsheets or quick scripts turn it into charts showing the creep.
```javascript
const fs = require('fs');

class MemoryTracker {
  // ... (full class from the original, not copied verbatim here; adapt it)
}
```
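The full class isn't reproduced above. As a stand-in, here is a minimal sketch of what a JSONL tracker could look like; it is not the original class. The class name, constructor options, and record fields are assumptions, and only `incrementRequests()` mirrors the method name used in this section.

```javascript
const fs = require('fs');

// Hypothetical tracker sketch: appends one JSON object per sample (JSONL).
class JsonlMemoryTracker {
  constructor(filePath, intervalMs = 3000) {
    this.filePath = filePath;
    this.intervalMs = intervalMs; // 2-5 s is a sensible range under load
    this.requests = 0;
    this.timer = null;
  }

  // Call after every page.goto() or screenshot.
  incrementRequests() {
    this.requests += 1;
  }

  // Record one sample synchronously and return it.
  sample(label = '') {
    const mem = process.memoryUsage();
    const entry = {
      ts: new Date().toISOString(),
      label,
      requests: this.requests,
      rss: mem.rss,
      heapUsed: mem.heapUsed,
      heapTotal: mem.heapTotal,
      external: mem.external,
    };
    fs.appendFileSync(this.filePath, JSON.stringify(entry) + '\n');
    return entry;
  }

  start() {
    if (this.timer) return;
    this.timer = setInterval(() => this.sample('interval'), this.intervalMs);
    this.timer.unref(); // don't keep the process alive just for sampling
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }
}
```

Call `start()` before the load test and `stop()` after; each line in the file is one record, ready for a spreadsheet or a quick plotting script.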
I tweak it: log every 2-5 seconds under load. Hook incrementRequests() to every page.goto() or screenshot. Run idle for a baseline: pre-launch (50-80MB RSS), post-launch (150-250MB), post-warmup page (160-270MB). Sky-high already? Blame the launch args: disable the GPU and extensions, and run the new headless mode (headless: true in Puppeteer v22+, headless: ‘new’ in the releases just before).
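A sketch of that idle baseline run, assuming you pass in the Puppeteer module (so `puppeteer-core` works too) and that `about:blank` is an acceptable warm-up page. `measureBaseline` and its default argument list are hypothetical. It records only the Node process's RSS; Chrome's child processes have to be measured separately.

```javascript
// Hypothetical baseline run: pass in the module, e.g. require('puppeteer').
// Records Node-side RSS (MB) at three checkpoints; Chrome's processes are NOT included.
async function measureBaseline(puppeteer, launchArgs = ['--disable-gpu', '--disable-extensions']) {
  const rssMB = () => Math.round(process.memoryUsage().rss / 1024 / 1024);
  const result = { preLaunch: rssMB() };

  const browser = await puppeteer.launch({ headless: true, args: launchArgs });
  try {
    result.postLaunch = rssMB();
    const page = await browser.newPage();
    await page.goto('about:blank'); // warm-up page
    result.postWarmup = rssMB();
    await page.close();
  } finally {
    await browser.close(); // always tear Chrome down, even if warm-up fails
  }
  return result;
}
```

Usage: `measureBaseline(require('puppeteer')).then(console.log);`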
Stress test next: concurrency 3, 200 URLs. Work in batches with Promise.all over slices of the URL list, and close pages ruthlessly in a finally block.
But baselines lie once the heap fragments. heapTotal balloons while heapUsed stays flat? V8 compaction issues. external spikes? Giant screenshot or PDF buffers.
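The batching loop might look like this. A minimal sketch, assuming one shared browser; `runInBatches` and the optional `onRequest` callback (for example, a tracker's `incrementRequests()`) are hypothetical. Each page closes in `finally` even when navigation fails, and errors are caught per URL so one bad page doesn't reject the whole `Promise.all` batch.

```javascript
// Hypothetical stress-test helper: process URLs in fixed-size batches.
async function runInBatches(browser, urls, concurrency = 3, onRequest = () => {}) {
  const results = [];
  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    const settled = await Promise.all(batch.map(async url => {
      const page = await browser.newPage();
      try {
        await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
        onRequest(); // e.g. () => tracker.incrementRequests()
        return { url, ok: true };
      } catch (err) {
        return { url, ok: false, error: err.message }; // keep the batch going
      } finally {
        await page.close(); // always close the tab, even on failure
      }
    }));
    results.push(...settled);
  }
  return results;
}
```

With 200 URLs and concurrency 3 that is 67 batches; logging memory between batches shows whether the floor keeps rising.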
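Those two rules of thumb can be turned into a quick check over the first and last samples of a run. A sketch only: `diagnoseGrowth` is hypothetical, and the 50 MB threshold (and the "flat" cutoff at a fifth of it) are arbitrary assumptions to tune.

```javascript
// Hypothetical check over two process.memoryUsage()-shaped samples (bytes).
// Thresholds are arbitrary assumptions; tune them for your workload.
function diagnoseGrowth(first, last, thresholdMB = 50) {
  const MB = 1024 * 1024;
  const d = key => (last[key] - first[key]) / MB;
  const findings = [];
  // heapTotal growing while heapUsed stays flat: V8 holds reserved but unused heap.
  if (d('heapTotal') > thresholdMB && d('heapUsed') < thresholdMB / 5) {
    findings.push('heapTotal up, heapUsed flat: possible fragmentation/compaction issue');
  }
  // external includes memory outside the V8 heap, such as Node Buffers from screenshots or PDFs.
  if (d('external') > thresholdMB) {
    findings.push('external up: look for large screenshot or PDF buffers being retained');
  }
  if (d('heapUsed') >= thresholdMB) {
    findings.push('heapUsed up: JS objects retained; take heap snapshots');
  }
  return findings;
}
```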
Is Heap Snapshotting Worth the Headache for Puppeteer Leaks?
Yes – if you’re cynical like me.
Guesswork fails. Snapshots reveal retained objects. Steps: chrome://flags/#enable-heap-snapshots, then --heap-profiler-allocations or clinic.js.
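For the Node side, Node's built-in `v8.writeHeapSnapshot()` writes `.heapsnapshot` files that the Chrome DevTools Memory panel can load and compare. A minimal sketch, assuming your load test is wrapped in an async `workload` function; `snapshotAround` is a hypothetical name. This covers the Node heap only, not Chrome's renderers.

```javascript
const v8 = require('v8');
const path = require('path');
const os = require('os');

// Take a Node.js heap snapshot before and after a workload.
// Writing a snapshot is synchronous and pauses the process while it runs.
async function snapshotAround(workload, dir = os.tmpdir()) {
  const before = v8.writeHeapSnapshot(path.join(dir, `before-${Date.now()}.heapsnapshot`));
  await workload();
  if (global.gc) global.gc(); // only defined when Node runs with --expose-gc
  const after = v8.writeHeapSnapshot(path.join(dir, `after-${Date.now()}.heapsnapshot`));
  return { before, after }; // file paths of the two snapshots
}
```

Load both files in the DevTools Memory panel, select the second one, and switch to the Comparison view to see which constructors grew.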
Baseline snap. Load test. Diff them. Boom – detached DOM nodes, event listeners clinging like exes, undisposed pages.
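On the Chrome side, Puppeteer's `page.metrics()` returns counters such as `Nodes`, `JSEventListeners`, `Documents`, and `JSHeapUsedSize`. Diffing them before and after a load test on a long-lived page is a cheap proxy for the snapshot diff, not a replacement for it. `diffPageMetrics` and `checkPage` are hypothetical helpers.

```javascript
// Hypothetical diff of two page.metrics() results from the same long-lived page.
// Nodes or JSEventListeners that keep growing after content should have been
// torn down hint at detached DOM nodes or listeners that were never removed.
function diffPageMetrics(before, after, keys = ['Nodes', 'JSEventListeners', 'JSHeapUsedSize', 'Documents']) {
  const grew = {};
  for (const key of keys) {
    const delta = (after[key] ?? 0) - (before[key] ?? 0);
    if (delta > 0) grew[key] = delta; // report only counters that increased
  }
  return grew;
}

// Wrap a load step with metrics taken before and after it.
async function checkPage(page, runLoad) {
  const before = await page.metrics();
  await runLoad(page);
  const after = await page.metrics();
  return diffPageMetrics(before, after);
}
```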
Real leak I chased: browser not closing fully between jobs. pages array grew. Fix? browser.close() in a setTimeout, recycle one browser instance. Memory flatlined.
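A sketch of that fix, assuming jobs share one browser that gets swapped out after a fixed number of jobs (the count-based policy is an assumption) and that the old browser is closed after a grace period via `setTimeout`. `BrowserRecycler`, `maxJobs`, and `graceMs` are hypothetical names.

```javascript
// Hypothetical recycler: all jobs share one browser, which is swapped out
// after `maxJobs` jobs. `launch` is any async function returning a Puppeteer
// Browser, e.g. () => puppeteer.launch({ headless: true }).
class BrowserRecycler {
  constructor(launch, maxJobs = 100, graceMs = 5000) {
    this.launch = launch;
    this.maxJobs = maxJobs;
    this.graceMs = graceMs;
    this.pending = null; // Promise<Browser> for the current browser
    this.jobs = 0;
  }

  // All bookkeeping happens synchronously, so concurrent callers
  // can't trigger duplicate launches.
  get() {
    if (this.pending && this.jobs >= this.maxJobs) {
      const old = this.pending;
      this.pending = null;
      // Let in-flight pages finish, then close the old browser fully.
      setTimeout(() => old.then(b => b.close()).catch(() => {}), this.graceMs);
    }
    if (!this.pending) {
      const p = this.launch();
      // If launching fails, forget the rejected promise so the next call retries.
      p.catch(() => { if (this.pending === p) this.pending = null; });
      this.pending = p;
      this.jobs = 0;
    }
    this.jobs += 1;
    return this.pending;
  }

  async shutdown() {
    const p = this.pending;
    this.pending = null;
    if (p) await (await p).close();
  }
}
```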
PR spin says ‘just launch new browsers.’ Cute – until scale hits.
Unique wrinkle: Puppeteer inherits every quirk of Chrome’s renderer memory. Early Chrome (pre-2015) leaked on tabs; now it’s WebGL canvases in headless mode. Your PDF generation? Rasterizing SVGs can eat 500MB per page if unchecked.
Fixing Puppeteer Memory: Brutal Cuts That Stick
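Close pages. Always. Reach for browser.disconnect() instead of close() only when you attached to a shared browser you don’t own: disconnect() leaves the Chrome process running. Args: --disable-gpu, --disable-dev-shm-usage, --memory-pressure-off. Pool browsers: one per worker, not per request.
Recycle contexts. page.setViewport({ width, height, deviceScaleFactor: 1 }). Take screenshots as JPEG at quality 80; the quality option doesn’t apply to PNG.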
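One browser per worker, with the launch flags above, could be sketched like this. `runWorkers` is hypothetical; `launch` is any function taking launch options and returning a Puppeteer browser (for example `opts => puppeteer.launch(opts)`). If `handle` throws, that worker closes its page and browser and the whole run rejects.

```javascript
// Launch options from the text; whether each flag helps depends on your environment
// (--disable-dev-shm-usage matters mostly in Docker, where /dev/shm is small by default).
const LAUNCH_OPTIONS = {
  headless: true,
  args: ['--disable-gpu', '--disable-dev-shm-usage', '--memory-pressure-off'],
};

// Hypothetical worker pool: each worker owns ONE browser and pulls URLs from a
// shared queue, so the browser count equals the worker count, not the request count.
async function runWorkers(launch, urls, workers, handle) {
  const queue = [...urls];
  const results = [];
  await Promise.all(Array.from({ length: workers }, async () => {
    const browser = await launch(LAUNCH_OPTIONS);
    try {
      let url;
      while ((url = queue.shift()) !== undefined) {
        const page = await browser.newPage();
        try {
          results.push(await handle(page, url));
        } finally {
          await page.close(); // one page per URL, closed no matter what
        }
      }
    } finally {
      await browser.close();
    }
  }));
  return results; // order follows completion, not input order
}
```

Usage: `runWorkers(opts => puppeteer.launch(opts), urls, 4, async (page, url) => { await page.goto(url); return page.title(); })`.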
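A per-job sketch of those three cuts, assuming a 1280x800 viewport (an arbitrary choice). Newer Puppeteer releases name the context factory `createBrowserContext()`, older ones `createIncognitoBrowserContext()`, so the helper checks which one exists. `screenshotInFreshContext` is hypothetical.

```javascript
// Hypothetical per-job helper: a fresh browser context per job, a 1x viewport,
// and a JPEG screenshot at quality 80 (quality is valid for jpeg/webp, not png).
async function screenshotInFreshContext(browser, url) {
  const context = typeof browser.createBrowserContext === 'function'
    ? await browser.createBrowserContext()
    : await browser.createIncognitoBrowserContext();
  try {
    const page = await context.newPage();
    await page.setViewport({ width: 1280, height: 800, deviceScaleFactor: 1 });
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.screenshot({ type: 'jpeg', quality: 80 });
  } finally {
    await context.close(); // closes every page opened in this context
  }
}
```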
Monitor child processes: ps aux | grep chrome, or pidusage lib. Node RSS low, but 10 children at 200MB? There’s your killer.
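Without extra dependencies, you can sum the Chrome process tree yourself from `ps`. A sketch, assuming a Linux or macOS `ps` that accepts `-A -o pid=,ppid=,rss=` (RSS in KB) and a browser launched by Puppeteer, so `browser.process()` is non-null. RSS counts shared pages in every process, so the sum overstates the true total; treat it as an upper bound. `treeRssMB` and `readPs` are hypothetical.

```javascript
const { execFileSync } = require('child_process');

// Sum RSS (in MB) for a root PID and all its descendants, given `ps` output
// with columns: pid ppid rss(KB).
function treeRssMB(psOutput, rootPid) {
  const procs = psOutput.trim().split('\n').map(line => {
    const [pid, ppid, rssKB] = line.trim().split(/\s+/).map(Number);
    return { pid, ppid, rssKB };
  });
  const inTree = new Set([rootPid]);
  let added = true;
  while (added) { // follow parent->child links until no new PIDs appear
    added = false;
    for (const p of procs) {
      if (!inTree.has(p.pid) && inTree.has(p.ppid)) {
        inTree.add(p.pid);
        added = true;
      }
    }
  }
  const totalKB = procs.filter(p => inTree.has(p.pid)).reduce((sum, p) => sum + p.rssKB, 0);
  return totalKB / 1024;
}

function readPs() {
  return execFileSync('ps', ['-A', '-o', 'pid=,ppid=,rss='], { encoding: 'utf8' });
}

// Usage with Puppeteer:
//   const chromeMB = treeRssMB(readPs(), browser.process().pid);
//   const nodeMB = process.memoryUsage().rss / 1024 / 1024;
```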
In prod? Expose the same numbers you log to JSONL to Prometheus and graph them in Grafana. Alert if RSS > 1.5GB.
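Prometheus scrapes an HTTP endpoint rather than reading JSONL, so one option is to expose the same numbers as gauges. A sketch using only Node's `http` module and the Prometheus text exposition format; the metric names and port 9464 are assumptions, and `getSample` is any function returning the latest byte counts.

```javascript
const http = require('http');

// Render two gauges in the Prometheus text exposition format.
// Metric names are hypothetical; pick your own naming scheme.
function renderMetrics({ nodeRssBytes, chromeRssBytes }) {
  return [
    '# HELP scraper_node_rss_bytes Resident set size of the Node.js process.',
    '# TYPE scraper_node_rss_bytes gauge',
    `scraper_node_rss_bytes ${nodeRssBytes}`,
    '# HELP scraper_chrome_rss_bytes Summed RSS of the Chrome process tree.',
    '# TYPE scraper_chrome_rss_bytes gauge',
    `scraper_chrome_rss_bytes ${chromeRssBytes}`,
    '',
  ].join('\n');
}

// Serve /metrics for Prometheus to scrape.
function serveMetrics(getSample, port = 9464) {
  return http.createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(renderMetrics(getSample()));
  }).listen(port);
}
```

An alert rule along the lines of `scraper_node_rss_bytes + scraper_chrome_rss_bytes > 1.5 * 1024^3` then encodes the 1.5GB threshold.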
Skeptical note: Google’s not incentivized to slim Chrome – ads fund it. You profile, you pay.
Frequently Asked Questions
What causes most Puppeteer memory leaks in Node.js?
Chrome child processes retaining page DOM, screenshots held in external memory, or undisposed event listeners. Usually not the Node heap.
How do you profile Puppeteer memory usage step by step?
Baseline logs, time-series tracker, load test, heap snapshots via Chrome DevTools, diff for retainers.
Can Puppeteer memory leaks crash production containers?
Absolutely – RSS grows silently, triggers OOMKill after hours. Track Chrome PIDs separately.