Our Node.js API was restarting every 6 hours due to memory leaks. Took me two weeks to find the bug. It was a single missing removeListener() call.
Here’s how I found it, and what I learned about debugging Node memory leaks that actually works.
The symptom
Memory usage graph looked like this:
Memory
│           ╱╱╱
│       ╱╱╱
│   ╱╱╱
└────────────────> Time
Classic memory leak pattern. Process starts at 200MB, grows to 2GB over 6 hours, then OOM kills it. Kubernetes restarts it. Repeat.
Production logs were useless:
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
Thanks Node. Very helpful.
Step 1: Can you reproduce it?
First attempt: run the app locally, leave it running.
Nope. Memory stayed flat. The leak only happened under production load.
Second attempt: load testing.
# Threw 100 req/s at staging for 2 hours
k6 run load-test.js
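The real load-test.js isn't reproduced here, but a minimal script in the same spirit looks roughly like this (the target URL is a placeholder, and I'm assuming k6's constant-arrival-rate executor to hold a steady request rate):

// load-test.js - a minimal sketch, not our actual test
import http from 'k6/http';

export const options = {
  scenarios: {
    steady: {
      executor: 'constant-arrival-rate', // hold a fixed request rate
      rate: 100,                         // ~100 req/s, as in the run above
      timeUnit: '1s',
      duration: '2h',
      preAllocatedVUs: 200,
    },
  },
};

export default function () {
  http.get('https://staging.example.com/api/health'); // placeholder endpoint
}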
Memory grew slowly but noticeably. Good. We can reproduce it.
Step 2: Heap snapshots
Node has built-in heap profiling. I learned to use it properly.
// app.js - Add a heap snapshot endpoint
const v8 = require('v8');

app.get('/heap-snapshot', (req, res) => {
  const filename = `heap-${Date.now()}.heapsnapshot`;
  // writeHeapSnapshot is synchronous and blocks the event loop while it runs
  v8.writeHeapSnapshot(filename);
  res.json({ snapshot: filename });
});
DO NOT do this in production without auth. These files are 500MB+ and expose everything in memory.
Better approach:
const heapdump = require('heapdump');

// Automatic snapshots when memory is high
setInterval(() => {
  const usage = process.memoryUsage();
  if (usage.heapUsed > 1.5 * 1024 * 1024 * 1024) { // 1.5GB
    heapdump.writeSnapshot((err, filename) => {
      console.log('Heap snapshot written:', filename);
    });
  }
}, 60000); // Check every minute
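If you'd rather not pull in a dependency at all, Node itself (v12+) can write a snapshot on demand when it receives a signal:

# Start the process with the flag, then signal it whenever you want a snapshot
node --heapsnapshot-signal=SIGUSR2 app.js
kill -USR2 <pid>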
Captured 3 snapshots:
- Startup: 180MB
- After 2 hours: 650MB
- After 4 hours: 1.2GB
Step 3: Analyzing heap snapshots
Opened them in Chrome DevTools:
- Load snapshot file in Chrome DevTools Memory profiler
- Switch to “Comparison” view
- Compare snapshot 1 → snapshot 2
What was growing?
Array: +45,000 objects
Closure: +12,000 objects
EventEmitter: +45,000 objects
Interesting. Arrays, closures, and event emitters all growing together. Event listeners maybe?
Step 4: The smoking gun
Chrome DevTools lets you inspect retained objects. Found this:
(EventEmitter) -> listeners array -> closure -> huge data object
45,000 event listeners, each holding a reference to a large object.
Searched codebase for on( and addEventListener. Found 87 places.
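(The search was something along these lines; adjust the path to wherever your handlers live:)

grep -rnE "\.on\(|addEventListener" --include='*.js' src/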
Started commenting them out one by one in staging. When I disabled WebSocket message handling, the leak stopped.
// websocket-handler.js
function handleWebSocketConnection(ws) {
  ws.on('message', async (data) => {
    const message = JSON.parse(data);

    // Process message
    await processMessage(message);

    // Send response
    ws.send(JSON.stringify({ status: 'ok' }));
  });

  ws.on('close', () => {
    console.log('Client disconnected');
  });
}
See the bug?
Step 5: The actual bug
Every WebSocket connection added a message listener. But we never removed it.
The close listener ran when clients disconnected, but it never removed the message listener, so the emitter dutifully kept it around, along with everything its closure captured.
Each listener’s closure captured the ws object, which held references to buffers, parsed data, and other heavy objects.
With 1000 clients connecting per hour and each staying 30 seconds, we were leaking ~15 listeners per minute.
The fix:
function handleWebSocketConnection(ws) {
  const messageHandler = async (data) => {
    const message = JSON.parse(data);
    await processMessage(message);
    ws.send(JSON.stringify({ status: 'ok' }));
  };

  const closeHandler = () => {
    // Remove the message listener when the connection closes
    ws.removeListener('message', messageHandler);
    ws.removeListener('close', closeHandler);
    console.log('Client disconnected, listeners cleaned up');
  };

  ws.on('message', messageHandler);
  ws.on('close', closeHandler);
}
Memory leak: gone.
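A slightly tighter variant, if you prefer one sweep over per-listener bookkeeping (a sketch, not the exact fix we shipped): attach the close handler with once() so it removes itself, and let removeAllListeners() drop everything else.

function handleWebSocketConnection(ws) {
  ws.on('message', async (data) => {
    const message = JSON.parse(data);
    await processMessage(message);
    ws.send(JSON.stringify({ status: 'ok' }));
  });

  // once() removes this handler automatically after it fires
  ws.once('close', () => {
    ws.removeAllListeners(); // drops the message listener (and any others) in one go
    console.log('Client disconnected, listeners cleaned up');
  });
}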
What I learned about Node memory leaks
1. Global variables are evil
Found this in another file:
// Don't do this
global.activeUsers = [];

function trackUser(userId) {
  global.activeUsers.push(userId);
  // Never removed anywhere
}
global in Node is like window in the browser: anything you attach to it stays reachable for the life of the process, so it's never garbage collected.
Fixed:
class UserTracker {
  constructor() {
    this.activeUsers = new Set();
  }

  add(userId) {
    this.activeUsers.add(userId);
  }

  remove(userId) {
    this.activeUsers.delete(userId);
  }
}

const tracker = new UserTracker();
At least now it’s scoped and explicit.
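Of course, the Set only stays bounded if something actually calls remove(). In a setup like ours, that's the disconnect path; a sketch (ws and userId come from the surrounding connection handler, not shown):

ws.once('close', () => {
  tracker.remove(userId); // without this, the Set grows exactly like the global array did
});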
2. Closures capture everything
This leaks, with a caveat: modern V8 only retains variables that some closure in the scope actually references. But all closures created in one scope share a single context object, so one careless helper is enough:

function createHandler(data) {
  const largeBuffer = Buffer.alloc(10 * 1024 * 1024); // 10MB
  const logSize = () => console.log(largeBuffer.length); // any closure touching it...
  return function handler() {
    console.log(data.id);
    // ...pins largeBuffer in the context that handler keeps alive
  };
}
handler never uses largeBuffer, but because logSize references it, the shared context retains the whole 10MB for as long as handler exists.
Fixed:
function createHandler(data) {
  const dataId = data.id; // Extract what you need up front
  return function handler() {
    console.log(dataId);
    // No closure in this scope references largeBuffer, so it can be collected
  };
}
3. Streams must be cleaned up
// Leaks
function processFile(filename) {
  const stream = fs.createReadStream(filename);
  stream.on('data', (chunk) => {
    // Process chunk
  });
}
If the stream errors or the function exits early, listeners stay attached.
Fixed:
function processFile(filename) {
const stream = fs.createReadStream(filename);
const cleanup = () => {
stream.removeAllListeners();
stream.destroy();
};
stream.on('data', (chunk) => {
// Process
});
stream.on('end', cleanup);
stream.on('error', (err) => {
console.error(err);
cleanup();
});
}
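Node also ships a helper that does this wiring for you: stream.finished (available since Node 10) invokes its callback once on end, error, or premature close, and tears down its own listeners. The same function, sketched with it:

const fs = require('fs');
const { finished } = require('stream');

function processFile(filename) {
  const stream = fs.createReadStream(filename);

  stream.on('data', (chunk) => {
    // Process chunk
  });

  // Fires once on end, error, or premature close
  finished(stream, (err) => {
    if (err) console.error(err);
    stream.destroy(); // make sure the fd is released either way
  });
}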
4. Cache without limits = memory leak
// Classic leak
const cache = {};

function getCached(key, fetchFn) {
  if (!cache[key]) {
    cache[key] = fetchFn();
  }
  return cache[key];
}
Cache grows forever. After a week, it’s gigabytes.
Use LRU cache:
const LRU = require('lru-cache');

const cache = new LRU({
  max: 500,              // Max 500 items
  maxAge: 1000 * 60 * 60 // 1 hour
});

function getCached(key, fetchFn) {
  let value = cache.get(key);
  if (value === undefined) { // !value would refetch legitimate falsy values
    value = fetchFn();
    cache.set(key, value);
  }
  return value;
}
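One version caveat: that's the lru-cache v6 API. Newer releases renamed maxAge to ttl, and current versions export the class by name, so the equivalent setup looks more like:

const { LRUCache } = require('lru-cache');

const cache = new LRUCache({
  max: 500,            // Max 500 items
  ttl: 1000 * 60 * 60  // 1 hour
});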
5. Monitor memory in production
Added this to all our services:
const memwatch = require('@airbnb/node-memwatch');

memwatch.on('leak', (info) => {
  console.error('Memory leak detected:', info);
  // Alert to Slack/PagerDuty here
});

// Expose metrics
app.get('/metrics', (req, res) => {
  const usage = process.memoryUsage();
  res.json({
    heapUsed: usage.heapUsed / 1024 / 1024,   // MB
    heapTotal: usage.heapTotal / 1024 / 1024, // MB
    external: usage.external / 1024 / 1024,   // MB
    rss: usage.rss / 1024 / 1024              // MB
  });
});
Scrape with Prometheus, alert when heap grows beyond threshold.
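If you want the native Prometheus text format instead of JSON, a common choice is the prom-client package; its default metrics already include heap and RSS gauges (a sketch from its documented API, not necessarily what we ran):

const client = require('prom-client');

// Includes nodejs_heap_size_used_bytes, process_resident_memory_bytes, etc.
client.collectDefaultMetrics();

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});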
Tools that actually helped
- Heap snapshots: Chrome DevTools Memory profiler
- Leak detection: node-memwatch or @airbnb/node-memwatch
- Profiling: clinic.js (clinic doctor -- node app.js)
- Load testing: k6 with realistic scenarios
The investigation checklist
When you suspect a memory leak:
- Can you reproduce it? Locally, in staging, or under a load test
- Capture heap snapshots at different memory levels
- Compare snapshots to see what's growing
- Check event listeners - the most common leak source
- Look for global variables - the second most common
- Check closures - are they capturing too much?
- Review caching logic - any unbounded caches?
- Audit streams and timers - are they cleaned up?
The aftermath
Two weeks of debugging for one line of code:
ws.removeListener('message', messageHandler);
Was it worth it? Yes. Our API hasn’t restarted unexpectedly in 3 months. Memory usage stays flat at 250MB.
Plus, I learned way more about Node internals than I wanted to.
Memory leaks are frustrating because they’re invisible until they’re catastrophic. But once you know the patterns, they’re usually simple fixes.
Just remember: every listener you add is a listener you eventually have to remove.
That’s the lesson that cost us two weeks.