Incident Report: Node.js Memory Leak Analysis
Date: 2026-01-20
Severity: P1 (Production Impact)
MTTR: 14 days
Root Cause: Event listener leak in WebSocket handler
Timeline of Events
Day 1 - Jan 6, 09:30 UTC
- Monitoring alerts: API instances restarting every 6 hours
- Memory usage shows sawtooth pattern (gradual climb, sudden drop)
- Initial hypothesis: Database connection leak
Day 3 - Jan 8
- Ruled out database connections (pool metrics normal)
- Added heap profiling to staging environment
- Identified EventEmitter instances growing unbounded
Day 7 - Jan 13
- Narrowed down to WebSocket message handlers
- Still unable to identify exact leak source
- Memory profiling shows 50K+ listener registrations
Day 14 - Jan 20, 14:15 UTC
- Root cause identified: missing removeListener() in disconnect handler
- Fix deployed: 14:45 UTC
- Memory usage stabilized within 2 hours
Technical Analysis
Memory Leak Pattern
Heap Memory Usage Over 6 Hours
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
512MB ┤                      ╱╱╱╱╱ [restart]
      │                   ╱╱
      │                ╱╱
256MB ┤             ╱╱
      │          ╱╱
128MB ┤       ╱╱
      │    ╱╱
 64MB ┤ ╱╱
      └─────────────────────────────
       0h      2h      4h      6h
Characteristics:
- Linear growth: ~75MB/hour on average (64MB → 512MB over each ~6-hour cycle)
- Garbage collection reclaims none of the growth (the leaked objects stay reachable)
- Crash at 512MB (container memory limit)
- Restarts provide only temporary relief
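A steady growth rate like the one above can be turned into a time-to-limit estimate, which is more actionable for alerting than a raw heap number. The sketch below is a minimal version, assuming heap samples are collected as `{ hours, mb }` pairs; `growthRateMBPerHour`, `hoursUntilLimit`, and `recordHeapSample` are hypothetical helpers, not part of any library. It fits a least-squares slope instead of using only two points, so one noisy sample does not dominate. Applied to the chart's endpoints (64MB at 0h, 512MB at 6h) it reproduces the roughly six-hour restart cycle.

```javascript
// Hypothetical helpers for projecting when a leaking process hits its memory limit.

// Least-squares slope of heap samples, in MB per hour.
function growthRateMBPerHour(samples) {
  if (samples.length < 2) throw new Error('need at least two samples');
  const n = samples.length;
  const meanT = samples.reduce((sum, s) => sum + s.hours, 0) / n;
  const meanM = samples.reduce((sum, s) => sum + s.mb, 0) / n;
  let num = 0;
  let den = 0;
  for (const { hours, mb } of samples) {
    num += (hours - meanT) * (mb - meanM);
    den += (hours - meanT) ** 2;
  }
  if (den === 0) throw new Error('samples must span more than one point in time');
  return num / den;
}

// Hours remaining before `limitMB` is reached at the given growth rate.
function hoursUntilLimit(currentMB, limitMB, mbPerHour) {
  if (mbPerHour <= 0) return Infinity; // flat or shrinking heap: no projected crash
  return Math.max(0, (limitMB - currentMB) / mbPerHour);
}

// Append the current heap usage; a service would call this on a timer.
function recordHeapSample(samples, startTimeMs) {
  samples.push({
    hours: (Date.now() - startTimeMs) / (60 * 60 * 1000),
    mb: process.memoryUsage().heapUsed / (1024 * 1024),
  });
}

// Example using the chart's endpoints: 64MB at 0h, 512MB at 6h.
const rate = growthRateMBPerHour([{ hours: 0, mb: 64 }, { hours: 6, mb: 512 }]);
const remaining = hoursUntilLimit(64, 512, rate);
console.log(`${rate.toFixed(1)} MB/hour, about ${remaining.toFixed(1)} hours to the 512MB limit`);
```

Alerting on projected hours-to-limit (for example, under two hours) fires earlier than a fixed percentage threshold when the growth is steady.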
Diagnostic Process
Step 1: Heap Snapshot Analysis
// Capture heap snapshots programmatically
const v8 = require('v8');
function captureHeapSnapshot() {
  const filename = `heap-${Date.now()}.heapsnapshot`;
  // writeHeapSnapshot() is synchronous: it blocks the event loop while it runs
  // and can need memory roughly twice the current heap size, so leave headroom
  // below the container limit
  const written = v8.writeHeapSnapshot(filename);
  console.log(`Snapshot written to ${written}`);
}
// Capture every 30 minutes
setInterval(captureHeapSnapshot, 30 * 60 * 1000);
Snapshot Comparison Results:
| Object Type | Snapshot 1 (2h) | Snapshot 2 (4h) | Growth |
|---|---|---|---|
| EventEmitter | 1,250 | 2,480 | +98% |
| Function | 3,100 | 6,210 | +100% |
| Array | 8,420 | 17,100 | +103% |
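Finding: EventEmitter, Function, and Array counts all roughly doubled as uptime doubled (2h to 4h), consistent with steady linear accumulation that garbage collection never reclaims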
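The object counts in the table can also be produced without DevTools by tallying nodes in the `.heapsnapshot` files written in Step 1. This is a minimal sketch, assuming the JSON layout that `v8.writeHeapSnapshot()` produces (a `snapshot.meta` section describing a flat `nodes` array, plus a `strings` table); `loadSnapshot`, `countByConstructor`, and `diffCounts` are hypothetical names. Field offsets are read from `meta` rather than hard-coded, and because the whole file goes through `JSON.parse`, it suits modest snapshots rather than very large ones.

```javascript
const fs = require('node:fs');

// Hypothetical helper: load a .heapsnapshot file (plain JSON) from disk.
function loadSnapshot(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}

// Hypothetical helper: count heap nodes by constructor name. Field positions
// come from snapshot.meta because the node layout differs between V8 versions.
function countByConstructor(snapshot, types = ['object', 'closure']) {
  const meta = snapshot.snapshot.meta;
  const fields = meta.node_fields;
  const stride = fields.length;
  const typeIdx = fields.indexOf('type');
  const nameIdx = fields.indexOf('name');
  const typeNames = meta.node_types[typeIdx]; // enum of node type names
  const wanted = new Set(types);
  const counts = new Map();
  const { nodes, strings } = snapshot;
  for (let i = 0; i < nodes.length; i += stride) {
    const type = typeNames[nodes[i + typeIdx]];
    if (!wanted.has(type)) continue;
    // Group every closure under "(closure)"; objects use their constructor name
    const name = type === 'closure' ? '(closure)' : strings[nodes[i + nameIdx]];
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: compare two count maps, largest absolute growth first.
function diffCounts(before, after) {
  const rows = [];
  for (const [name, count] of after) {
    const prev = before.get(name) || 0;
    const growthPct = prev === 0 ? Infinity : ((count - prev) / prev) * 100;
    rows.push({ name, before: prev, after: count, growthPct });
  }
  return rows.sort((a, b) => (b.after - b.before) - (a.after - a.before));
}

// Usage (file names follow the heap-<timestamp>.heapsnapshot pattern from Step 1):
// const rows = diffCounts(
//   countByConstructor(loadSnapshot('heap-A.heapsnapshot')),
//   countByConstructor(loadSnapshot('heap-B.heapsnapshot')));
// console.table(rows.slice(0, 10));
```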
Step 2: Event Listener Tracking
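// Instrument EventEmitter to track listener additions (staging/debug only:
// a stack trace per registration is too expensive for production)
const EventEmitter = require('events');
const originalOn = EventEmitter.prototype.on;
// on() and addListener() are the same function; patch both names
EventEmitter.prototype.on = EventEmitter.prototype.addListener = function (event, listener) {
  console.log(`[LISTENER ADD] ${String(event)} on ${this.constructor.name}`);
  console.trace();
  return originalOn.call(this, event, listener);
};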
Output Analysis:
[LISTENER ADD] message on WebSocketHandler
at WebSocketHandler.handleConnection (ws-handler.js:45)
at WebSocketServer.emit (events.js:310)
...
[Count after 1 hour]: 8,432 'message' listeners
[Count after 2 hours]: 16,891 'message' listeners
Conclusion: Listeners never removed on connection close
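Patching the prototype logs every registration, which is noisy and slow outside staging. A lighter alternative, sketched here under the assumption that a setup function can be called on each socket as it connects, uses the built-in `'newListener'` and `'removeListener'` events that every `EventEmitter` emits to keep net listener counts per emitter class and event. `trackEmitter` and `netListeners` are hypothetical names, and a plain `EventEmitter` stands in for the WebSocket.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical tracker: net listener counts keyed by "<EmitterClass>:<event>".
const netListeners = new Map();

function bump(key, delta) {
  const next = (netListeners.get(key) || 0) + delta;
  if (next === 0) netListeners.delete(key);
  else netListeners.set(key, next);
}

// The bookkeeping events themselves are not counted.
const isMeta = (event) => event === 'newListener' || event === 'removeListener';

// Hypothetical helper: count listener changes on one emitter using the
// 'newListener' (emitted before an add) and 'removeListener' (after a remove) events.
function trackEmitter(emitter) {
  const cls = emitter.constructor.name;
  emitter.on('newListener', (event) => {
    if (!isMeta(event)) bump(`${cls}:${String(event)}`, 1);
  });
  emitter.on('removeListener', (event) => {
    if (!isMeta(event)) bump(`${cls}:${String(event)}`, -1);
  });
}

// Simulate three connections where only the first cleans up on close.
for (let i = 0; i < 3; i++) {
  const socket = new EventEmitter(); // stand-in for a WebSocket
  trackEmitter(socket);
  const onMessage = () => {};
  socket.on('message', onMessage);
  socket.on('close', () => {
    if (i === 0) socket.removeListener('message', onMessage);
  });
  socket.emit('close');
}

console.log(netListeners.get('EventEmitter:message')); // 2
```

Because `once()` and the `prepend*` variants also emit `'newListener'`, this catches every registration path without touching `EventEmitter.prototype`.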
Step 3: Root Cause Identification
Problematic Code:
class WebSocketHandler {
  handleConnection(socket) {
    const messageHandler = (data) => {
      this.processMessage(data, socket);
    };
    // ❌ Problem: Listener added
    socket.on('message', messageHandler);
    socket.on('close', () => {
      // ❌ Problem: Listener NOT removed
      console.log('Socket closed');
      // Missing: socket.removeListener('message', messageHandler);
    });
  }
  processMessage(data, socket) {
    // Process message logic
  }
}
Issue: Each new connection registers a message listener but never removes it on disconnect. With 50K connections/day, memory grows unbounded.
Solution Implemented
Fix Applied:
class WebSocketHandler {
  handleConnection(socket) {
    const messageHandler = (data) => {
      this.processMessage(data, socket);
    };
    socket.on('message', messageHandler);
    socket.on('close', () => {
      // ✅ Fixed: Explicitly remove listener
      socket.removeListener('message', messageHandler);
      console.log('Socket closed, listener removed');
    });
  }
  processMessage(data, socket) {
    // Process message logic
  }
}
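The fix keeps registration and cleanup in two separate calls, which is easy to break again in a later edit. One way to keep them together is a small helper that attaches a handler and detaches it when the socket closes. `listenUntil` is a hypothetical name, and the sketch uses a plain `EventEmitter` as a stand-in for a WebSocket, assuming the real socket emits `'close'` once on disconnect.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: attach `handler` for `event` and detach it automatically
// the first time `endEvent` fires. Returns a function that detaches early.
function listenUntil(emitter, event, handler, endEvent = 'close') {
  const cleanup = () => {
    emitter.removeListener(event, handler);
    emitter.removeListener(endEvent, cleanup); // no-op if once() already removed it
  };
  emitter.on(event, handler);
  emitter.once(endEvent, cleanup);
  return cleanup;
}

// Usage with a plain EventEmitter standing in for a WebSocket:
const socket = new EventEmitter();
listenUntil(socket, 'message', (data) => console.log('received', data));
socket.emit('message', 'hello'); // logs "received hello"
socket.emit('close');
console.log(socket.listenerCount('message'), socket.listenerCount('close')); // 0 0
```

Inside `handleConnection`, the two `socket.on(...)` calls would collapse into `listenUntil(socket, 'message', messageHandler)`, and the close listener itself is removed too because it is registered with `once()`.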
Alternative Solutions Considered:
- Using once() instead of on() - Not applicable (a persistent listener is needed)
- Automatic cleanup with WeakMap - Complex, with potential edge cases
- Connection pooling with max limits - Doesn't address the root cause
Verification Results
Memory Usage After Fix:
Heap Memory Usage (Stable)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
512MB ┤
      │
      │
256MB ┤
      │
128MB ┤ ════════════════════════ [stable ~128MB]
      │
 64MB ┤
      └─────────────────────────────
       0h   12h   24h   48h   72h
Metrics Comparison:
| Metric | Before Fix | After Fix | Improvement |
|---|---|---|---|
| Heap usage | 64MB → 512MB (grows until restart) | ~128MB (stable) | 75% below pre-fix peak |
| Restart frequency | Every 6 hours | None observed (72h+) | Eliminated |
| Active listeners | 50K+ (growing) | ~150 (stable) | 99.7% fewer |
| Connection capacity | Limited by leak-driven restarts | No longer limited by memory growth | N/A |
Diagnostic Tools & Techniques
Tool 1: Node.js Built-in Profiler
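# Start with the inspector enabled (for DevTools) and sampling heap profiling;
# --heap-prof writes a .heapprofile file to the working directory on exit
node --inspect --heap-prof server.js
# Then open chrome://inspect in Chrome and attach DevTools to the Node.js target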
Best For: Initial investigation, visual heap analysis
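`--heap-prof` samples for the whole process lifetime. To sample only around a suspect operation, the built-in `inspector` module can drive the same V8 sampling heap profiler from code through the `HeapProfiler` protocol domain. This is a minimal sketch; `profileHeap` is a hypothetical helper, and the assumption that the written `.heapprofile` JSON loads in the DevTools Memory panel the same way as `--heap-prof` output should be checked against your Node.js and Chrome versions.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');

// Hypothetical helper: run `work` under V8's sampling heap profiler, write the
// profile to `outFile`, and call `done(err, profile)` when finished.
function profileHeap(work, outFile, done) {
  const session = new inspector.Session();
  session.connect();
  const finish = (err, profile) => {
    session.disconnect();
    done(err, profile);
  };
  session.post('HeapProfiler.enable', (err) => {
    if (err) return finish(err);
    // samplingInterval: average bytes between samples (V8's default is 32768)
    session.post('HeapProfiler.startSampling', { samplingInterval: 4096 }, (startErr) => {
      if (startErr) return finish(startErr);
      try {
        work();
      } catch (workErr) {
        return finish(workErr);
      }
      session.post('HeapProfiler.stopSampling', (stopErr, result) => {
        if (stopErr) return finish(stopErr);
        fs.writeFileSync(outFile, JSON.stringify(result.profile));
        finish(null, result.profile);
      });
    });
  });
}

// Example: profile a burst of allocations that stay reachable.
const retained = [];
profileHeap(() => {
  for (let i = 0; i < 20000; i++) retained.push({ id: i, payload: 'x'.repeat(64) });
}, 'burst.heapprofile', (err, profile) => {
  if (err) throw err;
  console.log(`Wrote burst.heapprofile with ${profile.samples.length} samples`);
});
```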
Tool 2: clinic.js
# Install
npm install -g clinic
# Profile with Doctor (identifies event loop issues)
clinic doctor -- node server.js
# Profile with HeapProfiler
clinic heapprofiler -- node server.js
Output: Generates detailed reports showing memory allocation hotspots
Tool 3: memwatch-next
// memwatch-next is a native addon that is no longer maintained and may not
// build on current Node.js releases
const memwatch = require('memwatch-next');
memwatch.on('leak', (info) => {
  console.error('Memory leak detected:');
  console.error(info);
});
memwatch.on('stats', (stats) => {
  console.log('GC stats:', {
    num_full_gc: stats.num_full_gc,
    num_inc_gc: stats.num_inc_gc,
    heap_compactions: stats.heap_compactions,
    estimated_base: stats.estimated_base,
    current_base: stats.current_base,
    min: stats.min,
    max: stats.max
  });
});
Best For: Production monitoring, early leak detection
Tool 4: Heap Diff Analysis
const memwatch = require('memwatch-next');
let hd;
memwatch.on('stats', () => {
  // Start a heap diff on one GC cycle and end it on the next
  if (!hd) {
    hd = new memwatch.HeapDiff();
  } else {
    const diff = hd.end();
    console.log('Heap diff:');
    console.log(JSON.stringify(diff, null, 2));
    hd = null;
  }
});
Reveals: Objects growing between GC cycles
Prevention Strategies
Strategy 1: Automated Listener Audit
// audit-listeners.js
function auditListeners(emitter, maxListeners = 10) {
  const events = emitter.eventNames();
  events.forEach(event => {
    const count = emitter.listenerCount(event);
    if (count > maxListeners) {
      // Report the emitter and event; a stack trace captured here would only
      // show this audit timer, not where the listeners were registered
      console.warn(`[LEAK WARNING] ${emitter.constructor.name} ${String(event)}: ${count} listeners`);
    }
  });
}
// Run audit every 5 minutes
// (myEventEmitter stands for whichever long-lived emitter you want to watch)
setInterval(() => {
  auditListeners(myEventEmitter);
}, 5 * 60 * 1000);
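Alongside the periodic audit, Node.js already reports listener growth on its own: when an emitter passes its `setMaxListeners()` limit, it emits a `MaxListenersExceededWarning` process warning carrying the `emitter`, `type`, and `count`. The sketch below, a built-in alternative rather than the audit above, captures those warnings; `listenerWarnings` and `bus` are hypothetical names, and in a real service the handler would forward to monitoring instead of an array.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical sink for captured warnings; forward these to monitoring in production.
const listenerWarnings = [];

// Node emits MaxListenersExceededWarning once per emitter/event pair when the
// listener count passes that emitter's limit.
process.on('warning', (warning) => {
  if (warning.name !== 'MaxListenersExceededWarning') return;
  listenerWarnings.push({
    emitterClass: warning.emitter.constructor.name,
    event: String(warning.type),
    count: warning.count,
  });
});

// Hypothetical long-lived emitter; a low limit surfaces growth early.
const bus = new EventEmitter();
bus.setMaxListeners(3);
for (let i = 0; i < 4; i++) bus.on('message', () => {});

// The warning is delivered asynchronously, so inspect it on a later tick.
setImmediate(() => console.log(listenerWarnings));
```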
Strategy 2: Connection Registry
class ConnectionManager {
  constructor() {
    this.connections = new Set();
  }
  register(socket) {
    const handler = (data) => this.handleMessage(data, socket);
    socket.on('message', handler);
    socket.on('close', () => {
      socket.removeListener('message', handler);
      this.connections.delete(socket);
    });
    this.connections.add(socket);
  }
  handleMessage(data, socket) {
    // Handle message
  }
  // Diagnostic method
  getConnectionCount() {
    return this.connections.size;
  }
}
Strategy 3: Memory Monitoring Alerts
// monitoring.js
const v8 = require('v8');
function checkMemoryUsage() {
  const heapStats = v8.getHeapStatistics();
  const usedHeap = heapStats.used_heap_size;
  // heap_size_limit is V8's own limit and can exceed the 512MB container
  // limit; set --max-old-space-size below the container limit so this
  // percentage is meaningful
  const heapLimit = heapStats.heap_size_limit;
  const usagePercent = (usedHeap / heapLimit) * 100;
  if (usagePercent > 80) {
    console.error(`[CRITICAL] Heap usage at ${usagePercent.toFixed(2)}%`);
    // sendAlert() is a placeholder for your alerting integration (not defined here)
    sendAlert({
      severity: 'critical',
      message: `High memory usage: ${usagePercent.toFixed(2)}%`,
      metrics: heapStats
    });
  }
}
setInterval(checkMemoryUsage, 60 * 1000); // Check every minute
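The check above measures the heap against V8's limit, but the crashes in this incident came from the 512MB container limit, which also counts memory outside the JS heap. A complementary check compares resident set size with the container's cgroup limit. This sketch assumes cgroup v2 with the container's own cgroup visible at `/sys/fs/cgroup/memory.max` (the usual layout with a private cgroup namespace); RSS only approximates what the cgroup charges, and `parseCgroupMemoryMax`, `containerMemoryLimit`, and `rssPercentOfContainer` are hypothetical names.

```javascript
const fs = require('node:fs');

// Hypothetical helper: parse cgroup v2 `memory.max` ("max" means no limit).
function parseCgroupMemoryMax(text) {
  const value = text.trim();
  if (value === '' || value === 'max') return null;
  const bytes = Number(value);
  return Number.isFinite(bytes) && bytes > 0 ? bytes : null;
}

// Hypothetical helper: container memory limit in bytes, or null when unlimited
// or unreadable (for example on cgroup v1 hosts or outside a container).
function containerMemoryLimit(path = '/sys/fs/cgroup/memory.max') {
  try {
    return parseCgroupMemoryMax(fs.readFileSync(path, 'utf8'));
  } catch {
    return null;
  }
}

// Hypothetical helper: RSS as a percentage of the container limit, or null if unknown.
function rssPercentOfContainer(limitBytes = containerMemoryLimit()) {
  if (limitBytes === null) return null;
  return (process.memoryUsage().rss / limitBytes) * 100;
}

const pct = rssPercentOfContainer();
if (pct !== null && pct > 80) {
  console.error(`[CRITICAL] RSS at ${pct.toFixed(1)}% of the container memory limit`);
}
```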
Lessons Learned
Event listeners live as long as the emitter they are attached to; on long-lived or retained emitters they are never garbage collected until explicitly removed
- Use removeListener() or off() in cleanup code
- Consider using once() for single-use listeners
Heap snapshots are invaluable for leak diagnosis
- Take snapshots at regular intervals
- Compare snapshots to identify growing objects
- Focus on objects with a large retained size or steadily growing counts
Monitoring should include listener counts
- Track emitter.listenerCount() for critical emitters
- Alert on unusual growth patterns
- Implement max listener limits (emitter.setMaxListeners())
Automated testing can catch leaks early
- Write tests that simulate high connection volume
- Monitor memory usage during tests
- Fail tests if memory grows unexpectedly
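For tests like these, listener counts give a deterministic signal, while raw heap size in a short test run is noisy. This is a minimal sketch, assuming connection handlers can be exercised against a stand-in socket; `connectionSoak`, `fixedAttach`, and `leakyAttach` are hypothetical names, and the two attach functions mirror the fixed and problematic handlers from this report in reduced form.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical harness: run `n` connect/message/disconnect cycles through
// `attach`, using a plain EventEmitter as a stand-in socket, and total the
// 'message' listeners still attached after each socket closes.
function connectionSoak(attach, n) {
  let leftover = 0;
  for (let i = 0; i < n; i++) {
    const socket = new EventEmitter();
    attach(socket);
    socket.emit('message', `msg-${i}`);
    socket.emit('close');
    leftover += socket.listenerCount('message');
  }
  return leftover;
}

// Reduced version of the fix: remove the message listener on close.
function fixedAttach(socket) {
  const onMessage = () => {};
  socket.on('message', onMessage);
  socket.on('close', () => socket.removeListener('message', onMessage));
}

// Reduced version of the original bug: the message listener is never removed.
function leakyAttach(socket) {
  socket.on('message', () => {});
  socket.on('close', () => {});
}

const fixedLeftover = connectionSoak(fixedAttach, 10000);
const leakyLeftover = connectionSoak(leakyAttach, 10000);
console.log({ fixedLeftover, leakyLeftover }); // { fixedLeftover: 0, leakyLeftover: 10000 }

// In a test runner, fail the build when cleanup regresses:
if (fixedLeftover !== 0) throw new Error(`${fixedLeftover} listeners leaked`);
```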
Documentation is crucial
- Document cleanup requirements for event handlers
- Include lifecycle management in code reviews
- Create runbooks for common leak patterns
Recommendations
Immediate Actions:
- ✅ Deploy listener removal fix (COMPLETED)
- ✅ Add memory monitoring alerts (COMPLETED)
- ✅ Document WebSocket handler lifecycle (COMPLETED)
Short-term (1 week):
- Audit all EventEmitter usage across codebase
- Add automated listener count checks to CI
- Create memory leak runbook for on-call team
Long-term (1 month):
- Implement comprehensive memory testing in CI/CD
- Build dashboard for real-time listener monitoring
- Conduct team training on Node.js memory management
Report Compiled By: DevOps Team
Reviewed By: Engineering Lead
Status: Closed
Follow-up Date: 2026-02-20 (30-day review)