Incident Report: Node.js Memory Leak Analysis

Date: 2026-01-20
Severity: P1 (Production Impact)
MTTR: 14 days
Root Cause: Event listener leak in WebSocket handler


Timeline of Events

Day 1 - Jan 6, 09:30 UTC

  • Monitoring alerts: API instances restarting every 6 hours
  • Memory usage shows sawtooth pattern (gradual climb, sudden drop)
  • Initial hypothesis: Database connection leak

Day 3 - Jan 8

  • Ruled out database connections (pool metrics normal)
  • Added heap profiling to staging environment
  • Identified EventEmitter instances growing unbounded

Day 8 - Jan 13

  • Narrowed down to WebSocket message handlers
  • Still unable to identify exact leak source
  • Memory profiling shows 50K+ listener registrations

Day 15 - Jan 20, 14:15 UTC

  • Root cause identified: Missing removeListener() in disconnect handler
  • Fix deployed: 14:45 UTC
  • Memory usage stabilized within 2 hours

Technical Analysis

Memory Leak Pattern

Heap Memory Usage Over 6 Hours
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
512MB ┤               ╱╱╱╱╱ [restart]
      │             ╱╱
      │           ╱╱
256MB ┤         ╱╱
      │       ╱╱
128MB ┤     ╱╱
      │   ╱╱
 64MB ┤ ╱╱
      └─────────────────────────────
       0h   2h   4h   6h

Characteristics:

  • Linear growth: ~75MB/hour (64MB → 512MB over ~6 hours)
  • Garbage collection reclaims almost none of the growth
  • Process crashes at 512MB (container memory limit)
  • A restart only resets memory to baseline; the climb then resumes
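The growth rate can be estimated from periodic memory samples instead of being read off the chart. A minimal sketch, assuming samples are collected within a single restart cycle (a restart resets the climb) and stored as hypothetical `{ hours, mb }` objects, for example from `process.memoryUsage().heapUsed` recorded on a timer. `estimateGrowthRate` is a hypothetical helper that fits a least-squares line and returns its slope in MB/hour:

```javascript
// Hypothetical helper: least-squares slope of memory samples.
// Each sample is { hours, mb }; the result is the growth rate in MB/hour.
function estimateGrowthRate(samples) {
  const n = samples.length;
  if (n < 2) throw new Error('need at least two samples');
  const meanX = samples.reduce((s, p) => s + p.hours, 0) / n;
  const meanY = samples.reduce((s, p) => s + p.mb, 0) / n;
  let num = 0;
  let den = 0;
  for (const { hours, mb } of samples) {
    num += (hours - meanX) * (mb - meanY);
    den += (hours - meanX) ** 2;
  }
  if (den === 0) throw new Error('samples need distinct timestamps');
  return num / den;
}

// Example: the 0h and 6h endpoints of the chart above (64MB -> 512MB)
const rate = estimateGrowthRate([
  { hours: 0, mb: 64 },
  { hours: 6, mb: 512 },
]);
console.log(`${rate.toFixed(1)} MB/hour`); // 74.7 MB/hour
```

With many samples, a fit like this is less sensitive to GC noise than comparing two readings.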

Diagnostic Process

Step 1: Heap Snapshot Analysis

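// Capture heap snapshots programmatically
// Note: v8.writeHeapSnapshot() is synchronous (it blocks the event loop) and
// needs roughly twice the current heap size in memory while it runs.
const v8 = require('v8');

function captureHeapSnapshot() {
  const filename = `heap-${Date.now()}.heapsnapshot`;
  const written = v8.writeHeapSnapshot(filename); // returns the file path
  console.log(`Snapshot written to ${written}`);
}

// Capture every 30 minutes; unref() keeps the timer from holding the process open
setInterval(captureHeapSnapshot, 30 * 60 * 1000).unref();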

Snapshot Comparison Results:

Object Type     Snapshot 1 (2h)   Snapshot 2 (4h)   Growth
EventEmitter    1,250             2,480             +98%
Function        3,100             6,210             +100%
Array           8,420             17,100            +103%

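Finding: EventEmitter, Function, and Array counts all roughly doubled between the 2h and 4h snapshots, i.e. these objects accumulate steadily and are never reclaimed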
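The Growth column can be reproduced from per-type object counts, such as those shown in the DevTools snapshot Comparison view. A minimal sketch, with `compareSnapshotCounts` as a hypothetical helper that takes plain `{ typeName: count }` objects:

```javascript
// Hypothetical helper: percentage growth per object type between two
// snapshots, given plain { typeName: count } objects. Types with no count
// in the first snapshot are skipped. Sorted by growth, largest first.
function compareSnapshotCounts(before, after) {
  return Object.keys(after)
    .filter((type) => before[type] > 0)
    .map((type) => ({
      type,
      before: before[type],
      after: after[type],
      growthPct: Math.round(((after[type] - before[type]) / before[type]) * 100),
    }))
    .sort((a, b) => b.growthPct - a.growthPct);
}

// Counts from the table above
const rows = compareSnapshotCounts(
  { EventEmitter: 1250, Function: 3100, Array: 8420 },
  { EventEmitter: 2480, Function: 6210, Array: 17100 },
);
for (const r of rows) console.log(`${r.type}: +${r.growthPct}%`);
// prints Array: +103%, then Function: +100%, then EventEmitter: +98%
```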

Step 2: Event Listener Tracking

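// Instrument EventEmitter to track listener additions (staging only: very noisy)
const EventEmitter = require('events');
const originalOn = EventEmitter.prototype.on;
let inHook = false; // re-entry guard: console output may itself add listeners

function tracedOn(event, listener) {
  if (!inHook) {
    inHook = true;
    // String() avoids a TypeError when the event name is a Symbol
    console.log(`[LISTENER ADD] ${String(event)} on ${this.constructor.name}`);
    console.trace();
    inHook = false;
  }
  return originalOn.call(this, event, listener);
}
// on() and addListener() are aliases; patch both so neither path is missed
EventEmitter.prototype.on = EventEmitter.prototype.addListener = tracedOn;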

Output Analysis:

[LISTENER ADD] message on WebSocketHandler
  at WebSocketHandler.handleConnection (ws-handler.js:45)
  at WebSocketServer.emit (events.js:310)
  ...
  
[Count after 1 hour]: 8,432 'message' listeners
[Count after 2 hours]: 16,891 'message' listeners

Conclusion: Listeners never removed on connection close
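Tracing every registration is too noisy for production volumes. A lower-overhead variant, sketched under the assumption that one long-lived emitter is already under suspicion, wraps only that instance and tallies additions by event name and calling stack frame. `trackListenerAdds` is a hypothetical helper, and the stack parsing assumes V8's default `Error.stack` format:

```javascript
const EventEmitter = require('events');

// Hypothetical helper: wraps one emitter instance (not the global prototype)
// and tallies listener additions by "event @ calling stack frame".
function trackListenerAdds(emitter) {
  const counts = new Map();
  const originalOn = emitter.on; // the inherited EventEmitter.prototype.on
  function tracked(event, listener) {
    // Assumes V8's stack format: line 0 is "Error", line 1 is this
    // function, line 2 is the code that registered the listener.
    const caller = (new Error().stack.split('\n')[2] || '').trim();
    const key = `${String(event)} @ ${caller}`;
    counts.set(key, (counts.get(key) || 0) + 1);
    return originalOn.call(this, event, listener);
  }
  // Own properties shadow the prototype methods for this instance only
  emitter.on = tracked;
  emitter.addListener = tracked;
  return counts;
}

// Usage: inspect the tally periodically; a key whose count keeps rising
// points at a registration site that never cleans up.
const bus = new EventEmitter();
const adds = trackListenerAdds(bus);
for (let i = 0; i < 3; i++) bus.on('message', () => {});
console.log(adds); // one 'message @ ...' key with a count of 3
```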

Step 3: Root Cause Identification

Problematic Code:

class WebSocketHandler {
  handleConnection(socket) {
    const messageHandler = (data) => {
      this.processMessage(data, socket);
    };
    
    // ❌ Problem: Listener added
    socket.on('message', messageHandler);
    
    socket.on('close', () => {
      // ❌ Problem: Listener NOT removed
      console.log('Socket closed');
      // Missing: socket.removeListener('message', messageHandler);
    });
  }
  
  processMessage(data, socket) {
    // Process message logic
  }
}

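Issue: Each new connection registers a 'message' listener but never removes it on disconnect, and each listener's closure keeps its socket reachable. A listener is freed only when the emitter it is attached to becomes unreachable, so wherever that emitter outlives the connection, one listener (plus the socket it captures) accumulates per connection. With 50K connections/day, memory grows without bound.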
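The mechanism can be reproduced without a network. This sketch assumes the listeners end up on an emitter that outlives each connection; `bus` is a hypothetical stand-in for that long-lived emitter, plain `EventEmitter` objects stand in for sockets, and `simulate` is a hypothetical helper:

```javascript
const EventEmitter = require('events');

// Hypothetical simulation: `bus` is a long-lived emitter shared by all
// connections; each fake socket is a short-lived EventEmitter.
function simulate(connections, removeOnClose) {
  const bus = new EventEmitter();
  bus.setMaxListeners(0); // silence MaxListenersExceededWarning for the demo
  for (let i = 0; i < connections; i++) {
    const socket = new EventEmitter();
    const messageHandler = (data) => socket.emit('reply', data);
    bus.on('message', messageHandler);
    socket.once('close', () => {
      if (removeOnClose) bus.removeListener('message', messageHandler);
    });
    socket.emit('close'); // connection ends immediately
  }
  // Each leaked handler also keeps its socket reachable via the closure
  return bus.listenerCount('message');
}

console.log(simulate(10000, false)); // 10000: one leaked listener per connection
console.log(simulate(10000, true)); // 0: listeners removed on close
```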


Solution Implemented

Fix Applied:

class WebSocketHandler {
  handleConnection(socket) {
    const messageHandler = (data) => {
      this.processMessage(data, socket);
    };
    
    socket.on('message', messageHandler);
    
    socket.on('close', () => {
      // ✅ Fixed: Explicitly remove listener
      socket.removeListener('message', messageHandler);
      console.log('Socket closed, listener removed');
    });
  }
  
  processMessage(data, socket) {
    // Process message logic
  }
}
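The same fix can be packaged so that every registration is paired with its cleanup. One possible shape, with `listenUntil` as a hypothetical helper name; it registers the cleanup with `once()`, so the 'close' listener also removes itself after firing:

```javascript
const EventEmitter = require('events');

// Hypothetical helper: attach `handler` to `emitter` for `event`, and detach
// it the first time `lifecycle` emits `endEvent`. Returns a function that
// detaches both listeners early if needed.
function listenUntil(emitter, event, handler, lifecycle, endEvent = 'close') {
  const cleanup = () => {
    emitter.removeListener(event, handler);
    // Works even though cleanup was registered with once()
    lifecycle.removeListener(endEvent, cleanup);
  };
  emitter.on(event, handler);
  lifecycle.once(endEvent, cleanup);
  return cleanup;
}

// Usage: here the socket is both the message source and the lifecycle emitter
const socket = new EventEmitter();
listenUntil(socket, 'message', (data) => console.log('got', data), socket);
socket.emit('message', 'hi'); // logs "got hi"
socket.emit('close');
console.log(socket.listenerCount('message'), socket.listenerCount('close')); // 0 0
```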

Alternative Solutions Considered:

  1. Using once() instead of on() - Not applicable (need persistent listener)
  2. Automatic cleanup with WeakMap - Complex, potential edge cases
  3. Connection pooling with max limits - Doesn’t address root cause

Verification Results

Memory Usage After Fix:

Heap Memory Usage (Stable)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
512MB ┤
      │
      │
256MB ┤
      │  ════════════════════════ [stable ~128MB]
128MB ┤  
      │
 64MB ┤
      └─────────────────────────────
       0h   12h  24h  48h  72h

Metrics Comparison:

Metric                Before Fix                    After Fix             Improvement
Peak heap memory      512MB (climbs from 64MB)      ~128MB (stable)       75% reduction
Restart frequency     Every 6 hours                 None (72h+)           100%
Active listeners      50K+ (growing)                ~150 (stable)         99.7%
Connection capacity   Limited by leak               No longer limited     N/A
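The Improvement percentages follow from (before - after) / before. A quick check, with `pctReduction` as a throwaway helper and 50K taken as the lower bound of "50K+":

```javascript
// Percentage reduction from `before` to `after`, rounded to one decimal place
function pctReduction(before, after) {
  return Math.round(((before - after) / before) * 1000) / 10;
}

console.log(pctReduction(512, 128)); // 75 (peak heap: 512MB -> ~128MB)
console.log(pctReduction(50000, 150)); // 99.7 (active listeners: 50K -> ~150)
```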

Diagnostic Tools & Techniques

Tool 1: Node.js Built-in Profiler

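# Start with the inspector and V8's sampling heap profiler enabled
# (--heap-prof writes a .heapprofile file to the working directory on a normal exit)
node --inspect --heap-prof server.js

# Then open chrome://inspect in Chrome and attach DevTools to the process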

Best For: Initial investigation, visual heap analysis

Tool 2: clinic.js

# Install
npm install -g clinic

# Profile with Doctor (identifies event loop issues)
clinic doctor -- node server.js

# Profile with HeapProfiler
clinic heapprofiler -- node server.js

Output: HTML reports; HeapProfiler renders a flame graph of memory allocations by function, which highlights allocation hotspots

Tool 3: memwatch-next

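const memwatch = require('memwatch-next');

// 'leak' fires when the heap keeps growing across several consecutive GCs
memwatch.on('leak', (info) => {
  console.error('Memory leak detected:');
  console.error(info);
});

// 'stats' fires after each full GC with heap usage figures
memwatch.on('stats', (stats) => {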
  console.log('GC stats:', {
    num_full_gc: stats.num_full_gc,
    num_inc_gc: stats.num_inc_gc,
    heap_compactions: stats.heap_compactions,
    estimated_base: stats.estimated_base,
    current_base: stats.current_base,
    min: stats.min,
    max: stats.max
  });
});

Best For: Production monitoring, early leak detection
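memwatch-next is a native add-on. Where it cannot be installed, the idea behind its 'leak' event (heap still growing after several consecutive GCs) can be approximated in plain JavaScript. This is a sketch of that style of heuristic, not memwatch's exact algorithm; `createGrowthDetector` is a hypothetical name, and post-GC heap sizes must be fed in by the caller, for example by sampling `process.memoryUsage().heapUsed` from a `perf_hooks` `PerformanceObserver` that watches `'gc'` entries:

```javascript
// Hypothetical detector: reports a suspected leak when the heap size after
// GC has grown for `threshold` consecutive samples.
function createGrowthDetector(threshold = 5) {
  let previous = null;
  let streak = 0;
  return function record(heapBytes) {
    streak = previous !== null && heapBytes > previous ? streak + 1 : 0;
    previous = heapBytes;
    return streak >= threshold; // true => suspected leak
  };
}

const record = createGrowthDetector(3);
const samples = [100, 110, 120, 130, 125, 140];
console.log(samples.map(record));
// [ false, false, false, true, false, false ]
```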

Tool 4: Heap Diff Analysis

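const memwatch = require('memwatch-next');

// Alternate on each full-GC 'stats' event: start a HeapDiff, then end it at
// the next one. HeapDiff takes heap snapshots, so this is expensive.
let hd = null;
memwatch.on('stats', () => {
  if (!hd) {
    hd = new memwatch.HeapDiff();
  } else {
    const diff = hd.end();
    console.log('Heap diff:');
    console.log(JSON.stringify(diff, null, 2));
    hd = null;
  }
});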

Reveals: Objects growing between GC cycles


Prevention Strategies

Strategy 1: Automated Listener Audit

// audit-listeners.js
// Warn when any event on an emitter has more listeners than expected.
function auditListeners(emitter, maxListeners = 10) {
  for (const event of emitter.eventNames()) {
    const count = emitter.listenerCount(event);
    if (count > maxListeners) {
      // String() handles Symbol event names; a stack trace here would only
      // show this audit timer, not where the listeners were added
      console.warn(`[LEAK WARNING] ${String(event)}: ${count} listeners`);
    }
  }
}

// Run audit every 5 minutes; myEventEmitter is a placeholder for the
// long-lived emitter(s) being audited
setInterval(() => {
  auditListeners(myEventEmitter);
}, 5 * 60 * 1000);
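Node already performs a version of this check: when an emitter exceeds its max listener count for one event (10 by default, adjustable with `setMaxListeners()`), it emits a `MaxListenersExceededWarning` on `process` carrying `emitter`, `type`, and `count` properties. A sketch that forwards those warnings to monitoring instead of leaving them on stderr; `reportLeakWarning` is a hypothetical stand-in for an alerting hook:

```javascript
// Returns true for Node's "possible EventEmitter memory leak" warning
function isListenerLeakWarning(warning) {
  return !!warning && warning.name === 'MaxListenersExceededWarning';
}

// Hypothetical reporter; replace with a call to your monitoring system
function reportLeakWarning(warning) {
  const emitterName = warning.emitter ? warning.emitter.constructor.name : 'unknown';
  console.warn(
    `[LEAK WARNING] ${emitterName} has ${warning.count} '${String(warning.type)}' listeners`,
  );
}

process.on('warning', (warning) => {
  if (isListenerLeakWarning(warning)) reportLeakWarning(warning);
});
```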

Strategy 2: Connection Registry

class ConnectionManager {
  constructor() {
    this.connections = new Set();
  }
  
  register(socket) {
    const handler = (data) => this.handleMessage(data, socket);
    
    socket.on('message', handler);
    socket.on('close', () => {
      socket.removeListener('message', handler);
      this.connections.delete(socket);
    });
    
    this.connections.add(socket);
  }
  
  handleMessage(data, socket) {
    // Handle message
  }
  
  // Diagnostic method
  getConnectionCount() {
    return this.connections.size;
  }
}

Strategy 3: Memory Monitoring Alerts

// monitoring.js
const v8 = require('v8');

function checkMemoryUsage() {
  const heapStats = v8.getHeapStatistics();
  const usedHeap = heapStats.used_heap_size;
  // heap_size_limit is V8's limit, not the container's; run Node with
  // --max-old-space-size below the 512MB container limit so this check can
  // fire before the container is OOM-killed
  const heapLimit = heapStats.heap_size_limit;
  const usagePercent = (usedHeap / heapLimit) * 100;

  if (usagePercent > 80) {
    console.error(`[CRITICAL] Heap usage at ${usagePercent.toFixed(2)}%`);
    // sendAlert() is a placeholder for the monitoring system's client
    sendAlert({
      severity: 'critical',
      message: `High heap usage: ${usagePercent.toFixed(2)}%`,
      metrics: heapStats
    });
  }
}

setInterval(checkMemoryUsage, 60 * 1000); // Check every minute
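The container also counts memory outside the JS heap (buffers, native allocations), so a heap-only check can miss pressure. A complementary sketch compares resident set size against a configured container limit; the `CONTAINER_MEMORY_LIMIT_MB` environment variable and the 80% threshold are assumptions:

```javascript
// Percentage of the container limit currently used
function containerUsagePercent(rssBytes, limitBytes) {
  return (rssBytes / limitBytes) * 100;
}

// Assumption: the container limit is passed in via an environment variable,
// defaulting to the 512MB limit from this incident
const limitMb = Number(process.env.CONTAINER_MEMORY_LIMIT_MB) || 512;
const limitBytes = limitMb * 1024 * 1024;

function checkRss() {
  const { rss } = process.memoryUsage(); // resident memory: heap plus native
  const pct = containerUsagePercent(rss, limitBytes);
  if (pct > 80) {
    console.error(`[CRITICAL] RSS at ${pct.toFixed(1)}% of ${limitMb}MB container limit`);
  }
  return pct;
}

setInterval(checkRss, 60 * 1000).unref(); // unref() lets the process exit normally
```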

Lessons Learned

  1. A listener is not garbage collected while the emitter it is attached to remains reachable

    • Use removeListener() or off() in cleanup code
    • Consider using once() for single-use listeners
  2. Heap snapshots are invaluable for leak diagnosis

    • Take snapshots at regular intervals
    • Compare snapshots to identify growing objects
    • Focus on objects with a large retained size or steadily growing counts
  3. Monitoring should include listener counts

    • Track emitter.listenerCount(eventName) for critical emitters
    • Alert on unusual growth patterns
    • Set max listener limits with emitter.setMaxListeners() so Node warns when they are exceeded
  4. Automated testing can catch leaks early

    • Write tests that simulate high connection volume
    • Monitor memory usage during tests
    • Fail tests if memory grows unexpectedly
  5. Documentation is crucial

    • Document cleanup requirements for event handlers
    • Include lifecycle management in code reviews
    • Create runbooks for common leak patterns
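Lesson 4 can be turned into a test along these lines. It is a sketch that assumes the code under test exposes a `register(socket)`-style entry point (as the ConnectionManager above does) and uses plain `EventEmitter` objects as fake sockets; `churnConnections` is hypothetical. The heap comparison only runs when Node is started with `--expose-gc`, because without a forced GC the numbers are too noisy to assert on:

```javascript
const EventEmitter = require('events');

// Hypothetical churn test: open and close `n` fake connections through
// `register(socket)`, then check that no 'message' listeners were left on
// the long-lived `emitter`. Throws on a leak; returns the leaked count (0).
function churnConnections(register, emitter, n = 10000) {
  const baseline = emitter.listenerCount('message');
  const canGc = typeof global.gc === 'function'; // only with node --expose-gc
  if (canGc) global.gc();
  const heapBefore = process.memoryUsage().heapUsed;

  for (let i = 0; i < n; i++) {
    const socket = new EventEmitter(); // stand-in for a real socket
    register(socket);
    socket.emit('close');
  }

  const leaked = emitter.listenerCount('message') - baseline;
  if (leaked !== 0) throw new Error(`${leaked} 'message' listeners leaked`);
  if (canGc) {
    global.gc();
    const grownMb = (process.memoryUsage().heapUsed - heapBefore) / 1024 / 1024;
    // The 10MB threshold is an assumption; tune it for the workload
    if (grownMb > 10) throw new Error(`heap grew ${grownMb.toFixed(1)}MB`);
  }
  return leaked;
}

// Example registration that cleans up after itself
const shared = new EventEmitter();
shared.setMaxListeners(0);
function register(socket) {
  const handler = (data) => socket.emit('reply', data);
  shared.on('message', handler);
  socket.once('close', () => shared.removeListener('message', handler));
}

console.log(churnConnections(register, shared)); // 0
```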

Recommendations

Immediate Actions:

  • ✅ Deploy listener removal fix (COMPLETED)
  • ✅ Add memory monitoring alerts (COMPLETED)
  • ✅ Document WebSocket handler lifecycle (COMPLETED)

Short-term (1 week):

  • Audit all EventEmitter usage across codebase
  • Add automated listener count checks to CI
  • Create memory leak runbook for on-call team

Long-term (1 month):

  • Implement comprehensive memory testing in CI/CD
  • Build dashboard for real-time listener monitoring
  • Conduct team training on Node.js memory management

Report Compiled By: DevOps Team
Reviewed By: Engineering Lead
Status: Closed
Follow-up Date: 2026-02-20 (30-day review)