Scaling Node.js in Production
Practical strategies for millions of requests per day.
What we'll cover
- Process management
- Performance optimization
- Observability at scale
- Reliability patterns
The Event Loop
Node.js is single-threaded.
But that's a feature, not a bug.
Understanding the Event Loop
   ┌───────────────────────────┐
┌─►│           timers          │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │     pending callbacks     │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │       idle, prepare       │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │           poll            │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │           check           │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
└──┤      close callbacks      │
   └───────────────────────────┘
The Golden Rule
Never block the event loop.
Common Blockers
- Synchronous file I/O
- CPU-intensive computation
- Large JSON parsing
- Complex regex
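One way to keep a CPU-heavy loop from blocking is to split it into chunks and yield to the event loop between chunks. The sketch below assumes the work can be divided per item; `processInChunks`, `handleItem`, and the default chunk size are illustrative names and values, not part of the talk.

```javascript
// Process a large array without monopolizing the event loop.
// `processInChunks` and `chunkSize` are illustrative, not from the talk.
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    const results = [];
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          results.push(handleItem(items[index]));
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        // Yield so pending I/O callbacks and timers can run before the next chunk.
        setImmediate(runChunk);
      } else {
        resolve(results);
      }
    }

    runChunk();
  });
}
```

For work that cannot be split this way, moving it to a worker thread (`node:worker_threads`) is the usual alternative.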
Cluster Mode
Use all your CPU cores.
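const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

if (cluster.isPrimary) {
  // Primary process: fork one worker per CPU core.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Worker process: run the actual server here.
  startServer(); // hypothetical: your application's entry point
}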
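Forking alone leaves a gap: a worker that crashes is not replaced. A minimal sketch of a primary that respawns workers, assuming a caller-supplied `startWorker` function; `runCluster` is a hypothetical helper name, not from the talk.

```javascript
const cluster = require('node:cluster');
const os = require('node:os');

// Hypothetical helper: fork one worker per core and replace workers that die.
function runCluster(startWorker, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker, code, signal) => {
      // Respawn only after unexpected exits, not deliberate disconnects.
      if (!worker.exitedAfterDisconnect) {
        console.error(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
        cluster.fork();
      }
    });
  } else {
    startWorker();
  }
}

// Usage (not executed here):
// runCluster(() => require('./server')); // './server' is a placeholder path
```

A process manager or orchestrator can do the same job from outside the process, so this is only needed when the application manages its own workers.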
Process Managers
Options for production:
- PM2
- systemd
- Docker + orchestrator
- Kubernetes
Graceful Shutdown
Don't drop in-flight requests.
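process.on('SIGTERM', async () => {
  // Stop accepting new connections; resolves once open connections have ended.
  const closed = new Promise((resolve) => server.close(resolve));
  await drainConnections(); // app-specific: finish or close in-flight work
  await closed;
  process.exit(0);
});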
Memory Management
V8 has limits.
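Default heap limit: historically ~1.5GB on 64-bit systems; newer Node.js versions size it from available memory.
Raise it with --max-old-space-size (value in MB).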
Detecting Memory Leaks
Signs to watch for:
- RSS growing over time
- Increasing GC frequency
- Longer GC pauses
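The first sign, RSS growth, can be tracked with the built-in `process.memoryUsage()`. A minimal sketch: `sampleMemory` and `isSteadilyGrowing` are illustrative helpers, and the "every sample larger than the last" rule is a deliberately crude heuristic, not a recommendation from the talk.

```javascript
// Sample process memory so growth can be tracked over time.
// `sampleMemory` and `isSteadilyGrowing` are illustrative helpers.
function sampleMemory() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMb: toMb(rss),
    heapUsedMb: toMb(heapUsed),
    heapTotalMb: toMb(heapTotal),
    externalMb: toMb(external),
  };
}

// Crude leak heuristic: every sample is larger than the one before it.
function isSteadilyGrowing(samples, key = 'rssMb') {
  if (samples.length < 2) return false;
  return samples.every((s, i) => i === 0 || s[key] > samples[i - 1][key]);
}

// Usage (not executed here): sample once a minute, keep the last 30 samples.
// const history = [];
// setInterval(() => {
//   history.push(sampleMemory());
//   if (history.length > 30) history.shift();
// }, 60_000).unref();
```

GC frequency and pause length can be observed separately with a `PerformanceObserver` subscribed to `gc` entries from `node:perf_hooks`.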
Heap Snapshots
Take them in production, but sparingly: a snapshot pauses the process and needs extra memory.
Compare before and after.
Find retained objects.
Connection Pooling
Don't create new connections per request.
Reuse them.
Database Pool Sizing
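const { Pool } = require('pg'); // assumption: node-postgres, whose options use these names

const pool = new Pool({
  max: 20, // upper bound on open connections, per process
  idleTimeoutMillis: 30000, // close clients that sit idle for 30 s
  connectionTimeoutMillis: 2000, // give up if no connection is free within 2 s
});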
Too few: queuing
Too many: database overload
Caching Strategies
- In-memory (fastest, limited)
- Redis (distributed, optionally persistent)
- CDN (edge, static content)
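The in-memory tier is "limited" in two ways: entries go stale and memory is finite. A minimal sketch of a cache that bounds both, assuming a fixed TTL and a simple oldest-first eviction; `TtlCache` and its option names are illustrative, not from the talk.

```javascript
// Minimal in-memory cache with a per-entry TTL and a size cap.
// `TtlCache` is an illustrative class, not from the talk.
class TtlCache {
  constructor({ ttlMs = 60_000, maxEntries = 1000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now; // injectable clock, useful for tests
    this.entries = new Map(); // Map iterates in insertion order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key); // re-inserting moves the key to the newest position
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest inserted entry to bound memory use.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

With cluster mode or multiple instances, each process holds its own copy, which is where a shared cache such as Redis comes in.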
Cache Invalidation
The two hard problems:
- Naming things
- Cache invalidation
- Off-by-one errors
Structured Logging
JSON, not strings.
logger.info({
  event: 'request_completed',
  duration_ms: 45,
  status: 200,
  path: '/api/users',
});
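To show what sits behind a call like `logger.info({...})`, here is a minimal JSON-lines logger. `createLogger` and the `level` and `time` field names are assumptions for illustration; a production service would more likely use an established logging library.

```javascript
// Minimal JSON-lines logger: one JSON object per line on stdout.
// `createLogger` is illustrative; the `write` sink is injectable for tests.
function createLogger(write = (line) => process.stdout.write(line + '\n')) {
  const log = (level) => (fields) => {
    const line = JSON.stringify({
      level,
      time: new Date().toISOString(),
      ...fields,
    });
    write(line);
    return line;
  };
  return { info: log('info'), warn: log('warn'), error: log('error') };
}

// Usage (not executed here):
// const logger = createLogger();
// logger.info({ event: 'request_completed', duration_ms: 45, status: 200 });
```

Because every line is a self-contained JSON object, log aggregators can filter and aggregate on fields such as `status` or `duration_ms` without regex parsing.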
Metrics That Matter
- Request rate (throughput)
- Error rate (reliability)
- Latency percentiles (performance)
- Saturation (capacity)
The RED Method
Rate - requests per second
Errors - failed requests
Duration - latency distribution
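The three RED numbers can be derived from a single per-request record. A minimal sketch, assuming raw durations are kept for one reporting window and percentiles use the nearest-rank method; `RedMetrics` and its method names are illustrative, not from the talk.

```javascript
// Illustrative RED collector: counts requests and errors and keeps raw
// durations for one reporting window.
class RedMetrics {
  constructor() {
    this.reset();
  }

  reset() {
    this.requests = 0;
    this.errors = 0;
    this.durationsMs = [];
  }

  record(durationMs, isError = false) {
    this.requests += 1;
    if (isError) this.errors += 1;
    this.durationsMs.push(durationMs);
  }

  // Nearest-rank percentile over the recorded durations.
  percentile(p) {
    if (this.durationsMs.length === 0) return 0;
    const sorted = [...this.durationsMs].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  snapshot(windowSeconds) {
    return {
      ratePerSecond: this.requests / windowSeconds,
      errorRatio: this.requests === 0 ? 0 : this.errors / this.requests,
      p50Ms: this.percentile(50),
      p99Ms: this.percentile(99),
    };
  }
}
```

Keeping every raw sample does not scale to high request rates; metrics systems typically use bucketed histograms instead, at the cost of approximate percentiles.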
Distributed Tracing
Follow requests across services.
OpenTelemetry is the standard.
Circuit Breakers
Fail fast when downstream is broken.
const breaker = new CircuitBreaker(apiCall, {
  timeout: 3000,
  errorThreshold: 50,
  resetTimeout: 30000,
});
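To make the closed, open, and half-open states concrete, here is a hand-rolled breaker. It is a simplified sketch, not the library used on the slide: it trips on a count of consecutive failures rather than an error percentage, has no call timeout, and `SimpleCircuitBreaker` and its option names are hypothetical.

```javascript
// Minimal circuit breaker sketch; not the library used on the slide.
class SimpleCircuitBreaker {
  constructor(action, { failureThreshold = 5, resetTimeoutMs = 30_000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, useful for tests
    this.state = 'closed'; // closed -> open -> half-open -> closed
    this.failures = 0; // consecutive failures since the last success
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('circuit open'); // fail fast without calling downstream
      }
      this.state = 'half-open'; // reset timeout elapsed: let calls through as a trial
    }
    try {
      const result = await this.action(...args);
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Callers treat the `circuit open` error as a signal to serve a fallback or return an error immediately instead of waiting on a broken dependency.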
Retry with Backoff
Exponential backoff, combined with random jitter, keeps retrying clients from hitting a recovering service all at once (the thundering herd).
const delay = Math.min(
  baseDelay * Math.pow(2, attempt),
  maxDelay
);
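The capped delay above can be wrapped in a retry loop. The sketch below adds "full jitter" (a random delay between zero and the cap) so that clients which failed together do not retry together. `backoffDelay`, `retry`, and the default values are assumptions for illustration.

```javascript
// Exponential backoff with full jitter: a random delay in [0, cap).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, { baseDelay = 100, maxDelay = 10_000, random = Math.random } = {}) {
  const cap = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
  return Math.floor(random() * cap);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async operation, retrying on failure, up to `maxAttempts` calls in total.
async function retry(operation, { maxAttempts = 5, ...backoff } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of attempts: surface the error
      await sleep(backoffDelay(attempt, backoff));
    }
  }
}

// Usage (not executed here; `fetchProfile` is a placeholder):
// const profile = await retry(() => fetchProfile(userId), { maxAttempts: 4 });
```

Retrying is only safe for idempotent operations, and it should stop on errors that will not change on a second try, such as validation failures.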
Health Checks
Liveness: Is the process alive?
Readiness: Can it serve traffic?
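The two checks answer different questions, so they are usually exposed as separate endpoints. A minimal sketch of a request handler: the paths `/livez` and `/readyz`, the `state.ready` flag, and `createHealthHandler` are assumptions for illustration, not from the talk.

```javascript
// Illustrative health endpoints for a plain Node.js HTTP server.
function createHealthHandler(state) {
  return (req, res) => {
    if (req.url === '/livez') {
      // Liveness: answering at all shows the process and event loop are running.
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'alive' }));
    } else if (req.url === '/readyz') {
      // Readiness: 503 until dependencies are up, and again while shutting down.
      const code = state.ready ? 200 : 503;
      res.writeHead(code, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: state.ready ? 'ready' : 'not_ready' }));
    } else {
      res.writeHead(404);
      res.end();
    }
  };
}

// Usage (not executed here; port 8080 is a placeholder):
// const state = { ready: false };
// require('node:http').createServer(createHealthHandler(state)).listen(8080);
// state.ready = true; // once pools, caches, etc. are initialized
```

Setting `state.ready = false` at the start of a graceful shutdown lets the load balancer stop sending traffic before the server closes.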
Load Shedding
When overloaded, reject gracefully.
Better to serve some requests well than all requests poorly.
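One simple way to shed load is to cap the number of requests in flight and reject the rest immediately. The sketch below assumes the wrapped handler returns a promise that settles when the request is finished; `withLoadShedding`, `maxInFlight`, and `retryAfterSeconds` are hypothetical names and values.

```javascript
// Illustrative load shedder: caps concurrent requests and rejects the excess
// with 503 and a Retry-After header.
function withLoadShedding(handler, { maxInFlight = 100, retryAfterSeconds = 1 } = {}) {
  let inFlight = 0;
  return async (req, res) => {
    if (inFlight >= maxInFlight) {
      // Reject cheaply instead of queueing work the server cannot finish in time.
      res.writeHead(503, { 'Retry-After': String(retryAfterSeconds) });
      res.end('overloaded');
      return;
    }
    inFlight += 1;
    try {
      await handler(req, res);
    } finally {
      inFlight -= 1; // release the slot whether the handler succeeded or threw
    }
  };
}

// Usage (not executed here; `appHandler` is a placeholder):
// require('node:http').createServer(withLoadShedding(appHandler, { maxInFlight: 200 })).listen(8080);
```

Other signals, such as measured event loop delay, can replace the in-flight count as the trigger; the principle of rejecting early stays the same.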
Key Takeaways
- Understand and respect the event loop
- Horizontal scaling is your friend
- Observability is not optional
- Plan for failure
Resources
- github.com/gruzewski/nodejs-scaling-examples
- Node.js documentation
- OpenTelemetry docs
Thank You
Questions?
@gruzewski