Performance

Node.js Worker Threads for CPU Tasks

When to use: CPU-bound tasks (crypto, compression, image/video processing) that would otherwise block the event loop. Avoid workers for short, tiny tasks; the startup and messaging overhead may outweigh the benefit.

Patterns: Use a pool (e.g., piscina or workerpool) with a bounded size, and queue tasks. Pass large buffers as transferables instead of copying them; avoid heavy serialization. Propagate cancellation and timeouts; surface errors to the main thread.

Observability: Track queue length, task duration, worker utilization, and crashes. Monitor event loop lag to confirm that offloading helps.

Safety: Validate inputs on the main thread; avoid running untrusted code in workers. Shut the pool down cleanly on SIGTERM: drain the queue, then close the workers.

Checklist: CPU tasks isolated to workers; pool sized to cores. Transferables used for big buffers; timeouts set. Metrics for queue, utilization, and errors in place.
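As a minimal sketch of the transferable and timeout advice, the snippet below hands an ArrayBuffer to a worker without copying it and rejects if the worker takes too long. The names sumInWorker and workerSource are hypothetical, and the sketch starts one worker per call for brevity; a real service would reuse workers through a bounded pool as recommended above.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body, inlined as a string so the sketch fits in a single file.
// It stands in for a real CPU-bound task by summing the bytes it receives.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', ({ buffer }) => {
    const bytes = new Uint8Array(buffer);
    let sum = 0;
    for (const b of bytes) sum += b;
    parentPort.postMessage({ sum });
  });
`;

// Hypothetical helper: runs the task in a fresh worker and resolves with the sum.
function sumInWorker(buffer, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });

    // Timeout: stop the worker and surface the failure to the caller.
    const timer = setTimeout(() => {
      worker.terminate();
      reject(new Error('worker timed out'));
    }, timeoutMs);

    const settle = (done, value) => {
      clearTimeout(timer);
      worker.terminate();
      done(value);
    };

    worker.once('message', (msg) => settle(resolve, msg.sum));
    worker.once('error', (err) => settle(reject, err));

    // The second argument is the transfer list: ownership of the buffer
    // moves to the worker, so nothing is copied or serialized.
    worker.postMessage({ buffer }, [buffer]);
  });
}
```

After the transfer the buffer is detached on the sending side (its byteLength reads 0), so the main thread must not touch it again.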

Scaling Node.js: Fastify vs Express

Performance notes: Fastify is schema-driven, with AJV validation and low-overhead routing; it typically delivers better RPS and lower p99 latencies. Express has a mature ecosystem and is great for quick prototypes, but its middleware can add overhead.

Bench considerations: Compare both frameworks with the same validation and parsing; disable unnecessary middleware in Express. Test under keep-alive, and over HTTP/1.1 and HTTP/2 where applicable. Measure event loop lag and heap, not just RPS.

Migration tips: Start new services with Fastify; for existing Express apps, migrate edge routes first. Replace middleware with hooks/plugins, mapping each Express middleware to its Fastify equivalent. Validate payloads with JSON Schema for speed and safety.

Operational tips: Use pino (Fastify's default logger) for structured logs; avoid console.log overhead. Keep the plugin count minimal; watch the cost of async hooks. Load test with realistic payload sizes; profile hotspots before and after migration.

Takeaway: Fastify wins on raw throughput and structured DX. Express is still fine for smaller apps, but performance-critical services benefit from Fastify’s lower overhead.
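When comparing the two frameworks, it helps to reduce each run to the same few numbers. The sketch below is one way to do that, assuming the load tool can export per-request latencies in milliseconds; percentile and summarize are hypothetical helper names, and the nearest-rank method is just one common percentile definition.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so whole-number ranks stay exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Reduce one benchmark run to throughput plus median and tail latency.
function summarize(latenciesMs, durationSec) {
  return {
    rps: latenciesMs.length / durationSec,
    p50: percentile(latenciesMs, 50),
    p99: percentile(latenciesMs, 99),
  };
}
```

Comparing the p99 of two runs alongside RPS makes it harder to miss a framework that is fast on average but slow at the tail.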

Go High-Performance Concurrency Playbook

Principles: Keep goroutine count bounded; size pools to CPU cores and downstream QPS. Apply backpressure: bounded channels plus select with a default case to shed load early. Use context everywhere: cancel on timeouts and parent cancellation; close resources. Prefer immutability; minimize shared state. Use channels for coordination.

Patterns: Worker pool with a buffered jobs channel; fan-out/fan-in over channels, with contexts for cancellation. Rate limit with time.Ticker or golang.org/x/time/rate. Use a buffered channel as a semaphore for limited resources (DB, disk, external API).

Sync cheatsheet: sync.Mutex for critical sections; avoid long hold times. sync.WaitGroup to wait for a known set of concurrent tasks; errgroup for cancellation on first error. sync.Map only for high-concurrency, write-light cases; prefer map+mutex otherwise.

Instrument & guard: pprof: net/http/pprof plus CPU/mem profiles in staging under load. Scheduler tracing: GODEBUG=schedtrace=1000 when diagnosing scheduling stalls; add scavtrace=1 to watch the memory scavenger. Metrics: goroutines, GC pause, allocations, queue depth, worker utilization.

Checklist: Context per request; timeouts set at ingress. Bounded goroutines; pools sized and observable. Backpressure on queues; drop/timeout strategy defined. pprof/metrics enabled in non-prod and behind auth in prod. Load tests for saturation behavior and graceful degradation.
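The load-shedding idea is not specific to Go. As a rough, language-neutral sketch written in JavaScript rather than Go, the hypothetical BoundedQueue below behaves like a send on a buffered channel inside a select with a default case: when the buffer is full, the job is rejected immediately instead of blocking the producer.

```javascript
// A fixed-capacity FIFO queue that sheds load when full.
class BoundedQueue {
  constructor(capacity) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
    this.items = [];
    this.dropped = 0; // export this as a metric to make shedding visible
  }

  // Returns true if the job was accepted, false if it was shed.
  tryEnqueue(job) {
    if (this.items.length >= this.capacity) {
      this.dropped += 1;
      return false;
    }
    this.items.push(job);
    return true;
  }

  // Returns the oldest job, or undefined when the queue is empty.
  dequeue() {
    return this.items.shift();
  }

  get depth() {
    return this.items.length;
  }
}
```

A caller that gets false back can answer with an overload error right away, which keeps queue depth and latency bounded under saturation.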

PostgreSQL Performance Tuning: The Power of work_mem

PostgreSQL’s work_mem parameter is one of the most impactful yet misunderstood configuration settings for database performance. This post explores how adjusting work_mem can dramatically improve query execution times, especially for operations involving sorting, hashing, and joins.

Understanding work_mem: work_mem specifies the amount of memory each internal sort operation or hash table may use before PostgreSQL writes to temporary disk files. The default value is 4MB, which is conservative so that PostgreSQL runs on smaller machines. ...
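Because the limit applies to each sort or hash operation rather than to the whole server, a back-of-the-envelope estimate is useful before raising it. The sketch below is a deliberately simplified upper bound: worstCaseWorkMemMB is a hypothetical helper, the operation and query counts are assumptions you would take from your own workload, and it ignores parallel workers and hash_mem_multiplier.

```javascript
// Rough upper bound on memory that work_mem-governed operations could claim
// at once: every concurrent query runs all of its sorts and hashes at the limit.
function worstCaseWorkMemMB({ workMemMB, memoryOpsPerQuery, concurrentQueries }) {
  return workMemMB * memoryOpsPerQuery * concurrentQueries;
}

// Default 4MB, 3 sort/hash operations per query, 100 concurrent queries.
const atDefault = worstCaseWorkMemMB({
  workMemMB: 4,
  memoryOpsPerQuery: 3,
  concurrentQueries: 100,
});

// The same workload after raising work_mem to 64MB.
const atRaised = worstCaseWorkMemMB({
  workMemMB: 64,
  memoryOpsPerQuery: 3,
  concurrentQueries: 100,
});

console.log(atDefault, atRaised); // 1200 19200
```

A sixteenfold increase in the setting multiplies the worst case by sixteen as well, which is why a large work_mem is often set per session or per role instead of globally.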

Tuning Kafka Consumers (Java)

Core settings: Size max.poll.interval.ms to the processing time and max.poll.records to the batch size. Use fetch.min.bytes and fetch.max.wait.ms to trade latency against throughput. Set enable.auto.commit=false; commit (sync or async) after processing each batch.

Concurrency: Prefer multiple consumer instances over a massive max.poll.records. For CPU-bound steps, hand off to a bounded executor; avoid blocking the poll thread.

Ordering & retries: Keep partition affinity when ordering matters; use a dead-letter topic (DLT) for poison messages. Back off with jitter on retries; limit attempts per message.

Observability: Metrics: lag per partition, commit latency, rebalances, processing time, error rates. Log offsets and partition for errors; trace batch sizes.

Checklist: Poll loop never blocks; work delegated to a bounded pool. Commits after successful processing; DLT in place. Lag and rebalance metrics monitored.
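The retry policy can be sketched independently of the Kafka client. The snippet below is written in JavaScript rather than Java and uses the "full jitter" variant of exponential backoff; backoffDelayMs, nextAction, and the default base and cap values are hypothetical choices, not settings of any Kafka library.

```javascript
// Full-jitter exponential backoff: pick a random delay between 0 and an
// exponentially growing ceiling, capped at capMs.
// `failures` is the number of failed attempts so far (1 after the first failure).
function backoffDelayMs(failures, { baseMs = 100, capMs = 10000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** (failures - 1));
  return Math.floor(random() * ceiling);
}

// Decide what to do with a message after it has failed `failures` times:
// retry after a jittered delay, or give up and route it to the dead-letter topic.
function nextAction(failures, maxAttempts, options) {
  if (failures >= maxAttempts) {
    return { action: 'dead-letter' };
  }
  return { action: 'retry', delayMs: backoffDelayMs(failures, options) };
}
```

The random source is injectable so the policy can be tested deterministically; the jitter itself keeps many consumers from retrying against a struggling downstream at the same instant.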

Database Optimization Techniques: Performance Tuning Guide

Database performance is critical for application scalability. Here are proven optimization techniques.

1. Indexing Strategy

When to Index:

    -- Index frequently queried columns
    CREATE INDEX idx_user_email ON users(email);

    -- Index foreign keys
    CREATE INDEX idx_post_user_id ON posts(user_id);

    -- Composite indexes for multi-column queries
    CREATE INDEX idx_user_status_role ON users(status, role);

When NOT to Index: columns with low cardinality (few unique values); frequently updated columns; small tables (< 1000 rows).

2. Query Optimization

Avoid SELECT *:

    -- Bad
    SELECT * FROM users WHERE id = 123;

    -- Good
    SELECT id, name, email FROM users WHERE id = 123;

Use LIMIT:

    -- Always limit large result sets
    SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;

Avoid N+1 Queries:

    // Bad: N+1 queries
    users.forEach(user => {
      const posts = db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
    });

    // Good: Single query with JOIN
    const usersWithPosts = db.query(`
      SELECT u.*, p.*
      FROM users u
      LEFT JOIN posts p ON u.id = p.user_id
    `);

3. Connection Pooling

    // Configure connection pool
    const pool = mysql.createPool({
      connectionLimit: 10,
      host: 'localhost',
      user: 'user',
      password: 'password',
      database: 'mydb',
      waitForConnections: true,
      queueLimit: 0
    });

4. Caching

Application-Level Caching:

    // Cache frequently accessed data
    const cache = new Map();

    async function getUser(id) {
      if (cache.has(id)) {
        return cache.get(id);
      }
      const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
      cache.set(id, user);
      return user;
    }

Query Result Caching (MySQL 5.7 and earlier; the query cache was removed in MySQL 8.0):

    -- Use query cache (MySQL)
    SET GLOBAL query_cache_size = 67108864;
    SET GLOBAL query_cache_type = 1;

5. Database Schema Optimization

Normalize Properly:

    -- Avoid over-normalization
    -- Balance between normalization and performance

Use Appropriate Data Types:

    -- Use smallest appropriate type
    TINYINT instead of INT for small numbers
    VARCHAR(255) instead of TEXT when possible
    DATE instead of DATETIME when time not needed

6. Partitioning

    -- Partition large tables by date
    CREATE TABLE logs (
      id INT,
      created_at DATE,
      data TEXT
    )
    PARTITION BY RANGE (YEAR(created_at)) (
      PARTITION p2023 VALUES LESS THAN (2024),
      PARTITION p2024 VALUES LESS THAN (2025),
      PARTITION p2025 VALUES LESS THAN (2026)
    );

7. Query Analysis

EXPLAIN Plan:

    EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

Slow Query Log:

    -- Enable slow query log
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;

8. Batch Operations

    // Bad: Multiple individual inserts
    users.forEach(user => {
      db.query('INSERT INTO users (name, email) VALUES (?, ?)', [user.name, user.email]);
    });

    // Good: Batch insert
    const values = users.map(u => [u.name, u.email]);
    db.query('INSERT INTO users (name, email) VALUES ?', [values]);

9. Database Maintenance

Regular Vacuuming (PostgreSQL):

    VACUUM ANALYZE;

Optimize Tables (MySQL):

    OPTIMIZE TABLE users;

10. Monitoring

Monitor query performance; track slow queries; monitor connection pool usage; watch for table locks; monitor disk I/O.

Best Practices

Index strategically; optimize queries; use connection pooling; implement caching; normalize appropriately; use appropriate data types; partition large tables; analyze query performance; batch operations; perform regular maintenance.

Conclusion

Database optimization requires: ...

RabbitMQ High Availability & Tuning

Queue types: Prefer quorum queues for HA; use classic queues for transient, high-throughput flows where loss is acceptable. Set durability and persistence appropriately; avoid auto-delete for critical flows.

Flow control: Enable publisher confirms; set the mandatory flag to catch unroutable messages. Use basic.qos to bound unacked messages, with prefetch tuned per consumer. Watch memory and flow-control events; avoid oversized messages and use blob storage for big payloads.

Topology & ops: Replicate queues across AZs with quorum queues (classic mirrored queues are deprecated and were removed in RabbitMQ 4.0); avoid a single-node SPOF. Use consistent hashing/partitioning to spread hot keys. Metrics: publish/consume rates, unacked count, queue depth, confirm latency, blocked connections.

Checklist: Queue type chosen (quorum vs classic) per workload. Publisher confirms plus unroutable handling. Prefetch/qos tuned; consumers idempotent. Monitoring and alerts on depth, unacked, and flow control.
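"Consumers idempotent" matters because confirms and redelivery make duplicates a normal event. A minimal sketch of one approach is shown below: wrap the handler so each message ID is processed once. makeIdempotentHandler is a hypothetical helper, the message shape (properties.messageId) is an assumption that requires publishers to set an ID, and the in-memory Set is a stand-in for a durable store with expiry.

```javascript
// Wrap a handler so redelivered messages with an already-processed ID are skipped.
function makeIdempotentHandler(handler, seen = new Set()) {
  return function handle(message) {
    const id = message.properties && message.properties.messageId;
    if (id === undefined || id === null) {
      throw new Error('message has no messageId; cannot deduplicate');
    }
    if (seen.has(id)) {
      return 'duplicate'; // safe to ack without repeating side effects
    }
    handler(message);
    // Mark as seen only after the handler succeeds, so a failed
    // attempt is still processed when the broker redelivers it.
    seen.add(id);
    return 'processed';
  };
}
```

With this wrapper a consumer can ack duplicates immediately, and a handler that throws leaves the message eligible for redelivery.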

Java GC Tuning: G1 and ZGC in Practice

Choosing: G1 gives balanced latency and throughput for heaps of 4–64GB, with predictable pauses. ZGC gives sub-10ms pauses on large heaps and is great for latency-sensitive APIs, at slightly higher CPU cost.

Baseline flags: G1: -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch. ZGC: -XX:+UseZGC -XX:+ZGenerational -XX:+AlwaysPreTouch (-XX:+ZGenerational is available from JDK 21; generational mode is the default from JDK 23). Set -Xms = -Xmx for a stable footprint; size the heap from prod RSS data.

Metrics to watch: Pause p95/p99, GC CPU %, allocation rate, remembered set size (G1), heap occupancy. STW reasons: promotion failure, humongous allocations (G1), metaspace growth.

Common fixes: Reduce humongous allocations: avoid giant byte[]; use chunked buffers. Lower pause targets only after measuring; avoid over-constraining MaxGCPauseMillis. Cap thread counts with -XX:ParallelGCThreads and -XX:ConcGCThreads if the CPU is saturated. For ZGC, make sure huge pages are available to the JVM; watch NUMA pinning.

Checklist: Heap sized from live data; -Xms = -Xmx. GC logs on (JDK 17+): -Xlog:gc*:file=gc.log:utctime,uptime,level,tags:filecount=10,filesize=20M. Dashboards for pause/CPU/allocation. Load test changes before prod; compare pause histograms release to release.

Go Database Pooling Patterns (sqlx/pgx)

Sizing: Pool size ≈ CPU cores × 2–4 per service instance; avoid per-request opens. For PgBouncer in transaction mode: disable session features; avoid session-level prepared statements.

Timeouts & limits: Set ConnMaxLifetime, ConnMaxIdleTime, MaxOpenConns, and MaxIdleConns. Add statement timeouts; enforce context deadlines on queries.

Instrumentation: Track pool acquire latency, in-use/idle counts, wait count, and timeouts. Log slow queries; sample EXPLAIN ANALYZE in staging for heavy ones.

Hygiene: Use prepared statements judiciously; reuse sqlx.Named/pgx prepared statements for hot paths. Prefer keyset pagination; cap result sizes; parameterize everything.

Checklist: Pool sized and monitored. Query timeouts set; slow logs reviewed. No per-request connections; connections released via context cancellation.
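Keyset pagination, capped result sizes, and parameterization combine into one small query builder. The sketch below is written in JavaScript rather than Go and only builds the SQL text and parameter list; keysetPageQuery, the posts table, and its id, created_at, and title columns are hypothetical, and the row-value comparison assumes PostgreSQL.

```javascript
// Build a parameterized keyset-pagination query, newest rows first.
// `after` is the (createdAt, id) pair of the last row on the previous page.
function keysetPageQuery({ after = null, limit = 50, maxLimit = 200 } = {}) {
  // Cap the page size so a caller cannot request an unbounded result set.
  const pageSize = Math.min(Math.max(1, limit), maxLimit);

  if (after === null) {
    return {
      text: 'SELECT id, created_at, title FROM posts ' +
            'ORDER BY created_at DESC, id DESC LIMIT $1',
      values: [pageSize],
    };
  }

  return {
    // Row-value comparison resumes strictly after the cursor row;
    // id breaks ties between rows with the same timestamp.
    text: 'SELECT id, created_at, title FROM posts ' +
          'WHERE (created_at, id) < ($1, $2) ' +
          'ORDER BY created_at DESC, id DESC LIMIT $3',
    values: [after.createdAt, after.id, pageSize],
  };
}
```

Unlike OFFSET, this form does not scan and discard skipped rows, so deep pages cost about the same as the first one when an index on (created_at, id) exists.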

Node.js Event Loop Internals (2024)

Phases refresher: timers → pending callbacks → idle/prepare → poll → check → close callbacks. The process.nextTick queue and the microtask queue (Promises/queueMicrotask) are drained after each callback (since Node 11), not only between phases; process.nextTick callbacks run before promise microtasks.

Pitfalls: Long-running JS on the main thread blocks the poll phase and delays I/O; move CPU work to worker threads. nextTick storms can starve I/O; prefer setImmediate when deferring. Unhandled rejections crash the process (the default since Node 15); handle them globally in prod.

Debugging: NODE_DEBUG=async_hooks or --trace-events-enabled --trace-event-categories node.async_hooks. Measure event loop lag with perf_hooks.monitorEventLoopDelay(). Profile CPU with node --inspect plus Chrome DevTools; use flamegraphs for hotspots.

Practices: Limit synchronous JSON/crypto/zlib; offload to worker threads or native modules. Keep microtask chains short; avoid deep promise recursion. Use AbortController for cancellable I/O; always clear timers.

Checklist: Monitor event loop lag and heap usage. Worker pool sized for CPU tasks; main loop kept light. Errors and rejections centrally handled; graceful shutdown in place.
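The relative order of nextTick, promise microtasks, and the check phase is easiest to see by recording it. The sketch below schedules all three from inside a setImmediate callback so the result does not depend on how the script was started; observeOrder is a hypothetical helper name.

```javascript
// Resolves with the order in which differently scheduled callbacks ran.
function observeOrder() {
  return new Promise((resolve) => {
    const order = [];

    // Start from inside a check-phase callback so the starting point is fixed.
    setImmediate(() => {
      // Runs on the next loop iteration's check phase, after everything below.
      setImmediate(() => {
        order.push('setImmediate');
        resolve(order);
      });

      // Microtask: runs once the nextTick queue has been drained.
      Promise.resolve().then(() => order.push('promise microtask'));

      // nextTick queue: drained as soon as the current callback returns.
      process.nextTick(() => order.push('nextTick'));

      // Plain synchronous code always finishes first.
      order.push('sync');
    });
  });
}

observeOrder().then((order) => console.log(order.join(' -> ')));
// prints "sync -> nextTick -> promise microtask -> setImmediate"
```

The same ordering explains the starvation pitfall: a callback that keeps scheduling more nextTick work never lets the loop reach the poll or check phases.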