Executive Summary

This document analyzes a three-month production deployment of PostgreSQL JSONB columns, documenting performance issues encountered, indexing strategies implemented, and architectural patterns that proved effective. The findings are relevant for teams considering JSONB for schema flexibility in high-traffic applications.

Key Metrics:

  • Initial table size: 2M records
  • Average JSONB column size: 50KB
  • Write throughput: 10K updates/minute
  • Query degradation: 200ms → 14s (70x regression)

Background and Context

In Q4 2025, our engineering team migrated user preference storage from normalized columns to JSONB format. Primary motivation was reducing schema migration overhead as product requirements evolved.

Initial Schema Design

CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    email VARCHAR(255),
    created_at TIMESTAMP,
    settings JSONB
);

The settings column consolidated 23 previously discrete preference fields.

Performance Analysis

Phase 1: Initial Deployment (Weeks 1-2)

Initial performance metrics were acceptable:

  • SELECT queries: <50ms p95
  • INSERT operations: <30ms p95
  • Table scan performance: Adequate for 2M records

Phase 2: Query Degradation (Week 3)

Performance degradation occurred when analytics queries began executing against JSONB fields:

SELECT
  settings->>'signup_source',
  COUNT(*)
FROM users
WHERE settings->>'created_year' = '2025'
GROUP BY settings->>'signup_source';

Observed Metrics:

  • Query execution time: 14,000ms
  • Sequential scan initiated
  • CPU utilization: 85%+

Root Cause: JSONB columns lack automatic indexing. Queries using -> and ->> operators performed full table scans.

Indexing Strategies Implemented

Strategy 1: GIN Indexes

General Inverted (GIN) indexes provide containment query support:

CREATE INDEX idx_user_settings_gin 
ON users USING GIN (settings);

Performance Characteristics:

  • Optimizes: Containment queries (@>, @?, ?)
  • Does not optimize: Value extraction (->>, #>)
  • Index size overhead: ~15% of column data

Strategy 2: Expression Indexes

Path-specific indexes for frequently queried fields:

CREATE INDEX idx_newsletter_sub 
ON users ((settings->>'newsletter_subscribed'));

CREATE INDEX idx_region 
ON users ((settings->>'region'));

Results:

  • Query time reduction: 14s → 180ms (98% improvement)
  • Total indexes created: 8
  • Storage overhead: 30% increase

Strategy 3: Partial Indexes

For queries filtering on specific JSON values:

CREATE INDEX idx_premium_users 
ON users ((settings->>'premium'))
WHERE (settings->>'premium') = 'true';

Write Amplification Issue

Problem Description

JSONB updates rewrite the entire column value, not individual fields:

UPDATE users
SET settings = jsonb_set(settings, '{last_login}', '"2026-01-08"')
WHERE user_id = 12345;

Impact Analysis:

  • Average JSONB size: 50KB
  • Update frequency: 10K/minute
  • WAL generation: 500MB/minute
  • Disk I/O saturation observed

Solution Architecture

Implemented frequency-based column segregation:

ALTER TABLE users 
    ADD COLUMN session_data JSONB,
    ADD COLUMN profile_settings JSONB;

-- Hot data (frequent updates): session_data (~2KB)
-- Cold data (rare updates): profile_settings (~48KB)

Performance Improvement:

  • Write amplification: -70%
  • WAL generation: 500MB/min → 150MB/min
  • Disk I/O utilization: 90% → 35%

Use Case Analysis

JSONB is appropriate for:

  1. Variable User Preferences

    • Schema varies significantly per user
    • Read-heavy workload
    • Infrequent updates
  2. Audit Logging

    • Variable metadata per event
    • Write-once, read-rarely pattern
    • No complex querying requirements
  3. API Response Caching

    • External API responses with variable structure
    • TTL-based data lifecycle
    • Simple key-value retrieval
  4. Prototype Development

    • Schema in active development
    • Temporary flexibility requirement
    • Migration to normalized schema planned

JSONB is inappropriate for:

  1. High-frequency updates to large documents
  2. Complex aggregations across JSONB fields
  3. Frequently queried fields that could be normalized
  4. Critical business data requiring strict schema validation

Final Architecture

Our production system evolved to a hybrid model:

CREATE TABLE users (
    -- Core fields (normalized)
    user_id BIGINT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    
    -- Volatile state (small JSONB, frequent updates)
    session_data JSONB,
    
    -- User preferences (indexed JSONB, rare updates)
    profile_settings JSONB
);

-- Separate table for historical data
CREATE TABLE user_settings_history (
    history_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT REFERENCES users(user_id),
    settings_snapshot JSONB,
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

Current Performance Metrics

  • Query latency: 45ms p95 (restored to baseline)
  • Write throughput: 10K/min (maintained)
  • WAL generation: 150MB/min (70% reduction)
  • Storage efficiency: Acceptable with indexing overhead

Lessons Learned

  1. Index Strategy is Critical: JSONB requires explicit indexing strategy from day one
  2. Write Amplification Real: Update frequency must inform schema design
  3. Hybrid Approach Optimal: Combine normalized columns with JSONB selectively
  4. Monitor from Start: Implement comprehensive monitoring before production deployment
  5. Documentation Essential: PostgreSQL JSONB documentation is comprehensive; invest time upfront

Technical Debt Assessment

Migration cost comparison:

  • Original schema migration (5 columns): ~30 minutes with pt-online-schema-change
  • JSONB refactoring effort: ~6 weeks of engineering time
  • Production incident response: ~40 hours

Recommendation: Evaluate schema flexibility requirements carefully. Traditional normalized schemas remain appropriate for most use cases.

References


Document Version: 1.0
Last Updated: 2026-01-08
Author: Database Engineering Team
Review Status: Production Validated