We were spending $12,000/month on Redis before I realized we were doing it completely wrong.

Not “slightly inefficient” wrong. Full-on “why is our cache bigger than our database” wrong. Here’s what I learned after three months of firefighting and optimization.

The problem nobody warned us about

Our Redis instance hit 32GB of memory. Our actual PostgreSQL database? 8GB.

Something was very, very wrong.

Started digging through our caching logic. Found this gem:

import json
from redis import Redis

redis = Redis()  # shared redis-py client, reused throughout these snippets

def get_user_feed(user_id):
    cache_key = f"feed:{user_id}"
    cached = redis.get(cache_key)
    
    if cached:
        return json.loads(cached)
    
    # Generate feed (expensive operation)
    feed = generate_feed(user_id)
    
    # Cache it
    redis.set(cache_key, json.dumps(feed))
    
    return feed

Looks innocent. We thought so too. Until we realized we never set an expiration.

We had feed data from users who hadn’t logged in for 18 months. Dead accounts. Banned users. Test accounts from 2023. All sitting in Redis, eating memory, costing money.

Mistake #1: No TTL means forever

The fix seems obvious now:

# Set expiration - 1 hour for feeds
redis.setex(cache_key, 3600, json.dumps(feed))

But here’s the thing nobody tells you: choosing the right TTL is hard.

Too short? Cache misses everywhere, database gets hammered.
Too long? Stale data and wasted memory.

What actually worked for us:

# Different TTLs for different data types
CACHE_TTLS = {
    'user_profile': 86400,      # 24 hours - rarely changes
    'user_feed': 3600,          # 1 hour - updates frequently
    'trending_posts': 300,      # 5 minutes - needs to be fresh
    'static_content': 604800,   # 1 week - almost never changes
}

def cache_set(key, value, data_type='default'):
    ttl = CACHE_TTLS.get(data_type, 3600)
    redis.setex(key, ttl, json.dumps(value))
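
Call sites just pass the data type — roughly:

feed = generate_feed(user_id)
cache_set(f"feed:{user_id}", feed, 'user_feed')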

Dropped our Redis memory usage by 40% immediately.

Mistake #2: Caching entire objects

Found this pattern everywhere:

def get_post(post_id):
    cache_key = f"post:{post_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Fetch post with ALL relationships
    post = db.query(Post).options(
        joinedload(Post.author),
        joinedload(Post.comments),
        joinedload(Post.tags),
        joinedload(Post.reactions)
    ).filter_by(id=post_id).first()
    
    redis.set(cache_key, json.dumps(post.to_dict()))
    return post

A single post cache entry was 50KB because we included:

  • 200 comments with full user profiles
  • All reactions (thousands of them)
  • Complete tag metadata
  • Author’s full profile

We were caching data we didn’t need 90% of the time.

Better approach:

def get_post(post_id):
    # Cache just the post
    cache_key = f"post:{post_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Fetch ONLY the post and cache the serialized dict
    post = db.query(Post).filter_by(id=post_id).first()
    post_data = post.to_dict()
    redis.setex(cache_key, 3600, json.dumps(post_data))
    return post_data

def get_post_with_comments(post_id, limit=20):
    # Separate cache for comments
    post = get_post(post_id)
    
    comments_key = f"post:{post_id}:comments:{limit}"
    cached_comments = redis.get(comments_key)
    
    if cached_comments:
        post['comments'] = json.loads(cached_comments)
    else:
        comments = fetch_comments(post_id, limit)
        redis.setex(comments_key, 600, json.dumps(comments))
        post['comments'] = comments
    
    return post

Cache size dropped another 60%. Why? Because most requests just need the post, not everything.

Mistake #3: Cache stampede

This one took down our API for 10 minutes.

Scenario: Popular post’s cache expires. Suddenly 500 requests all hit the database at once trying to regenerate it. Database melts. Everything breaks.

Our original code had zero protection:

cached = redis.get(cache_key)
if not cached:
    # 500 requests all do this simultaneously
    data = expensive_database_query()
    redis.set(cache_key, data)

The fix: cache locking

import time
import uuid

def get_with_lock(cache_key, fetch_func, ttl=3600):
    # Try to get from cache
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Try to acquire lock
    lock_key = f"lock:{cache_key}"
    lock_id = str(uuid.uuid4())
    
    # Atomic lock acquisition with 10 second timeout
    lock_acquired = redis.set(
        lock_key, 
        lock_id, 
        nx=True,  # Only set if doesn't exist
        ex=10     # Lock expires in 10 seconds
    )
    
    if lock_acquired:
        try:
            # We got the lock, fetch the data
            data = fetch_func()
            redis.setex(cache_key, ttl, json.dumps(data))
            return data
        finally:
            # Release lock only if we still own it
            script = """
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("del", KEYS[1])
            else
                return 0
            end
            """
            redis.eval(script, 1, lock_key, lock_id)
    else:
        # Someone else is fetching, wait and retry
        time.sleep(0.1)
        return get_with_lock(cache_key, fetch_func, ttl)
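
We then wrapped the hot paths in it — something like this (fetch_post_from_db is a stand-in for whatever expensive query you're protecting):

post = get_with_lock(f"post:{post_id}", lambda: fetch_post_from_db(post_id))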

Haven’t had a stampede since.

Mistake #4: Not monitoring cache hit rate

We had no idea if our cache was even working. Turns out, our hit rate was 23%. Terrible.

Added monitoring:

def get_cached(key, fetch_func, ttl=3600):
    cached = redis.get(key)
    
    if cached:
        # Track hit
        statsd.increment('cache.hit', tags=[f'key_prefix:{key.split(":")[0]}'])
        return json.loads(cached)
    
    # Track miss
    statsd.increment('cache.miss', tags=[f'key_prefix:{key.split(":")[0]}'])
    
    data = fetch_func()
    redis.setex(key, ttl, json.dumps(data))
    return data

This revealed that certain cache keys had 5% hit rates. We were caching data that was almost never reused. Removed those caches entirely.

Now we’re at 87% hit rate.
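
If you don't have statsd wired up, Redis tracks this itself. A quick sanity check using the INFO stats counters (instance-wide, not per key prefix):

def global_hit_rate():
    stats = redis.info('stats')
    hits = stats['keyspace_hits']
    misses = stats['keyspace_misses']
    return hits / (hits + misses) if (hits + misses) else 0.0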

Mistake #5: JSON everywhere

We were JSON encoding everything:

redis.set(f"counter:{user_id}", json.dumps(count))  # WHY?

For simple values, just use native Redis commands:

# Counters
redis.incr(f"counter:{user_id}")

# Sets
redis.sadd(f"followers:{user_id}", follower_id)

# Sorted sets for leaderboards
redis.zadd("leaderboard", {user_id: score})

# Hashes for objects
redis.hset(f"user:{user_id}", mapping={
    "name": "John",
    "email": "[email protected]"
})

Faster. Less memory. Built-in atomic operations.
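
Reading them back is just as direct — a few examples with the same keys as above:

# Set cardinality, top of the leaderboard, full hash
follower_count = redis.scard(f"followers:{user_id}")
top_ten = redis.zrevrange("leaderboard", 0, 9, withscores=True)
user = redis.hgetall(f"user:{user_id}")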

Mistake #6: Caching the wrong things

We cached database query results. But we should have been caching computation results.

Example: User reputation score calculation involves:

  • Counting posts (DB query)
  • Counting upvotes (DB query)
  • Counting comments (DB query)
  • Complex formula

We cached each query result separately. It was still slow: every request pulled three cache entries and re-ran the formula, when all we actually needed was the final number.

Better:

def get_reputation_score(user_id):
    cache_key = f"reputation:{user_id}"
    score = redis.get(cache_key)
    
    if score:
        return int(score)
    
    # Calculate ONCE
    posts = count_user_posts(user_id)
    upvotes = count_user_upvotes(user_id)
    comments = count_user_comments(user_id)
    
    score = calculate_reputation(posts, upvotes, comments)
    
    # Cache the RESULT
    redis.setex(cache_key, 3600, score)
    return score

Cache the computation, not the components.

What we learned

After three months of fixes:

  • Redis memory: 32GB → 9GB
  • Monthly cost: $12,000 → $3,000
  • Average API response time: 450ms → 120ms
  • Cache hit rate: 23% → 87%

Key lessons:

  1. Always set TTLs - Default to 1 hour, adjust based on data
  2. Cache small and specific - Not entire object graphs
  3. Protect against stampedes - Use locks or probabilistic early expiration (sketched after this list)
  4. Monitor everything - Hit rate, memory usage, evictions
  5. Use native Redis types - Not everything needs JSON
  6. Cache computation, not data - Cache what’s expensive to calculate
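
We went with locks, but the other option from lesson 3, probabilistic early expiration, is worth a sketch: store how long the value took to compute alongside it, and as expiry approaches, let a random fraction of requests refresh it early so the recompute never lands on everyone at once. Roughly — this is the idea, not the code we run:

import json
import math
import random
import time

def get_with_early_expiration(key, fetch_func, ttl=3600, beta=1.0):
    cached = redis.get(key)
    if cached:
        entry = json.loads(cached)
        # The closer we are to expiry (and the longer the recompute takes),
        # the more likely this request is to refresh the value early
        rand = random.random() or 1e-10
        if time.time() - entry["delta"] * beta * math.log(rand) < entry["expiry"]:
            return entry["value"]

    start = time.time()
    value = fetch_func()
    delta = time.time() - start  # how long the recompute took

    entry = {"value": value, "delta": delta, "expiry": time.time() + ttl}
    redis.setex(key, ttl, json.dumps(entry))
    return value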

Redis is amazing when you use it right. We just had to learn the hard way.

One more thing: we now have a weekly job that analyzes cache key patterns and reports unused or rarely-hit keys. Found 2GB of garbage last week.
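
If you want to build something similar, SCAN plus OBJECT IDLETIME gets you most of the way there. A rough sketch (note that OBJECT IDLETIME isn't available when the instance runs an LFU eviction policy):

def find_idle_keys(max_idle_seconds=7 * 86400):
    # Walk the keyspace without blocking Redis and flag keys
    # that haven't been read or written in a week
    idle_keys = []
    for key in redis.scan_iter(count=1000):
        idle = redis.object("idletime", key)  # seconds since last access
        if idle and idle > max_idle_seconds:
            idle_keys.append((key, idle))
    return idle_keys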

Your cache should make things faster AND cheaper. If it’s not doing both, you’re doing it wrong.