Common Redis Caching Mistakes (Q&A Format)
Q: Our Redis instance is 32GB but our database is only 8GB. Is this normal?
A: No, this is a red flag indicating poor cache hygiene.
We encountered this exact scenario. Investigation revealed cache entries from:
- Users inactive for 18+ months
- Deleted/banned accounts
- Test accounts from 2023
- Orphaned session data
Root cause: missing TTLs (Time To Live) on cache entries.

```python
# ❌ Wrong - lives forever
redis.set(cache_key, json.dumps(data))

# ✅ Correct - expires after 1 hour
redis.setex(cache_key, 3600, json.dumps(data))
```
Impact: Set appropriate TTLs → 40% memory reduction immediately.
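One way to find the offenders before setting TTLs is to scan for keys that will never expire. A minimal sketch (the function name and defaults are ours, not from the incident; it works with any client exposing redis-py's `scan_iter`/`ttl` interface):

```python
def find_keys_without_ttl(client, match="*", count=1000):
    """Return keys that will never expire.

    `client` is any object with redis-py's scan_iter/ttl interface.
    A TTL of -1 means the key exists but has no expiry set.
    """
    offenders = []
    for key in client.scan_iter(match=match, count=count):
        if client.ttl(key) == -1:
            offenders.append(key)
    return offenders
```

Run it per prefix during off-peak hours (SCAN is incremental, so it won't block the server), then decide per pattern whether to `EXPIRE` or delete.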
Q: How do I choose the right TTL value?
A: TTL should match data volatility, not be arbitrary.
Our TTL strategy by data type:
| Data Type | TTL | Rationale |
|---|---|---|
| User Profile | 24h | Rarely changes, safe to cache long |
| User Feed | 1h | Updates frequently, needs freshness |
| Trending Content | 5min | Real-time requirements |
| Static Content | 7d | Almost never changes |
| Session Data | 30min | Security consideration |
```python
CACHE_TTLS = {
    'user_profile': 86400,     # 24 hours
    'user_feed': 3600,         # 1 hour
    'trending_posts': 300,     # 5 minutes
    'static_content': 604800,  # 1 week
}

def cache_set(key, value, data_type='default'):
    # Fall back to 1 hour for unknown data types
    ttl = CACHE_TTLS.get(data_type, 3600)
    redis.setex(key, ttl, json.dumps(value))
```
Testing methodology: Start with conservative (short) TTLs, monitor cache hit rate, adjust incrementally.
Q: Why are my cache entries so large?
A: You’re likely caching entire object graphs instead of focused data.
Our mistake:
```python
# ❌ Wrong - caches 50KB per post
post = db.query(Post).options(
    joinedload(Post.author),
    joinedload(Post.comments),   # 200 comments!
    joinedload(Post.tags),
    joinedload(Post.reactions)   # Thousands of reactions
).filter_by(id=post_id).first()
redis.set(f"post:{post_id}", json.dumps(post.to_dict()))
```
Better approach - granular caching:
```python
# ✅ Correct - cache only what you need
def get_post_basic(post_id):
    """Cache just the post (2KB)"""
    cache_key = f"post:{post_id}"
    if cached := redis.get(cache_key):
        return json.loads(cached)
    post = db.query(Post).filter_by(id=post_id).first()
    data = post.to_dict()
    redis.setex(cache_key, 3600, json.dumps(data))
    return data  # return the dict so hits and misses have the same shape

def get_post_comments(post_id, limit=20):
    """Separate cache for comments (10KB)"""
    comments_key = f"post:{post_id}:comments:{limit}"
    if cached := redis.get(comments_key):
        return json.loads(cached)
    comments = fetch_comments(post_id, limit)
    redis.setex(comments_key, 600, json.dumps(comments))
    return comments
```
Result: 60% cache size reduction. Why? Most API calls need basic post data, not everything.
Q: What’s a cache stampede and how do I prevent it?
A: A cache stampede occurs when a popular cache entry expires and hundreds of requests simultaneously hit the database to regenerate it.
Symptom: Periodic API timeouts coinciding with cache expiration.
Our incident:
- Popular post cache expired
- 500 concurrent requests hit database
- Database connection pool exhausted
- 10 minute outage
Solution - Cache Locking Pattern:
```python
import uuid
import time

def get_with_lock(cache_key, fetch_func, ttl=3600):
    # Try cache first
    if cached := redis.get(cache_key):
        return json.loads(cached)

    # Attempt lock acquisition
    lock_key = f"lock:{cache_key}"
    lock_id = str(uuid.uuid4())
    lock_acquired = redis.set(
        lock_key,
        lock_id,
        nx=True,  # Only set if it doesn't exist
        ex=10     # Lock expires in 10 seconds
    )

    if lock_acquired:
        try:
            # Winner fetches data
            data = fetch_func()
            redis.setex(cache_key, ttl, json.dumps(data))
            return data
        finally:
            # Release lock with a Lua script (atomic check-and-delete)
            script = """
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("del", KEYS[1])
            end
            """
            redis.eval(script, 1, lock_key, lock_id)
    else:
        # Losers wait and retry
        time.sleep(0.1)
        return get_with_lock(cache_key, fetch_func, ttl)
```
Key concepts:
- Only first request acquires lock and fetches data
- Other requests wait briefly then retry (data will be cached)
- Lock timeout prevents deadlocks
- Atomic lock release prevents race conditions
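The winner-takes-the-lock behavior can be illustrated with a toy in-memory stand-in for Redis (`FakeRedis` below is purely illustrative, not the real client, and the loser's sleep-and-retry branch is stubbed out):

```python
import json
import uuid

class FakeRedis:
    """Toy in-memory stand-in for the Redis client (illustration only)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # lock already held by someone else
        self.store[key] = value
        return True
    def setex(self, key, ttl, value):
        self.store[key] = value
    def delete(self, key):
        self.store.pop(key, None)

r = FakeRedis()
db_calls = []

def expensive_fetch():
    db_calls.append(1)  # count how often the "database" is hit
    return {"id": 42, "title": "popular post"}

def get_with_lock_once(cache_key, fetch_func, ttl=3600):
    if (cached := r.get(cache_key)) is not None:
        return json.loads(cached)
    if r.set(f"lock:{cache_key}", str(uuid.uuid4()), nx=True, ex=10):
        data = fetch_func()  # winner regenerates the entry
        r.setex(cache_key, ttl, json.dumps(data))
        r.delete(f"lock:{cache_key}")
        return data
    return None  # a loser would sleep briefly and retry here

get_with_lock_once("post:42", expensive_fetch)
get_with_lock_once("post:42", expensive_fetch)
print(len(db_calls))  # → 1: the second call is served from cache
```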
Q: How do I know if my caching strategy is working?
A: Measure cache hit rate.
Our discovery: Cache hit rate was 23%. Terrible. We were wasting resources.
Instrumentation code:
```python
def get_cached(key, fetch_func, ttl=3600):
    cached = redis.get(key)
    key_prefix = key.split(":")[0]
    if cached:
        statsd.increment('cache.hit', tags=[f'prefix:{key_prefix}'])
        return json.loads(cached)
    statsd.increment('cache.miss', tags=[f'prefix:{key_prefix}'])
    data = fetch_func()
    redis.setex(key, ttl, json.dumps(data))
    return data
```
Monitoring dashboard metrics:
- Overall hit rate: hits / (hits + misses)
- Per-key-type hit rate
- Memory usage by key prefix
- Eviction rate
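As a sanity check alongside your own counters, Redis tracks server-wide `keyspace_hits` and `keyspace_misses` in `INFO stats`. A minimal helper (a sketch; the live-client lines in the comment assume redis-py) turns either source into a hit rate:

```python
def hit_rate(hits, misses):
    """Cache hit rate as a fraction; 0.0 when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0

# With a live redis-py client you could feed it the server-wide counters:
#   stats = redis.info("stats")
#   rate = hit_rate(stats["keyspace_hits"], stats["keyspace_misses"])
print(hit_rate(87, 13))  # → 0.87
```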
Action items from monitoring:
- Identified cache keys with <10% hit rate → Removed them
- Found poorly tuned TTLs → Adjusted based on actual access patterns
- Discovered over-cached data → Implemented granular caching
Current metrics: 87% hit rate (target: >80%)
Q: Should I JSON-encode everything I cache?
A: No. Use Redis native data types when appropriate.
Inefficient:
```python
# ❌ Wrong - unnecessary JSON overhead
redis.set(f"counter:{user_id}", json.dumps(42))
redis.set("online_users", json.dumps([1, 2, 3, 4]))
```
Efficient:
```python
# ✅ Correct - use native Redis types

# Counters
redis.incr(f"counter:{user_id}")  # Atomic increment
value = int(redis.get(f"counter:{user_id}"))

# Sets
redis.sadd("online_users", 1, 2, 3, 4)  # No duplicates
users = redis.smembers("online_users")

# Sorted Sets (leaderboards)
redis.zadd("leaderboard", {user_id: score})
top10 = redis.zrevrange("leaderboard", 0, 9, withscores=True)

# Hashes (objects)
redis.hset(f"user:{user_id}", mapping={
    "name": "John",
    "email": "[email protected]"
})
user_data = redis.hgetall(f"user:{user_id}")
```
Advantages:
- Less memory (no JSON encoding overhead)
- Faster operations
- Atomic operations built-in
- Type-specific commands (e.g., ZINCRBY, SINTER)
Q: What should I cache - the data or the computation?
A: Cache the expensive computation result, not intermediate data.
Inefficient approach:
```python
# ❌ Caching queries, still doing computation
def get_reputation_score(user_id):
    # Cache each query
    posts = cache_get(f"posts:{user_id}", lambda: count_posts(user_id))
    upvotes = cache_get(f"upvotes:{user_id}", lambda: count_upvotes(user_id))
    comments = cache_get(f"comments:{user_id}", lambda: count_comments(user_id))
    # Still computing this every time!
    return posts * 5 + upvotes * 2 + comments * 3
```
Efficient approach:
```python
# ✅ Cache the final computation
def get_reputation_score(user_id):
    cache_key = f"reputation:{user_id}"
    if score := redis.get(cache_key):
        return int(score)

    # Calculate once
    posts = count_posts(user_id)
    upvotes = count_upvotes(user_id)
    comments = count_comments(user_id)
    score = posts * 5 + upvotes * 2 + comments * 3

    # Cache the result
    redis.setex(cache_key, 3600, score)
    return score
```
Principle: Cache what’s expensive to compute, not what’s expensive to fetch.
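Caching the final score does mean it can go stale when any input changes. One common companion pattern (a sketch; the function name and write-path hooks are assumptions, not our production code) is to delete the key from every write path that affects the inputs, so the next read recomputes:

```python
def invalidate_reputation(client, user_id):
    """Call from any write path that changes reputation inputs
    (new post, upvote, comment). The next read recomputes and re-caches."""
    client.delete(f"reputation:{user_id}")
```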
Q: Our team keeps adding cache keys. How do we maintain cache hygiene?
A: Implement automated cache analysis.
Our solution - Weekly audit job:
```python
def analyze_cache_patterns():
    """Analyze cache usage and identify waste"""
    cursor = 0
    patterns = {}

    while True:
        cursor, keys = redis.scan(cursor, count=1000)
        for key in keys:
            prefix = key.split(b":")[0].decode()
            ttl = redis.ttl(key)
            size = len(redis.dump(key))

            if prefix not in patterns:
                patterns[prefix] = {
                    'count': 0,
                    'total_size': 0,
                    'no_ttl': 0
                }

            patterns[prefix]['count'] += 1
            patterns[prefix]['total_size'] += size
            if ttl == -1:  # No TTL set
                patterns[prefix]['no_ttl'] += 1

        if cursor == 0:
            break

    # Generate report
    report = []
    for prefix, stats in sorted(patterns.items(),
                                key=lambda x: x[1]['total_size'],
                                reverse=True):
        report.append({
            'prefix': prefix,
            'key_count': stats['count'],
            'total_mb': stats['total_size'] / 1024 / 1024,
            'keys_without_ttl': stats['no_ttl']
        })
    return report
```
Weekly actions:
- Review top memory consumers
- Identify keys without TTL
- Check for orphaned test keys
- Analyze cache hit rates by prefix
- Remove unused cache patterns
Last week’s findings: Removed 2GB of abandoned test data.
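To turn the weekly report into action items automatically, a small filter can flag prefixes worth a human look (the thresholds below are illustrative assumptions, not our production values; it consumes the report rows produced by the audit job):

```python
def flag_suspicious(report, min_mb=100, max_no_ttl=0):
    """Flag report rows that are heavy memory consumers or that
    contain keys without a TTL. Each row is a dict with the keys
    'prefix', 'total_mb', and 'keys_without_ttl'."""
    flagged = []
    for row in report:
        reasons = []
        if row['total_mb'] >= min_mb:
            reasons.append(f"uses {row['total_mb']:.0f} MB")
        if row['keys_without_ttl'] > max_no_ttl:
            reasons.append(f"{row['keys_without_ttl']} keys without TTL")
        if reasons:
            flagged.append((row['prefix'], reasons))
    return flagged
```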
Summary Checklist
✅ Always set TTLs - Match TTL to data volatility
✅ Cache granularly - Small, focused cache entries
✅ Prevent stampedes - Use lock patterns for expensive operations
✅ Monitor continuously - Track hit rate, memory, evictions
✅ Use native types - Avoid JSON when Redis types suffice
✅ Cache computation - Not intermediate data
✅ Audit regularly - Weekly cache hygiene review
Results After 3 Months
| Metric | Before | After | Improvement |
|---|---|---|---|
| Redis Memory | 32GB | 9GB | -72% |
| Monthly Cost | $12,000 | $3,000 | -75% |
| API Response Time | 450ms | 120ms | -73% |
| Cache Hit Rate | 23% | 87% | +278% |
Total engineering time invested: ~80 hours
Monthly savings: $9,000
ROI: Positive after 1 month
Need help with Redis optimization? Common issues include: improper TTL strategy, cache stampedes, poor key design, and inadequate monitoring. Address these systematically for best results.