Protecting the Database from Traffic Spikes

Tags: system-design, kafka, redis, cassandra, scaling

The Scenario

An IPL match ends and 10 million users post within the same second: a spike of 10,000,000 writes/second.

What Happens at Each Layer

App Servers

Each server handles ~10k req/sec → need 1,000 servers at this spike. Cloud auto-scaling handles this, but takes 2-5 minutes to kick in. For those first 2-5 minutes, the existing servers are overwhelmed.

Cassandra — The Core Problem

Single Cassandra node: ~50,000 writes/second
10,000,000 / 50,000 = need 200 nodes

Normal load = 1,200 writes/sec → 5-10 nodes running

Suddenly 10M writes hit those 5-10 nodes:
→ Each node receives 1-2M writes/sec
→ Each node's limit is 50,000/sec
→ Nodes overwhelmed → writes fail → posts lost ❌

This is a rate mismatch (often called an impedance mismatch): traffic spikes in milliseconds, while database capacity scales in minutes to hours.
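The capacity arithmetic above is easy to check in a few lines. The per-node throughput figures are the rough estimates used in this post, not benchmarks:

```python
# Rough capacity math from the scenario above.
WRITES_PER_NODE = 50_000      # assumed single-node Cassandra write throughput
SPIKE_RATE = 10_000_000       # writes/sec at the spike
nodes_running = 10            # upper end of normal provisioning (5-10 nodes)

nodes_for_spike = SPIKE_RATE // WRITES_PER_NODE   # nodes needed to absorb it raw
load_per_node = SPIKE_RATE // nodes_running       # what each live node actually gets

print(nodes_for_spike)   # 200
print(load_per_node)     # 1000000 -- 20x over each node's 50k/sec limit
```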


The Solution: Never Let the Spike Hit the Database

Solution 1 — Kafka as Write Buffer ✅

Instead of:
App Server → Cassandra directly

Do this:
App Server → Kafka → Cassandra Writer Service → Cassandra

What this changes:

10M posts/sec spike hits App Servers
→ Write to Kafka → 5ms per write
→ Kafka absorbs all 10M messages instantly
→ Return success to user immediately ✅

Cassandra Writer reads from Kafka
→ Processes at Cassandra's comfortable speed (~500k/sec: 20 nodes at roughly half their 50k/sec limit)
→ Takes 20 seconds to clear the 10M backlog
→ All posts saved. No data lost. No overload. ✅
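The buffer pattern can be sketched with the standard library: a `queue.Queue` stands in for the Kafka topic, `handle_post` plays the app server, and `drain` plays the Cassandra Writer. This is a toy model of the flow, not the real Kafka or Cassandra APIs:

```python
import queue

buffer = queue.Queue()   # stand-in for the Kafka topic

def handle_post(post):
    """App server path: enqueue and ack the user immediately."""
    buffer.put(post)
    return "accepted"          # user sees success right away

def drain(batch_size):
    """Cassandra Writer path: pull a bounded batch per tick."""
    batch = []
    while len(batch) < batch_size and not buffer.empty():
        batch.append(buffer.get())
    return batch               # in reality: batched Cassandra writes

# Spike: 1,000 posts arrive "at once" -- every one is accepted instantly.
for i in range(1000):
    handle_post(f"post-{i}")

# The writer clears the backlog at a controlled 100 writes per tick.
ticks = 0
while not buffer.empty():
    drain(100)
    ticks += 1
print(ticks)   # 10 ticks to clear the backlog -- the spike never hits the DB
```

The key property is that the enqueue rate and the drain rate are decoupled: the database only ever sees the drain rate.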

Solution 2 — Write to Redis First, Persist Async

User creates post
→ Write to Redis immediately → 1ms → return success ✅
→ Async worker persists to Cassandra in background

Why this works: fresh posts are read by followers from Redis anyway, so Cassandra persistence can happen seconds later and the user never knows the difference. This is the write-behind caching strategy applied to spike handling.
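A minimal write-behind sketch, with a dict standing in for Redis, a deque for the async worker's backlog, and another dict for Cassandra (the real services are assumed, not shown):

```python
from collections import deque

cache = {}           # "Redis": fresh posts, served to followers immediately
backlog = deque()    # pending persistence work for the async worker
durable = {}         # "Cassandra": the eventual source of truth

def create_post(post_id, body):
    cache[post_id] = body        # ~1ms write, then return success
    backlog.append(post_id)      # persist later, off the request path
    return "ok"

def persist_one():
    """One step of the background worker."""
    if backlog:
        post_id = backlog.popleft()
        durable[post_id] = cache[post_id]

create_post("p1", "what a finish!")
assert "p1" in cache and "p1" not in durable   # readable before it's durable
persist_one()
assert durable["p1"] == "what a finish!"       # caught up moments later
```

The window where a post exists only in Redis is the trade-off: a real system would mitigate it with Redis persistence or replication.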

Solution 3 — Pre-scaling for Known Events

IPL match ending is predictable:

IPL final at 8 PM
At 7 PM:
→ Pre-scale Cassandra from 10 → 100 nodes
→ Pre-scale app servers from 20 → 500
→ Pre-warm Kafka with more partitions
→ Alert on-call team

At 8 PM spike hits → infrastructure already ready ✅

This is capacity planning. Twitter does it for New Year’s Eve. Hotstar for IPL. Flipkart for Big Billion Day.

Solution 4 — Rate Limiting

Per user: max 5 posts/minute (legitimate users never hit this)
System-wide: if incoming rate exceeds threshold → queue with delay
             → "please wait a moment" beats failing completely
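One way to implement the per-user limit is a sliding-window counter. In this sketch timestamps are passed in explicitly so the logic is deterministic; a real limiter would read the clock, and would likely live in Redis so all app servers share state:

```python
from collections import defaultdict, deque

LIMIT, WINDOW = 5, 60.0        # max 5 posts per 60-second sliding window
history = defaultdict(deque)   # user -> timestamps of recent accepted posts

def allow_post(user, now):
    q = history[user]
    while q and now - q[0] >= WINDOW:   # drop timestamps outside the window
        q.popleft()
    if len(q) < LIMIT:
        q.append(now)
        return True
    return False                        # "please wait a moment"

results = [allow_post("alice", t) for t in range(6)]   # 6 posts in 6 seconds
print(results)                     # [True, True, True, True, True, False]
print(allow_post("alice", 61.0))   # True -- the oldest post aged out
```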

The Updated Write Path at Scale

User creates post

App Server (auto-scaled, rate-limited)

Write to Redis → return success to user ✅

Publish to Kafka "post.created"

Two independent consumers:

Consumer 1 — Cassandra Writer
→ Batches writes → persists at controlled rate

Consumer 2 — Fan-out Service
→ Updates follower feeds → processes at own pace
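The reason the two consumers are independent is that each tracks its own offset into the same log. A plain list stands in for the topic here; the offset-per-group bookkeeping is the point, not the Kafka API:

```python
log = [f"post.created:{i}" for i in range(6)]   # stand-in for the Kafka topic
offsets = {"cassandra-writer": 0, "fanout-service": 0}

def poll(group, max_records):
    """Return up to max_records events this group hasn't seen yet."""
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)   # commit the new offset
    return batch

poll("cassandra-writer", 5)   # the writer persists in big batches
poll("fanout-service", 2)     # fan-out trails behind, by design
print(offsets)   # {'cassandra-writer': 5, 'fanout-service': 2}
```

Because neither group's offset depends on the other's, a slow fan-out never blocks persistence, and vice versa.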

How the Architecture Evolves

Stage 1 — Early product (1,200 writes/sec):
App Server → Cassandra directly
Simple. Works fine. No Kafka needed.

Stage 2 — Growing product (50,000 writes/sec):
App Server → Kafka → Cassandra Writer → Cassandra
Kafka added as buffer. Cassandra scaled to 20-30 nodes.

Stage 3 — Massive scale (10M writes/sec spike):
App Server → Redis (immediate write)
           → Kafka (async persistence + fan-out)
Redis write-behind → Cassandra eventually
Pre-scaling for known events + rate limiting
Cassandra geo-sharded

The Core Principle

The solution is never “find a faster database.” The solution is always “never let the spike hit the database directly.”

Kafka + Redis absorb the spike. The database sees a smooth, controlled write rate. The database never knows there was a spike.

Protect your database from traffic spikes using buffers. The database should always see predictable load regardless of what’s happening at the traffic layer.