Scaling the Feed System Beyond 100k Writes/Second

Tags: system-design, feed-system, scaling, redis

The Question

The feed system was designed for 1,200 post writes/second. What happens if writes jump to 100,000/second? Which components break and how does the architecture change?

Stress Test Each Component

The write path is:

User creates post

App Servers → Cassandra → Kafka → Fan-out Workers → Redis (feed cache)

You must stress-test every link, not just the database.


App Servers

Each post takes ~10-20ms of server time to process, so one server sustains roughly 5,000-10,000 req/sec. At 100k writes/sec, that means 10-20 servers.

Fix: horizontal scaling + cloud auto-scaling. Load balancer already handles this. No architectural change.
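The sizing above is a back-of-envelope calculation; a quick sketch (figures taken straight from the estimate):

```python
import math

TARGET_WRITES_PER_SEC = 100_000

def servers_needed(per_server_rps: int, target_rps: int = TARGET_WRITES_PER_SEC) -> int:
    """Round up: a fractional server still means one more instance."""
    return math.ceil(target_rps / per_server_rps)

print(servers_needed(10_000))  # optimistic per-server throughput -> 10 servers
print(servers_needed(5_000))   # conservative per-server throughput -> 20 servers
```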


Cassandra

Single node: 20k-50k writes/sec. A 4-5 node cluster handles 100k comfortably — and throughput scales roughly linearly as you add nodes (budget extra nodes if your replication factor multiplies per-node write load).

Fix: add nodes. No architectural change needed. This is exactly why Cassandra was chosen.


Kafka

A single broker handles 100k-500k messages/sec; a cluster scales effectively without limit.

Fix: add brokers and partitions. Configuration change, not architectural.


Fan-out Service — The Real Problem

This is where 100k writes/second actually breaks things.

Current (1,200 posts/sec × 1,000 avg followers):
= 1,200,000 Redis writes/second — aggressive but manageable

At 100k posts/sec × 1,000 avg followers:
= 100,000,000 Redis writes/second ❌
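The amplification is just multiplication — every post becomes one Redis write per follower (average-follower figure from above):

```python
AVG_FOLLOWERS = 1_000  # assumed average, per the estimate above

def redis_writes_per_sec(posts_per_sec: int) -> int:
    # Fan-out on write: one feed-cache write per follower, per post.
    return posts_per_sec * AVG_FOLLOWERS

print(redis_writes_per_sec(1_200))    # today's load: 1,200,000/sec
print(redis_writes_per_sec(100_000))  # target load: 100,000,000/sec
```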

Fix 1 — Scale workers horizontally:

Increase Kafka topic partitions to 100
Deploy 100 fan-out worker instances
Each handles 1,000 posts/sec

Kafka's consumer group protocol distributes partitions across workers automatically. With 100 partitions and 100 workers, each worker owns exactly one partition; over-partition (say, 1,000 partitions) to leave headroom:
Worker 1 → partitions 1-10
Worker 2 → partitions 11-20
...
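The range assignment above can be made visible with a small sketch. Real Kafka consumer groups do this rebalancing automatically; this just shows the arithmetic, assuming partitions divide evenly across workers:

```python
def assign_partitions(num_partitions: int, num_workers: int) -> dict:
    """Range-style assignment: contiguous blocks of partitions per worker."""
    per_worker = num_partitions // num_workers  # assumes even division
    return {
        w: list(range(w * per_worker, (w + 1) * per_worker))
        for w in range(num_workers)
    }

# 100 partitions, 10 workers -> worker 0 owns partitions 0-9, etc.
print(assign_partitions(100, 10)[0])
```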

Fix 2 — Batch Redis writes (pipeline):

Current: 1,000 individual Redis writes per post
         1,000 network round trips

Better: collect all 1,000 feed updates → single Redis pipeline
        1 network call instead of 1,000
        10-50x more efficient
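The batching above can be sketched with redis-py-style pipelining. The key scheme (`feed:<userId>` sorted sets scored by timestamp) and function names are illustrative; the client is injected, so any compatible client works:

```python
import time

def feed_key(user_id: int) -> str:
    # Hypothetical key scheme: one sorted set per user's feed.
    return f"feed:{user_id}"

def fan_out_post(redis_client, post_id: int, follower_ids: list) -> int:
    """Queue one ZADD per follower, then flush in a single round trip."""
    now = time.time()
    pipe = redis_client.pipeline(transaction=False)
    for fid in follower_ids:
        pipe.zadd(feed_key(fid), {str(post_id): now})
    pipe.execute()  # 1 network call for all queued commands
    return len(follower_ids)
```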

Fix 3 — Redis Cluster (sharding):

Single Redis node: max ~1M ops/sec → dies at 100M ops/sec ❌

Redis Cluster with 100 nodes:
each key hashes to a slot (CRC16 mod 16,384), and slot ranges map to nodes — conceptually hash(userId) % 100
Each node handles ~1M ops/sec
100 nodes × 1M = 100M ops/sec total ✅
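A conceptual routing sketch (real Redis Cluster inserts the 16,384-slot layer between key and node; the modulo below is the same idea with that layer removed, and the hash choice here is illustrative):

```python
import zlib

NUM_NODES = 100

def node_for_user(user_id: int, num_nodes: int = NUM_NODES) -> int:
    """Stable hash of the feed key -> owning node index."""
    key = f"feed:{user_id}".encode()
    return zlib.crc32(key) % num_nodes

# All writes and reads for one user's feed always land on the same node,
# so ZADD/ZRANGE on a single feed never cross node boundaries.
```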

Fix 4 — Lower celebrity threshold:

The hybrid model currently pushes posts for users with < 10k followers. At this scale, even 1k-follower users cause significant amplification.

Consider:
< 1,000 followers → push (fan-out on write)
1k-10k followers → push but async batched
> 10k followers → pull (fan-out on read)

More users on the pull model = less fan-out pressure. Trade-off: feeds that include "power users" are assembled partly at read time, so those reads get slightly slower.
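The tiered dispatch above is a simple threshold check; a sketch with the thresholds proposed in the text (function name illustrative):

```python
def fanout_strategy(follower_count: int) -> str:
    """Pick a fan-out strategy from the author's follower count."""
    if follower_count < 1_000:
        return "push"          # fan-out on write, immediately
    if follower_count <= 10_000:
        return "push-batched"  # fan-out on write, in async batches
    return "pull"              # fan-out on read
```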


Cassandra at Petabyte Scale

100k posts/sec × 86,400 × 365 × 10 years = 31.5 trillion posts
Each ~1KB → ~31.5 petabytes
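Re-deriving the estimate above (decimal petabytes, ~1 KB = 1,000 bytes per post):

```python
posts = 100_000 * 86_400 * 365 * 10  # writes/sec * sec/day * days/yr * years
print(posts)                         # 31,536,000,000,000 ≈ 31.5 trillion

bytes_total = posts * 1_000          # ~1 KB per post
petabytes = bytes_total / 1e15
print(round(petabytes, 1))           # ≈ 31.5 PB
```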

A single Cassandra cluster won't hold this.

Fix — geographic sharding:

India cluster → Indian users' posts
US cluster → US users' posts
EU cluster → EU users' posts (also GDPR compliant)

Cross-region (rare):
User in India views US celebrity post
→ Fetched from US cluster
→ Cached in India CDN edge after first fetch
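The routing can be sketched as home-region lookup plus an edge cache for cross-region reads. Cluster names and the cache structure are hypothetical stand-ins:

```python
REGION_CLUSTERS = {
    "IN": "cassandra-india",
    "US": "cassandra-us",
    "EU": "cassandra-eu",
}

_edge_cache: dict = {}  # stand-in for the CDN edge cache

def cluster_for(user_region: str) -> str:
    """Each user's posts live in their home-region cluster."""
    return REGION_CLUSTERS[user_region]

def read_post(post_id: str, author_region: str) -> str:
    """First read fetches cross-region; later reads hit the local edge cache."""
    if post_id in _edge_cache:
        return _edge_cache[post_id]
    payload = f"fetched:{cluster_for(author_region)}:{post_id}"  # stand-in fetch
    _edge_cache[post_id] = payload
    return payload
```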

What Changed vs What Stayed the Same

Stayed the same:

App servers: same stateless tier behind the load balancer, just more instances
Cassandra as the post store: same schema, more nodes
Kafka as the fan-out queue: same topics, more brokers and partitions
Redis sorted sets as the feed cache
The write path itself: App Servers → Cassandra → Kafka → Fan-out Workers → Redis

Changed:

Fan-out workers scaled out to ~100 instances across more partitions
Redis writes batched into pipelines instead of per-follower round trips
Single Redis node replaced by a sharded Redis Cluster
Celebrity threshold lowered, moving more users to the pull model
Post storage split into geographic clusters


The Core Insight

Good architecture doesn’t break when scale increases — it just needs more of the same things. Bad architecture requires fundamental redesign at scale.

Choosing Cassandra over MySQL, Kafka over direct service calls, Redis sorted sets over custom structures — each decision was made knowing scale would come. When it does, you scale horizontally instead of rebuilding from scratch.