System Design: Feed System (Twitter/Instagram)



Introduction

A feed system is what powers Twitter, Instagram, and Facebook. It is also home to one of the hardest scaling problems in social media: showing 500 million users a personalised feed in under 100ms. The key challenge isn’t storage or bandwidth. It’s the fundamental tension between write speed and read speed.


Step 1: Requirements

Functional:

Users can create posts (text + media)
Users can follow and unfollow other users
Users see a personalised feed of posts from people they follow

Non-functional:

Feed loads in under 100ms
High availability: a slightly stale feed beats no feed
Eventual consistency is acceptable for feed delivery


Step 2: Scale Estimation

500M daily active users
Posts created/day: 100M = ~1,200 writes/second

Feed reads:
500M users × 10 feed opens/day = 5B reads/day
= ~58,000 reads/second

Read:Write ratio = ~50:1 — extremely read-heavy

This immediately tells you:

Optimize for reads: pre-compute wherever possible
Cache aggressively so most reads never touch the database
Writes can afford extra work (fan-out) to make reads cheap
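These figures are easy to sanity-check. A quick sketch (pure Python; the ~numbers in the text are rounded):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

posts_per_day = 100_000_000
writes_per_sec = posts_per_day / SECONDS_PER_DAY  # 1,157, rounded to ~1,200 above

daily_active_users = 500_000_000
feed_opens_per_user = 10
reads_per_day = daily_active_users * feed_opens_per_user  # 5B
reads_per_sec = reads_per_day / SECONDS_PER_DAY  # 57,870, rounded to ~58,000

ratio = reads_per_day / posts_per_day  # exactly 50: the ~50:1 read:write ratio
print(round(writes_per_sec), round(reads_per_sec), ratio)
```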

Step 3: The Core Problem — Two Approaches

Approach 1: Pull Model (Fan-out on Read)

Compute feed on the fly when user opens it.

User opens feed
→ Fetch all 200 followees
→ Query DB for recent posts from each
→ Merge and sort 4,000 posts
→ Return top 20

Problem: at 58,000 feed opens/second this means millions of DB queries per second. Database collapses immediately.

Works when: user follows very few people, low traffic, content changes so fast caching is useless.
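As a sketch, the pull flow looks like this (pure Python; the `FOLLOWS` and `POSTS` dicts are stand-ins for the follow graph and posts table):

```python
import heapq

# In-memory stand-ins for the follow graph and posts table (illustrative only).
FOLLOWS = {"u1": ["a", "b"]}
POSTS = {
    "a": [(100, "a-post1"), (95, "a-post2")],  # (timestamp, post_id)
    "b": [(99, "b-post1")],
}

def pull_feed(user_id, limit=20):
    """Fan-out on read: query every followee, merge, return newest first."""
    candidates = []
    for followee in FOLLOWS.get(user_id, []):   # one DB query per followee:
        candidates.extend(POSTS.get(followee, []))  # this is what collapses at scale
    return heapq.nlargest(limit, candidates)    # sort by timestamp, newest first

print(pull_feed("u1", 2))  # [(100, 'a-post1'), (99, 'b-post1')]
```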

Approach 2: Push Model (Fan-out on Write)

Pre-compute and deliver to follower feeds when a post is created.

User u123 posts
→ Fetch all 1,000 followers
→ Insert post into each follower's feed cache
→ Done — feeds already prepared

User opens feed
→ Just read pre-computed feed from cache → milliseconds

Problem — the celebrity problem:

Virat Kohli posts
→ 250M followers
→ 250M write operations
→ Takes hours to complete fan-out
→ Followers see post hours late

Works when: users have moderate follower counts, read speed is critical, write delay is acceptable.

The Real Solution: Hybrid Model

This is what Twitter, Instagram, and Facebook actually use.

Classify users:

Normal users (≤ 10K followers) → push model: fan-out on write
Celebrities (> 10K followers) → pull model: posts fetched at read time
Feed read flow:

User opens feed
→ Read pre-computed feed from Redis (normal users' posts) → fast
→ Fetch recent posts from celebrities they follow → separate query
→ Merge both lists → sort by timestamp
→ Return unified feed under 100ms ✅
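The merge step can be sketched with `heapq` (illustrative: in production the first input comes from Redis, the second from a per-celebrity query):

```python
import heapq

def hybrid_feed(precomputed, celebrity_posts, limit=20):
    """Merge the pre-computed feed with freshly pulled celebrity posts.

    Both inputs are lists of (timestamp, post_id); output is post IDs,
    newest first. In-memory sketch of the hybrid read path.
    """
    merged = heapq.nlargest(limit, precomputed + celebrity_posts)
    return [post_id for _, post_id in merged]

normal = [(100, "friend1"), (90, "friend2")]
celebs = [(95, "kohli1")]
print(hybrid_feed(normal, celebs, 3))  # ['friend1', 'kohli1', 'friend2']
```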

Step 4: Data Model

Posts — Cassandra:

post_id       → time-based unique ID (sortable)
user_id       → author
content       → text
media_urls    → array
created_at    → timestamp
like_count    → counter (via Redis, synced async)
comment_count → counter

Cassandra fits: 1,200 writes/second, time series, simple access pattern (posts by user_id).

Feed cache — Redis Sorted Set:

Key: "feed:userId"
Value: sorted set of post_ids, scored by timestamp
Max: 1,000 post IDs per user

redis.zadd("feed:u456", timestamp, postId)
redis.zrevrange("feed:u456", 0, 19)  → 20 newest post IDs

Redis sorted sets are perfect: O(log n) insert, O(log n) range query, always sorted by score.
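The same semantics in a tiny in-memory sketch (not a Redis client; a real deployment uses ZADD, ZREVRANGE, and ZREMRANGEBYRANK for trimming):

```python
import bisect

class FeedCache:
    """Mimics a Redis sorted set: add scored members, read newest first, trim."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self.entries = []  # kept sorted ascending by (score, member)

    def zadd(self, score, post_id):
        # O(log n) search; a Python list pays O(n) for the shift, Redis does not.
        bisect.insort(self.entries, (score, post_id))
        if len(self.entries) > self.max_size:
            self.entries.pop(0)  # trim: drop the oldest post

    def zrevrange(self, start, stop):
        newest_first = self.entries[::-1]
        return [post_id for _, post_id in newest_first[start:stop + 1]]

feed = FeedCache(max_size=1000)
feed.zadd(100, "p1"); feed.zadd(105, "p2"); feed.zadd(95, "p3")
print(feed.zrevrange(0, 1))  # ['p2', 'p1']: top 2, newest first
```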

Follow relationships — PostgreSQL:

follower_id   → user who follows
followee_id   → user being followed
created_at    → timestamp

Needs indexes in both directions:

Index on follower_id → "who do I follow?" (feed reads, pull fallback)
Index on followee_id → "who follows me?" (fan-out on write)

Step 5: Full Architecture

Write Path

User creates post

App Server

1. Save post to Cassandra (persistent storage)

2. Publish to Kafka: "post.created"
   { postId, userId, timestamp }

Fan-out Service consumes event:
→ Fetch followers of userId
→ Celebrity? (> 10K followers)
     Yes → skip fan-out, mark as celebrity post
     No  → for each follower:
           zadd "feed:followerId" timestamp postId
           trim feed to 1,000 posts
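A minimal sketch of the fan-out consumer, with in-memory stand-ins for the follower graph and the Redis feed caches:

```python
CELEBRITY_THRESHOLD = 10_000

# Illustrative in-memory stand-ins (real system: PostgreSQL + Redis).
followers_of = {"u123": ["f1", "f2"]}
feeds = {}  # "feed:<user>" -> list of (timestamp, post_id)

def handle_post_created(event):
    """Consume a Kafka 'post.created' event and fan out (sketch)."""
    author, post_id, ts = event["userId"], event["postId"], event["timestamp"]
    followers = followers_of.get(author, [])
    if len(followers) > CELEBRITY_THRESHOLD:
        return "celebrity"              # skip fan-out; pulled at read time
    for follower in followers:
        feed = feeds.setdefault(f"feed:{follower}", [])
        feed.append((ts, post_id))      # real system: ZADD + trim to 1,000
    return "fanned-out"

handle_post_created({"userId": "u123", "postId": "p9", "timestamp": 100})
print(feeds["feed:f1"])  # [(100, 'p9')]
```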

Read Path

User requests feed

1. Fetch top 200 post IDs from Redis "feed:userId"

2. Fetch celebrity posts separately
   (get celebrities user follows → recent posts from Cassandra)

3. Merge + sort by timestamp → take top 20

4. Batch fetch post details
   → Redis post cache (hit)
   → Cassandra (miss only) → store in Redis

5. Return feed

Step 6: Three Levels of Caching

Level 1 — Feed cache (Redis sorted set):

"feed:userId" → sorted post IDs
No expiry — updated on every new post
Trimmed to 1,000 posts max

Level 2 — Post cache (Redis hash):

"post:postId" → full post data
Cache Aside strategy
TTL: 24 hours

Level 3 — User profile cache (Redis):

"user:userId" → username, avatar, verified
TTL: 1 hour

Without Level 3: fetching 20 posts = 20 separate profile DB lookups. With cache: all from Redis in one pass.


Step 7: Edge Cases

New user — empty feed:

No follows yet → show trending posts, suggested users, popular content by interest category. Computed separately, not from follow graph.

User returns after long absence:

Feed cache expired → fall back to pull model just once, reconstruct feed, populate Redis. Future reads served from cache.

Post deletion:

Can’t immediately purge from millions of Redis caches. Use soft delete — mark deleted in Cassandra, filter on read, Redis cache expires naturally. Eventual consistency is acceptable.

Like/comment counts at scale:

1M likes in an hour = 1M DB writes. Instead:

→ Increment count in Redis instantly (write behind)
→ Async job syncs Redis count to Cassandra every 60 seconds
→ Reads from Redis always — fast
→ DB eventually consistent — acceptable
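A minimal write-behind sketch (in-memory dicts stand in for Redis and Cassandra; `sync_to_db` plays the role of the 60-second async job):

```python
from collections import defaultdict

pending = defaultdict(int)    # Redis stand-in: like deltas not yet persisted
db_counts = defaultdict(int)  # Cassandra stand-in

def like(post_id):
    pending[post_id] += 1     # instant in-memory increment, no DB write

def get_likes(post_id):
    return db_counts[post_id] + pending[post_id]  # reads always see fresh counts

def sync_to_db():
    """Async job: flush accumulated deltas in one batch (write behind)."""
    for post_id, delta in pending.items():
        db_counts[post_id] += delta  # one batched write per post
    pending.clear()

for _ in range(1000):
    like("p1")        # 1,000 likes, zero DB writes so far
sync_to_db()          # ...then a single DB write
print(db_counts["p1"])  # 1000
```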

Feature Extension: Ranked Feed

The ask: show posts the user is most likely to engage with first (Instagram-style algorithm).

Wrong approach: rank during fan-out.
Fan-out happens when post is brand new — zero likes, zero comments, no engagement signal yet. Ranking at write time is inaccurate.

Right approach: rank at read time.

User opens feed
→ Fetch 200 post IDs from Redis (chronological)
→ Pass to Ranking Service
→ Ranking Service scores each post:
  Score = f(
    relationship_score,  → how often you interact with this author
    post_engagement,     → current likes/comments
    recency,             → how recent
    content_affinity,    → your past engagement patterns
  )
→ Return top 20 ranked posts
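A sketch of read-time scoring. The weights and field names here are illustrative assumptions, not production values (tuning them is the ML problem):

```python
import time

# Hypothetical weights; in production these come from a trained model.
WEIGHTS = {"relationship": 0.4, "engagement": 0.3, "recency": 0.2, "affinity": 0.1}

def score(post, now=None):
    """Combine per-post signals into a single sortable relevance score."""
    now = time.time() if now is None else now
    age_hours = (now - post["created_at"]) / 3600
    recency = 1 / (1 + age_hours)  # decays smoothly as the post ages
    return (WEIGHTS["relationship"] * post["relationship_score"]
            + WEIGHTS["engagement"] * post["engagement_score"]
            + WEIGHTS["recency"] * recency
            + WEIGHTS["affinity"] * post["affinity_score"])

def rank_feed(posts, limit=20, now=None):
    """Rank at read time: score the ~200 candidates, keep the top N."""
    return sorted(posts, key=lambda p: score(p, now), reverse=True)[:limit]
```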

New components:

Ranking Service: scores candidate posts at read time
Feature data backing the score inputs: relationship, engagement, recency, content affinity
The trade-off: ranking adds latency to feed reads. Before ranking: under 10ms. After ranking: up to 100ms. You trade speed for relevance.


Feature Extension: Stories

The ask: posts that disappear after 24 hours, shown at top of feed separately.

No new database needed. Stories are posts with two differences: they expire and they display differently. Extend the existing posts table:

type          → "POST" | "STORY"     (new)
expires_at    → null for posts, created_at + 24h for stories (new)

Expiry — use Cassandra’s native TTL:

INSERT INTO posts (...) USING TTL 86400

Cassandra automatically deletes after 86400 seconds. No worker, no extra logic.

Story feed — separate read query:

SELECT * FROM posts
WHERE type = 'STORY'
AND user_id IN (users I follow)
AND expires_at > now()
ORDER BY created_at DESC
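The same filter, expressed application-side (a sketch assuming each post row carries the type and expires_at fields added above):

```python
import time

def story_feed(posts, followee_ids, now=None):
    """Unexpired stories from followed users, newest first (sketch)."""
    now = time.time() if now is None else now
    stories = [p for p in posts
               if p["type"] == "STORY"          # skips plain posts before
               and p["user_id"] in followee_ids  # touching expires_at
               and p["expires_at"] > now]
    return sorted(stories, key=lambda p: p["created_at"], reverse=True)

posts = [
    {"id": "s1", "type": "STORY", "user_id": "a", "created_at": 900, "expires_at": 2000},
    {"id": "s2", "type": "STORY", "user_id": "a", "created_at": 100, "expires_at": 500},
]
print([s["id"] for s in story_feed(posts, {"a"}, now=1000)])  # ['s1'] (s2 expired)
```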

Story cache:

Redis key: "stories:userId"
TTL: 5 minutes
5-minute staleness is acceptable for stories

Story views:

story_views (Cassandra):
  story_id  → which story
  viewer_id → who viewed
  viewed_at → timestamp
  TTL: 25 hours (auto-cleans with story)

Complete Architecture

[User Creates Post]

[Load Balancer] → [App Servers]
      ↓                ↓
[Cassandra]       [Kafka "post.created"]
(post stored)          ↓
               [Fan-out Service]
               Normal → push to Redis feeds
               Celebrity → skip, pull on read

             [Redis Cluster]
             feed:userId → sorted post IDs
             post:postId → full post data
             user:userId → profile cache

[User Reads Feed]

[Load Balancer] → [App Servers]

[Redis] → pre-computed feed IDs

[Ranking Service] → scores posts

[Redis] → post details (cache hit)
      ↓ (miss only)
[Cassandra] → post details

[Celebrity posts fetched separately, merged]

[Return unified ranked feed under 100ms]

[Images/Videos]
→ Stored in S3
→ Served via CDN globally
→ Never hits app servers

Everything Connects

Concept → How it’s used
Latency vs Throughput (L1) → Feed reads = latency sensitive; fan-out = throughput sensitive
Scalability (L2) → Fan-out workers scale horizontally
CAP Theorem (L3) → Feed = AP (eventual); post storage = CP
Consistency (L4) → Read-your-own-writes for your own posts
Load Balancers (L5) → Distribute across app servers and fan-out workers
Caching (L6) → Three-level cache; write-behind for like counts
Databases (L7) → Cassandra (posts, time series), PostgreSQL (follows), Redis (feeds)
Message Queues (L8) → Kafka decouples post creation from fan-out
CDN (L9) → Images and videos served from edge

Key Takeaways

  1. Pull model collapses at scale. Push model collapses for celebrities. Hybrid model solves both.
  2. Redis sorted sets are purpose-built for feed storage — timestamp as score, always sorted.
  3. The celebrity threshold (10K followers) is a product + engineering decision, not a fixed number.
  4. Ranking must happen at read time, not write time — engagement signals don’t exist yet at fan-out.
  5. Stories need no new database — extend the post model with type and TTL.
  6. Cassandra native TTL is cleaner than a worker for time-based expiry.

Part of the system design series. Next: designing a ride-sharing system — location-based matching, real-time tracking, and dynamic pricing.