This is the final lesson in the series — and the most infrastructure-heavy system we’ve designed. Video streaming introduces problems you haven’t seen yet: encoding pipelines, adaptive bitrate streaming, and content delivery at a scale that makes everything else look small.
Requirements
Functional
- User uploads a video
- Video is processed and made available for streaming
- Users can stream smoothly on any device
- Support multiple quality levels — 360p, 720p, 1080p, 4K
- Video resumes from where the user left off
- Search for videos
- Recommendations
Non-Functional
- High availability — videos must always be watchable
- Low latency start — video begins playing within 2 seconds
- Smooth playback — no buffering
- Scale — 500 hours of video uploaded every minute (YouTube scale)
- Global reach — users everywhere get the same quality experience
- Storage efficiency — petabytes of video stored cost-effectively
Scale Estimation
Videos uploaded per minute: 500 hours
= 500 × 60 = 30,000 minutes of video per minute
Storage per video:
1 minute of raw video ≈ 1 GB (uncompressed)
After encoding ≈ 100 MB for all quality levels combined
Per day:
500 hours/min × 60 min × 24 hours = 720,000 hours uploaded/day
720,000 hours × 60 min × 100 MB/min ≈ 4.3 petabytes/day
Video views (YouTube scale):
5 billion views/day = ~58,000 stream starts/second (a floor for concurrent streams, since each view lasts minutes)
What this tells you:
- Storage is the biggest cost challenge
- CDN is not optional — it is the core architecture
- Upload pipeline must handle massive parallel processing
- Read load is enormous — 58,000 concurrent streams
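The back-of-envelope numbers above can be checked in a few lines. This is a sketch using the assumptions already stated (100 MB per encoded minute across all qualities, 5 billion views/day):

```python
# Back-of-envelope check of the scale estimation above.
SECONDS_PER_DAY = 24 * 60 * 60

hours_per_day = 500 * 60 * 24            # 500 hours/min uploaded -> hours of content per day
minutes_per_day = hours_per_day * 60     # minutes of content per day
encoded_mb_per_minute = 100              # all quality levels combined (assumption from above)

storage_pb_per_day = minutes_per_day * encoded_mb_per_minute / 1e9  # 1 PB = 1e9 MB
stream_starts_per_sec = 5_000_000_000 / SECONDS_PER_DAY

print(round(storage_pb_per_day, 2))      # -> 4.32 PB of encoded video per day
print(round(stream_starts_per_sec))      # -> 57870 stream starts per second
```

Even at ~4.3 PB of encoded video per day, that is over 1.5 exabytes per year — storage dominates the cost model.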
The Upload and Processing Pipeline
This is the most distinctive part of video streaming. Unlike other systems, where you store and retrieve data directly, video must be transformed before it can be streamed.
Why Raw Video Cannot Be Streamed Directly
User uploads from phone:
→ Shot in 4K at 60fps
→ File size: 4 GB for 10 minutes
→ Format: MOV / MP4 / AVI / MKV
→ Codec: H.265 or various others
Problems:
→ 4 GB file → mobile user waits forever to buffer
→ MOV format → not supported on all browsers
→ One quality level → terrible on slow connections
→ No chapters, thumbnails, or preview sprites
Every uploaded video must go through a processing pipeline before it’s watchable.
The Processing Pipeline
Step 1 — Upload to Object Storage
Client → pre-signed S3 URL → uploads directly to S3
App server never touches the video bytes
S3 handles petabytes of raw uploads
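The core idea of a pre-signed URL can be sketched with a plain HMAC. This is a conceptual sketch only — real S3 pre-signing uses AWS Signature V4, and the key, domain, and helper names here are hypothetical — but the mechanism is the same: the app server signs (path, expiry) once, and storage verifies the upload without the app server ever touching the bytes:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-signing-key"  # hypothetical key; never sent to the client

def presign_upload_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    """App-server side: issue a time-limited URL the client uploads to directly."""
    expires_at = int(time.time()) + expires_in
    payload = f"PUT:{bucket}/{key}:{expires_at}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.example-storage.com/{key}?expires={expires_at}&sig={sig}"

def verify_upload(bucket: str, key: str, expires_at: int, sig: str) -> bool:
    """Storage side: recompute the signature and check the expiry window."""
    payload = f"PUT:{bucket}/{key}:{expires_at}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires_at
```

A tampered path or an expired timestamp fails verification, so storage can trust the request with no call back to the app server.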
Step 2 — Trigger Processing
S3 upload complete → S3 event → Kafka "video.uploaded"
→ Video Processing Service picks up event
Step 3 — Validation
- Check file is a valid video
- Check duration and size limits
- Malware scan
- Content moderation check (NSFW detection)
Step 4 — Transcoding ← most important step
Convert one raw video into multiple formats and qualities:
→ 360p — slow mobile connections
→ 480p — average mobile
→ 720p — standard HD
→ 1080p — full HD
→ 4K — premium users on fast connections
→ Each quality in multiple formats: MP4, WebM
Step 5 — Thumbnail Generation
→ Extract frames at regular intervals
→ Generate thumbnail images
→ Generate preview sprite (the tiny previews when hovering the timeline)
Step 6 — Store and Distribute
→ All transcoded files → S3
→ Thumbnails → S3
→ Update video metadata in database → status: "available"
→ CDN pulls processed files from S3
→ Distributes to edge nodes globally
Transcoding at Scale
500 hours uploaded per minute. Each video needs transcoding into 5 quality levels. That’s massive parallel computation.
Transcoding is CPU intensive:
1 minute of video → 5 quality levels → 5-10 minutes of CPU time
500 hours/min uploaded:
= 30,000 minutes of video/min
= 30,000 × 5 qualities
= 150,000 transcoding jobs per minute
Solution — parallel chunk transcoding:
Video uploaded
↓
Kafka "video.uploaded"
↓
Job Scheduler splits video into chunks:
→ Video split into 10-second segments
→ Each segment transcoded independently in parallel
→ Segments reassembled after transcoding
1 hour video = 360 ten-second segments
360 workers transcode simultaneously
→ 1 hour video ready in ~2 minutes instead of hours
This is the same MapReduce principle — split, process in parallel, reassemble.
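The split/process/reassemble flow can be sketched as a fan-out over a worker pool. The transcode step is stubbed (in production it would shell out to ffmpeg on just that time window); segment length, qualities, and naming are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SECONDS = 10
QUALITIES = ["360p", "480p", "720p", "1080p"]

def split_into_segments(duration_s: int) -> list[tuple[int, int]]:
    """Split a video into 10-second (start, end) windows."""
    return [(t, min(t + SEGMENT_SECONDS, duration_s))
            for t in range(0, duration_s, SEGMENT_SECONDS)]

def transcode_segment(video_id: str, seg: tuple[int, int], quality: str) -> str:
    """Stub for the real work -- production would run ffmpeg on this window.
    Returns the object-storage key of the transcoded segment."""
    start, _end = seg
    return f"{video_id}/{quality}/seg_{start:06d}.ts"

def transcode_video(video_id: str, duration_s: int) -> dict[str, list[str]]:
    """Map: fan (segment x quality) jobs out to a pool.
    Reduce: collect output keys per quality, in playback order."""
    segments = split_into_segments(duration_s)
    jobs = [(seg, q) for q in QUALITIES for seg in segments]
    with ThreadPoolExecutor(max_workers=32) as pool:
        keys = list(pool.map(lambda j: transcode_segment(video_id, *j), jobs))
    out: dict[str, list[str]] = {q: [] for q in QUALITIES}
    for key in keys:
        out[key.split("/")[1]].append(key)
    return out
```

`pool.map` preserves submission order, so each quality's segment list comes back already in playback order — the "reassemble" step is just grouping.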
Adaptive Bitrate Streaming (ABR)
This is the technology that makes Netflix and YouTube feel smooth even on variable connections.
The Problem
User on WiFi → 1080p playing perfectly
User switches to mobile data → connection slows
1080p requires 8 Mbps → user only has 2 Mbps
→ Video buffers → terrible experience ❌
The Solution — HLS (HTTP Live Streaming)
Instead of one video file — serve a playlist of small chunks:
master.m3u8 (master playlist):
→ Links to quality-specific playlists
1080p.m3u8:
→ segment001_1080p.ts (10 seconds)
→ segment002_1080p.ts (10 seconds)
→ segment003_1080p.ts (10 seconds)
...
720p.m3u8:
→ segment001_720p.ts (10 seconds)
→ segment002_720p.ts (10 seconds)
...
360p.m3u8:
→ segment001_360p.ts (10 seconds)
...
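Real HLS playlists carry a few extra tags beyond the simplified view above. A minimal sketch of generating them, with illustrative bitrates and resolutions (the actual encoding ladder is a product decision):

```python
# Illustrative bitrate/resolution per tier -- not a canonical ladder.
VARIANTS = [
    ("360p", 800_000, "640x360"),
    ("720p", 4_000_000, "1280x720"),
    ("1080p", 8_000_000, "1920x1080"),
]

def master_playlist() -> str:
    """The master playlist only points at quality-specific playlists."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for name, bandwidth, resolution in VARIANTS:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}.m3u8")
    return "\n".join(lines) + "\n"

def media_playlist(quality: str, num_segments: int) -> str:
    """A media playlist lists the 10-second segments for one quality."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3",
             "#EXT-X-TARGETDURATION:10", "#EXT-X-MEDIA-SEQUENCE:0"]
    for i in range(1, num_segments + 1):
        lines.append("#EXTINF:10.0,")
        lines.append(f"segment{i:03d}_{quality}.ts")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

The `BANDWIDTH` attribute in the master playlist is what lets the player map its measured download speed to a variant without downloading anything first.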
How the player uses this:
Every 10 seconds the player:
→ Measures current download speed
→ Decides which quality to request next
Download speed > 8 Mbps → request 1080p next segment
Download speed 4–8 Mbps → request 720p next segment
Download speed 2–4 Mbps → request 480p next segment
Download speed < 2 Mbps → request 360p next segment
User never notices the switch
Player switches seamlessly between qualities
Buffer never empties → smooth playback ✅
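The per-segment decision above reduces to a ladder walk. A minimal sketch, with bitrates assumed to match the thresholds in the text (real players also factor in buffer level and keep bandwidth headroom):

```python
# Bitrate (Mbps) needed per tier, highest first -- illustrative values.
LADDER = [("1080p", 8.0), ("720p", 4.0), ("480p", 2.0), ("360p", 0.0)]

def pick_quality(measured_mbps: float, headroom: float = 1.0) -> str:
    """Pick the highest tier whose bitrate fits in the measured bandwidth.
    headroom < 1.0 would leave a safety margin, as real players do."""
    budget = measured_mbps * headroom
    for name, required in LADDER:
        if budget >= required:
            return name
    return LADDER[-1][0]  # worst case: lowest tier
```

Because the decision is re-run for every 10-second segment, a slowing connection degrades to 480p or 360p within one segment instead of stalling playback.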
This is why YouTube quality changes smoothly — it’s not one file, it’s thousands of small chunks served adaptively.
Content Delivery Architecture
This is where CDN becomes the entire architecture — not just an add-on.
Without CDN
58,000 concurrent streams
Each stream at 720p = 4 Mbps
Total bandwidth: 58,000 × 4 Mbps = 232 Gbps
All from your origin servers
→ Impossible to serve from one location
→ Terrible latency for users far away ❌
With CDN
58,000 streams distributed across hundreds of CDN edge nodes
→ User in Chennai served from Chennai edge node
→ User in London served from London edge node
→ Origin servers serve CDN nodes, not individual users
→ Origin bandwidth: fraction of total
→ Latency: minimal everywhere ✅
CDN Caching Strategy for Video
Popular videos (top 10% get 90% of views):
→ Cached at every CDN edge node globally
→ TTL: weeks or months
→ CDN hit rate: ~95%
Long tail videos (rarely watched):
→ Cached only at regional CDN nodes
→ Fetched from origin on first regional request
→ TTL: days
Very old / rarely watched:
→ Not cached at CDN
→ Served directly from S3 on demand
→ Cost optimised — no point caching what nobody watches
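The tiered policy above can be expressed as a simple lookup from popularity to cache placement. The thresholds and tier names here are illustrative, not a real CDN API:

```python
def cdn_policy(view_rank_percentile: float, views_last_30d: int) -> dict:
    """Map a video's popularity to a caching tier.
    view_rank_percentile: 0 = most viewed, 100 = least viewed."""
    if view_rank_percentile <= 10:        # top 10% of videos: 90% of views
        return {"tier": "global_edge", "ttl_days": 60}
    if views_last_30d > 0:                # long tail, still occasionally watched
        return {"tier": "regional", "ttl_days": 3}
    return {"tier": "origin_only", "ttl_days": 0}   # serve from S3 on demand
```

The point of the third tier is economic: edge storage is the scarce resource, so videos nobody watches should never occupy it.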
Data Model
Video Metadata — Cassandra
videos:
video_id UUID
uploader_id UUID
title text
description text
status enum (uploading/processing/available/removed)
duration_seconds int
view_count counter
like_count counter
created_at timestamp
tags list<text>
category text
storage_paths:
raw s3://raw/videoId/original.mp4
360p s3://processed/videoId/360p.m3u8
720p s3://processed/videoId/720p.m3u8
1080p s3://processed/videoId/1080p.m3u8
thumbnail s3://thumbs/videoId/thumb.jpg
Why Cassandra: massive scale, simple video_id lookups, write-heavy workload (view counts updating constantly).
Watch History and Resume Position — Cassandra
watch_history:
user_id UUID (partition key)
video_id UUID
watched_at timestamp
watch_duration int
last_position int (seconds — for resume)
completed boolean
Why Cassandra: billions of watch events per day, time series append-only writes, access pattern is always “give me history for user X.”
Search Index — Elasticsearch
Video search is a separate problem entirely. Cassandra cannot do full-text search or relevance ranking. Elasticsearch handles:
- Full-text search on title, description, tags
- Faceted filters — category, duration, upload date
- Relevance ranking
- Returns video IDs → fetch details from Cassandra/Redis
Recommendations
Collaborative filtering:
"Users who watched video A also watched video B"
→ Recommend B to anyone who just watched A
Content-based filtering:
Tags, category, uploader
→ Recommend similar content
Implementation:
Watch events → Kafka → ML pipeline processes patterns
→ Precomputed recommendations stored in Redis
Key: "next_videos:{videoId}" → list of video IDs
TTL: 1 hour → refreshed regularly
When user finishes video:
→ Fetch recommendations from Redis instantly
→ No real-time ML computation in the request path
Pre-computation keeps latency low. You never run ML models live during a user request.
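The collaborative-filtering precompute can be sketched as a co-watch count over (user, video) events. This is the offline job; in this design its output would be written to Redis under keys like next_videos:{videoId}:

```python
from collections import Counter, defaultdict

def cowatch_recommendations(watch_events, top_n=5):
    """Precompute 'users who watched A also watched B' lists.
    watch_events: iterable of (user_id, video_id) pairs."""
    videos_by_user = defaultdict(set)
    for user, video in watch_events:
        videos_by_user[user].add(video)

    # Count, for each video A, how often B was watched by the same user.
    cowatch = defaultdict(Counter)
    for vids in videos_by_user.values():
        for a in vids:
            for b in vids:
                if a != b:
                    cowatch[a][b] += 1

    return {video: [b for b, _ in counts.most_common(top_n)]
            for video, counts in cowatch.items()}
```

The request path then reduces to one Redis GET — all the quadratic work happens here, in the ML pipeline, never in front of the user.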
Resume Feature
User watches 40% of a video, closes app, returns next day:
Every 10 seconds while watching:
→ Client sends heartbeat: { userId, videoId, position: 245 }
→ App server writes to Redis:
Key: "watch:{userId}:{videoId}"
Value: 245 (seconds)
TTL: 90 days
Why not write to database directly:
→ 10M active users × heartbeat every 10s = 1M writes/second
→ Database can't handle this
→ Redis handles it easily ✅
Async sync to Cassandra:
→ Background job syncs Redis positions to Cassandra every 5 minutes
→ Permanent durable record
→ Redis is the fast layer, Cassandra is the durable layer
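The heartbeat-then-sync pattern can be sketched with plain dicts standing in for Redis and Cassandra (the class and method names are illustrative):

```python
class ResumeStore:
    """Resume positions: a fast layer absorbs the heartbeat firehose,
    and dirty keys are flushed to the durable layer in batches."""

    def __init__(self):
        self.fast = {}      # stands in for Redis
        self.durable = {}   # stands in for Cassandra
        self.dirty = set()  # keys written since the last flush

    def heartbeat(self, user_id: str, video_id: str, position_s: int) -> None:
        """Called every 10 seconds per active viewer -- must be cheap."""
        key = f"watch:{user_id}:{video_id}"
        self.fast[key] = position_s
        self.dirty.add(key)

    def flush(self) -> int:
        """Background job, e.g. every 5 minutes: sync dirty keys to durable."""
        for key in self.dirty:
            self.durable[key] = self.fast[key]
        synced = len(self.dirty)
        self.dirty.clear()
        return synced

    def resume_position(self, user_id: str, video_id: str) -> int:
        """Fast layer first; fall back to durable (e.g. after Redis eviction)."""
        key = f"watch:{user_id}:{video_id}"
        return self.fast.get(key, self.durable.get(key, 0))
```

Note the flush syncs only keys touched since the last run, so the durable store sees one write per active viewer per flush interval, not one per heartbeat — the same pattern the view-counter sync uses later.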
Complete Architecture
Upload Path
User selects video
↓
App Server issues pre-signed S3 URL
↓
Client uploads directly to S3 (bypasses app servers)
↓
S3 triggers event → Kafka "video.uploaded"
↓
Video Processing Service:
→ Validates video
→ Splits into 10-second chunks
→ Distributes to Transcoding Workers (100s of them, parallel)
→ Reassembles transcoded segments
→ Generates thumbnails and preview sprites
→ Stores all files to S3
→ Updates status in Cassandra → "available"
→ Notifies uploader via notification system
Stream Path
User clicks play
↓
App Server:
→ Fetch video metadata from Redis cache
→ Cache miss → Cassandra → store in Redis
→ Return master.m3u8 URL (CDN URL)
↓
Video player fetches master.m3u8 from CDN
↓
Player measures bandwidth → selects quality
↓
Player fetches 10-second segments from CDN:
→ CDN hit → served instantly from edge ✅
→ CDN miss → CDN fetches from S3 → caches → serves
↓
Every 10s → player fetches next segment
Every 10s → client sends position heartbeat to Redis
↓
Adaptive quality switching happens transparently
Supporting Systems
View counts:
→ Stream start → Kafka "video.viewed"
→ Redis counter incremented
→ Async sync to Cassandra every 60 seconds
Search:
→ Video metadata indexed in Elasticsearch on publish
→ Search queries hit Elasticsearch
→ Returns IDs → details fetched from Redis/Cassandra
Recommendations:
→ Watch events → Kafka → ML pipeline
→ Precomputed results → Redis
→ Served instantly on video end
Connecting Every Lesson
This system uses every concept from the series:
| Lesson | Where it appears in video streaming |
|---|---|
| Latency vs Throughput (1) | Stream start is latency-sensitive (<2s); transcoding is throughput-sensitive |
| Scalability (2) | Transcoding workers scale horizontally; CDN scales delivery globally |
| CAP Theorem (3) | Streaming → AP (staleness fine); payment → CP (consistency mandatory) |
| Consistency (4) | Watch position → eventual OK; uploader sees their video → read-your-own-writes |
| Load Balancers (5) | Distribute upload and stream requests; balance transcoding worker pool |
| Caching (6) | Redis for metadata, positions, counters, recommendations; CDN at edge |
| Databases (7) | Cassandra for metadata/history; Elasticsearch for search; S3 for files |
| Message Queues (8) | Kafka decouples upload from processing, carries view events, feeds ML |
| CDN (9) | CDN is the streaming architecture — without it this system cannot exist |
The Key Insight
The most important principle this system teaches:
The solution is never “find a faster server.” The solution is to never let the bottleneck see the load directly.
- Pre-signed URLs → app servers never touch video bytes
- Kafka → database never sees raw upload spikes
- CDN → origin servers never see 58,000 concurrent streams
- Redis → database never sees 1M position updates/second
- Parallel chunk transcoding → no single worker processes a full video
Every layer shields the one below it from the full force of the traffic.
What to Practice Next
You’ve covered the theory and design of 14 systems. The next step is practice:
Week 1–2 — Solo practice: Pick any app you use daily. Design it from scratch using the 7-step framework. Time yourself — 45 minutes per system.
Systems to tackle next:
- WhatsApp / messaging system
- Google Drive / file storage
- Uber Eats / food delivery
- Zoom / video conferencing
- Twitter search / typeahead
Week 3–4 — Mock interviews: Practice explaining designs out loud. The thinking is right — now build the communication. Record yourself and review.
Go deeper on:
- Distributed transactions
- Consistent hashing in depth
- Leader election algorithms
- Two-phase commit
Every system design problem you’ll ever face reduces to the same questions you’ve been answering since Lesson 1:
What are we optimizing for? Where will it break? What’s the right trade-off? What’s the simplest solution that actually works?
You came in knowing the tools. You leave knowing how to think. That’s the difference that matters in interviews and in real engineering.