Introduction
Once traffic grows, two problems appear quickly:
- How do you distribute requests across many servers?
- How do you stop the database from being hit for repeated reads?
Load balancers solve the first. Caching (often with Redis) solves the second.
Lesson 5: Load Balancers
What Problem They Solve
Without a load balancer, traffic can overload one server while others sit idle.
With a load balancer, requests are distributed across multiple servers so capacity scales horizontally.
Core Routing Algorithms
- Round Robin
  - Send requests in sequence: S1 -> S2 -> S3 -> S1…
  - Good when servers are identical and requests have similar durations.
- Weighted Round Robin
  - Higher-capacity servers get more traffic.
  - Good when server sizes differ.
- Least Connections
  - Send the next request to the server with the fewest active connections.
  - Best for variable-duration requests (for example, order processing); see the sketch after this list.
- IP Hash
  - The same client IP tends to hit the same backend.
  - Useful for sticky sessions or localized cache behavior.
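A minimal Python sketch of how a balancer could implement two of these policies. The `Server` class, the weights, and the connection counters are illustrative assumptions, not any particular load balancer's API.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int = 1
    active_connections: int = 0  # updated as requests start and finish

class Balancer:
    def __init__(self, servers):
        self.servers = servers
        # Naive weighted round robin: repeat each server `weight` times in the cycle.
        self._rr = itertools.cycle([s for s in servers for _ in range(s.weight)])

    def round_robin(self) -> Server:
        # With all weights equal to 1 this degrades to plain round robin.
        return next(self._rr)

    def least_connections(self) -> Server:
        # Pick the server currently handling the fewest in-flight requests.
        return min(self.servers, key=lambda s: s.active_connections)

# Usage sketch
servers = [Server("s1", weight=2), Server("s2"), Server("s3")]
lb = Balancer(servers)
print(lb.round_robin().name)        # s1 appears twice as often as s2/s3 over time
print(lb.least_connections().name)  # server with the fewest active connections
```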
Placement in Architecture
Load balancing usually happens at multiple layers:
- Global layer (geo routing / nearest region)
- Regional layer (distribute traffic across app servers)
- Internal layer (distribute traffic between internal services/replicas)
Avoiding Single Point of Failure
A load balancer can itself fail, so it also needs redundancy:
- Active-Passive: one active, one standby
- Active-Active: both serve traffic simultaneously
Layer 4 vs Layer 7
- Layer 4 (transport): route by IP/port, very fast, less context
- Layer 7 (application): route by URL/path/header/cookie, supports microservice path routing
Most modern microservice architectures use Layer 7 at the edge.
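A rough sketch of what Layer 7 routing adds: the balancer can inspect the request path and pick a backend pool per service, which a Layer 4 balancer (IP and port only) cannot do. The paths and service names below are assumptions for illustration.

```python
# Hypothetical path-based routing table for an edge Layer 7 balancer.
ROUTES = {
    "/users":       ["user-svc-1:8080", "user-svc-2:8080"],
    "/restaurants": ["restaurant-svc-1:8080"],
    "/orders":      ["order-svc-1:8080", "order-svc-2:8080", "order-svc-3:8080"],
}

def pick_pool(path: str) -> list[str]:
    # Longest-prefix match on the URL path decides which service pool gets the request.
    for prefix, pool in sorted(ROUTES.items(), key=lambda kv: -len(kv[0])):
        if path.startswith(prefix):
            return pool
    raise LookupError(f"no route for {path}")

print(pick_pool("/orders/123/status"))  # -> the Order Service pool
```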
Exercise: Food Delivery Services
Scenario: User Service, Restaurant Service, and Order Service. Order Service is heavier and more variable.
Questions:
- Layer 4 or Layer 7, and why?
- Best algorithm for Order Service?
- Where should load balancers be placed?
Reference answer:
- Use Layer 7 for path-based routing across services.
- Use Least Connections for Order Service because request duration varies.
- Place LBs at entry and internal service layer; use active-active or active-passive for LB redundancy.
Lesson 6: Caching with Redis
What Problem Caching Solves
Many reads ask for the same data repeatedly. Without caching, every request hits the database.
Caching reduces:
- response latency
- database load
- infrastructure cost
Cache Placement Options
- Client cache (browser/app)
- CDN cache (static/global assets)
- Application cache (Redis between app and DB)
- DB internal cache
Each serves a different layer of latency/load reduction.
Three Core Caching Strategies
- Cache Aside (lazy)
  - App reads the cache first; on a miss it reads the DB and populates the cache (see the sketch after this list).
  - Most common default for read-heavy paths.
- Write Through
  - Write to the DB and the cache together.
  - Better freshness, slightly slower writes.
- Write Behind
  - Write to the cache first, persist to the DB asynchronously.
  - Very fast writes, but needs strong durability controls.
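A minimal cache-aside sketch using the Python `redis` client. `fetch_product_from_db`, the key format, and the TTL are placeholders for whatever the real data layer uses.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_product_from_db(product_id: str) -> dict:
    # Placeholder for the real database query.
    return {"id": product_id, "name": "example", "price": 999}

def get_product(product_id: str, ttl_seconds: int = 3600) -> dict:
    key = f"product:{product_id}"

    cached = r.get(key)            # 1. read the cache first
    if cached is not None:
        return json.loads(cached)

    product = fetch_product_from_db(product_id)     # 2. on a miss, read the DB
    r.setex(key, ttl_seconds, json.dumps(product))  # 3. populate the cache with a TTL
    return product
```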
TTL Is a Design Decision
TTL (time-to-live) controls freshness vs load relief.
- Long TTL -> lower DB load, higher stale risk
- Short TTL -> fresher data, less cache benefit
Pick TTL by business cost of stale data.
What Not to Cache
- secrets/credentials
- highly sensitive financial values needing strict real-time correctness
- huge objects with low reuse
- values that change every request with no reuse benefit
Common Cache Failure Patterns
- Cache stampede: many requests miss at once -> DB spike. Mitigate with locking/single-flight refresh (see the sketch below).
- Cache penetration: repeated requests for non-existent keys. Mitigate by caching null/negative results briefly.
- Cache avalanche: many keys expire simultaneously. Mitigate with TTL jitter/randomized expiry.
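A sketch of two of those mitigations with Redis: a `SET NX` lock so only one caller refreshes a hot key (single-flight), plus jittered TTLs so keys written together don't all expire together. The key names, timings, and the `load_from_db` callable are illustrative.

```python
import json
import random
import time
import redis

r = redis.Redis(decode_responses=True)

def jittered_ttl(base_seconds: int, jitter_ratio: float = 0.1) -> int:
    # Avalanche mitigation: spread expirations by +/- 10% so keys
    # written at the same time don't expire in the same second.
    jitter = int(base_seconds * jitter_ratio)
    return base_seconds + random.randint(-jitter, jitter)

def get_with_single_flight(key: str, load_from_db, base_ttl: int = 300):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    # Stampede mitigation: only the caller that wins this lock hits the DB.
    if r.set(lock_key, "1", nx=True, ex=10):
        try:
            value = load_from_db()
            r.setex(key, jittered_ttl(base_ttl), json.dumps(value))
            return value
        finally:
            r.delete(lock_key)

    # Losers wait briefly and retry the cache instead of hammering the DB.
    time.sleep(0.05)
    return get_with_single_flight(key, load_from_db, base_ttl)
```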
Exercise: E-commerce Product Page
Data fields: product details, current price, stock availability, customer reviews.
Questions:
- Should each be cached?
- Which strategy should be used?
- What TTL makes sense?
Reference answer (practical baseline):
- Product details: cache yes, write-through or cache-aside, TTL ~24h
- Current price: cache carefully, usually cache-aside, short TTL (~5-10 min for normal products; much shorter or bypass for flash pricing)
- Stock availability: avoid normal caching for critical purchase path (or use ultra-short, tightly controlled strategy)
- Customer reviews: cache yes, cache-aside, TTL ~5-10 min
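The same baseline, written down as a small configuration sketch; the field names, strategies, and TTLs simply mirror the answer above and are illustrative rather than prescriptive.

```python
# Hypothetical per-field caching policy for the product page.
CACHE_POLICY = {
    "product_details": {"cache": True,  "strategy": "write-through or cache-aside", "ttl_seconds": 24 * 3600},
    "current_price":   {"cache": True,  "strategy": "cache-aside", "ttl_seconds": 5 * 60},  # much shorter or bypass for flash pricing
    "stock":           {"cache": False, "strategy": "read DB on the purchase path",  "ttl_seconds": 0},
    "reviews":         {"cache": True,  "strategy": "cache-aside", "ttl_seconds": 10 * 60},
}
```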
Flash Sale Nuance
If price updates are frequent and read traffic is extreme, bypassing the cache and hitting the DB directly will flood it.
The polling problem:
Normal polling:
User → "What's the price?" → Server → every second → 10M requests/sec
DB dies
The fix — push instead of poll:
WebSocket approach:
Server → "Price changed to ₹999" → all connected users simultaneously
→ 1 write event, 10 million users updated
→ Zero requests from users
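A minimal sketch of that push model, assuming the third-party `websockets` package: clients keep one connection open, and a single price change is fanned out to every connected client with no polling.

```python
import asyncio
import json
import websockets  # assumes the third-party `websockets` package

CONNECTED = set()

async def handler(ws):
    # Each flash-sale viewer holds one open WebSocket; there is no polling loop.
    CONNECTED.add(ws)
    try:
        await ws.wait_closed()
    finally:
        CONNECTED.discard(ws)

async def push_price(new_price: int) -> None:
    # One price-change event fans out to every connected client.
    message = json.dumps({"event": "price_update", "price": new_price})
    for ws in set(CONNECTED):
        await ws.send(message)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # serve forever; push_price() runs whenever the price changes

# asyncio.run(main())
```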
Complete flash sale architecture:
Price changes every few seconds
↓
Write to Redis (primary store for price)
Write to Message Queue (for async DB sync)
↓
10M users connected via WebSocket
↓
Price change event pushed to all users simultaneously
↓
No polling. No DB flood.
Redis handles reads. DB updated from queue in background.
The mindset shift: for high-churn data with massive reads, Redis is not the cache — Redis is the source of truth. The DB becomes the async backup.
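A compressed sketch of the write path under that mindset. `queue` stands in for whatever message broker handles the async DB sync, and the pub/sub channel feeds the WebSocket fan-out sketched earlier; all names here are assumptions.

```python
import json
import time
import redis

r = redis.Redis(decode_responses=True)

def update_flash_price(product_id: str, new_price: int, queue) -> None:
    key = f"flash:price:{product_id}"

    # 1. Redis is the primary store for the price: reads never touch the DB.
    r.set(key, new_price)

    # 2. Publish the change so WebSocket servers can push it to connected users.
    r.publish("price-events", json.dumps({"product_id": product_id, "price": new_price}))

    # 3. Hand the write to a queue; the DB is updated asynchronously in the background.
    queue.publish("price-sync", {"product_id": product_id, "price": new_price, "ts": time.time()})
```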
Key Takeaways
- Load balancers enable horizontal scaling, but they also need redundancy.
- Layer 7 routing is usually the right fit for microservice request routing.
- Least Connections is strong for uneven request durations.
- Caching is a business trade-off between freshness and performance.
- TTL, invalidation, and failure-mode handling matter more than just adding Redis.
Part of the system design series. Next: SQL vs NoSQL, message queues, and CDN decision patterns.