Introduction
Once traffic grows, two problems appear quickly:
- How do you distribute requests across many servers?
- How do you stop the database from being hit for repeated reads?
Load balancers solve the first. Caching (often with Redis) solves the second.
Lesson 5: Load Balancers
What Problem They Solve
Without a load balancer, traffic can overload one server while others sit idle.
With a load balancer, requests are distributed across multiple servers so capacity scales horizontally.
Core Routing Algorithms
- Round Robin
  - Send requests in sequence: S1 -> S2 -> S3 -> S1…
  - Good when servers are identical and requests have similar durations.
- Weighted Round Robin
  - Higher-capacity servers get more traffic.
  - Good when server sizes differ.
- Least Connections
  - Send the next request to the server with the fewest active connections.
  - Best for variable-duration requests (for example, order processing); see the sketch after this list.
- IP Hash
  - The same client IP tends to hit the same backend.
  - Useful for sticky sessions or localized cache behavior.
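A minimal Python sketch of how a balancer could implement two of these policies. The `Server` class, the weights, and the connection counters are illustrative assumptions, not any particular load balancer's API.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int = 1
    active_connections: int = 0  # updated as requests start and finish

class Balancer:
    def __init__(self, servers):
        self.servers = servers
        # Naive weighted round robin: repeat each server `weight` times in the cycle.
        self._rr = itertools.cycle([s for s in servers for _ in range(s.weight)])

    def round_robin(self) -> Server:
        # With all weights equal to 1 this degrades to plain round robin.
        return next(self._rr)

    def least_connections(self) -> Server:
        # Pick the server currently handling the fewest in-flight requests.
        return min(self.servers, key=lambda s: s.active_connections)

# Usage sketch
servers = [Server("s1", weight=2), Server("s2"), Server("s3")]
lb = Balancer(servers)
print(lb.round_robin().name)        # s1 appears twice as often as s2/s3 over time
print(lb.least_connections().name)  # server with the fewest active connections
```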
Placement in Architecture
Load balancing usually happens at multiple layers:
- Global layer (geo routing / nearest region)
- Regional layer (distribute traffic across app servers)
- Internal layer (distribute traffic between internal services/replicas)
Avoiding Single Point of Failure
A load balancer can itself fail, so it also needs redundancy:
- Active-Passive: one active, one standby
- Active-Active: both serve traffic simultaneously
Layer 4 vs Layer 7
- Layer 4 (transport): route by IP/port, very fast, less context
- Layer 7 (application): route by URL/path/header/cookie, supports microservice path routing
Most modern microservice architectures use Layer 7 at the edge.
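A rough sketch of what Layer 7 routing adds: the balancer can inspect the request path and pick a backend pool per service, which a Layer 4 balancer (IP and port only) cannot do. The paths and service names below are assumptions for illustration.

```python
# Hypothetical path-based routing table for an edge Layer 7 balancer.
ROUTES = {
    "/users":       ["user-svc-1:8080", "user-svc-2:8080"],
    "/restaurants": ["restaurant-svc-1:8080"],
    "/orders":      ["order-svc-1:8080", "order-svc-2:8080", "order-svc-3:8080"],
}

def pick_pool(path: str) -> list[str]:
    # Longest-prefix match on the URL path decides which service pool gets the request.
    for prefix, pool in sorted(ROUTES.items(), key=lambda kv: -len(kv[0])):
        if path.startswith(prefix):
            return pool
    raise LookupError(f"no route for {path}")

print(pick_pool("/orders/123/status"))  # -> the Order Service pool
```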
Exercise: Food Delivery Services
Scenario: User Service, Restaurant Service, and Order Service. Order Service is heavier and more variable.
Questions:
- Layer 4 or Layer 7, and why?
- Best algorithm for Order Service?
- Where should load balancers be placed?
Reference answer:
- Use Layer 7 for path-based routing across services.
- Use Least Connections for Order Service because request duration varies.
- Place LBs at entry and internal service layer; use active-active or active-passive for LB redundancy.
Lesson 6: Caching with Redis
What Problem Caching Solves
Many reads ask for the same data repeatedly. Without caching, every request hits the database.
Caching reduces:
- response latency
- database load
- infrastructure cost
Cache Placement Options
- Client cache (browser/app)
- CDN cache (static/global assets)
- Application cache (Redis between app and DB)
- DB internal cache
Each serves a different layer of latency/load reduction.
Three Core Caching Strategies
- Cache Aside (lazy)
  - App reads the cache first; on a miss it reads the DB and populates the cache (see the sketch after this list).
  - Most common default for read-heavy paths.
- Write Through
  - Write to the DB and the cache together.
  - Better freshness, slightly slower writes.
- Write Behind
  - Write to the cache first, persist to the DB asynchronously.
  - Very fast writes, but needs strong durability controls.
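A minimal cache-aside sketch using the Python `redis` client. `fetch_product_from_db`, the key format, and the TTL are placeholders for whatever the real data layer uses.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_product_from_db(product_id: str) -> dict:
    # Placeholder for the real database query.
    return {"id": product_id, "name": "example", "price": 999}

def get_product(product_id: str, ttl_seconds: int = 3600) -> dict:
    key = f"product:{product_id}"

    cached = r.get(key)            # 1. read the cache first
    if cached is not None:
        return json.loads(cached)

    product = fetch_product_from_db(product_id)     # 2. on a miss, read the DB
    r.setex(key, ttl_seconds, json.dumps(product))  # 3. populate the cache with a TTL
    return product
```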
TTL Is a Design Decision
TTL (time-to-live) controls freshness vs load relief.
- Long TTL -> lower DB load, higher stale risk
- Short TTL -> fresher data, less cache benefit
Pick TTL by business cost of stale data.
What Not to Cache
- secrets/credentials
- highly sensitive financial values needing strict real-time correctness
- huge objects with low reuse
- values that change every request with no reuse benefit
Common Cache Failure Patterns
- Cache stampede: many requests miss at once -> DB spike. Mitigate with locking/single-flight refresh (see the sketch below).
- Cache penetration: repeated requests for non-existent keys. Mitigate by caching null/negative results briefly.
- Cache avalanche: many keys expire simultaneously. Mitigate with TTL jitter/randomized expiry.
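A sketch of two of those mitigations with Redis: a `SET NX` lock so only one caller refreshes a hot key (single-flight), plus jittered TTLs so keys written together don't all expire together. The key names, timings, and the `load_from_db` callable are illustrative.

```python
import json
import random
import time
import redis

r = redis.Redis(decode_responses=True)

def jittered_ttl(base_seconds: int, jitter_ratio: float = 0.1) -> int:
    # Avalanche mitigation: spread expirations by +/- 10% so keys
    # written at the same time don't expire in the same second.
    jitter = int(base_seconds * jitter_ratio)
    return base_seconds + random.randint(-jitter, jitter)

def get_with_single_flight(key: str, load_from_db, base_ttl: int = 300):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    # Stampede mitigation: only the caller that wins this lock hits the DB.
    if r.set(lock_key, "1", nx=True, ex=10):
        try:
            value = load_from_db()
            r.setex(key, jittered_ttl(base_ttl), json.dumps(value))
            return value
        finally:
            r.delete(lock_key)

    # Losers wait briefly and retry the cache instead of hammering the DB.
    time.sleep(0.05)
    return get_with_single_flight(key, load_from_db, base_ttl)
```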
Exercise: E-commerce Product Page
Data fields: product details, current price, stock availability, customer reviews.
Questions:
- Should each be cached?
- Which strategy should be used?
- What TTL makes sense?
Reference answer (practical baseline):
- Product details: cache yes, write-through or cache-aside, TTL ~24h
- Current price: cache carefully, usually cache-aside, short TTL (~5-10 min for normal products; much shorter or bypass for flash pricing)
- Stock availability: avoid normal caching for critical purchase path (or use ultra-short, tightly controlled strategy)
- Customer reviews: cache yes, cache-aside, TTL ~5-10 min
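The same baseline, written down as a small configuration sketch; the field names, strategies, and TTLs simply mirror the answer above and are illustrative rather than prescriptive.

```python
# Hypothetical per-field caching policy for the product page.
CACHE_POLICY = {
    "product_details": {"cache": True,  "strategy": "write-through or cache-aside", "ttl_seconds": 24 * 3600},
    "current_price":   {"cache": True,  "strategy": "cache-aside", "ttl_seconds": 5 * 60},  # much shorter or bypass for flash pricing
    "stock":           {"cache": False, "strategy": "read DB on the purchase path",  "ttl_seconds": 0},
    "reviews":         {"cache": True,  "strategy": "cache-aside", "ttl_seconds": 10 * 60},
}
```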
Flash Sale Nuance
If price updates are frequent and read traffic is extreme, bypassing the cache and hitting the DB directly will flood it.
The polling problem:
Normal polling:
User → "What's the price?" → Server → every second → 10M requests/sec
DB dies
The fix — push instead of poll:
WebSocket approach:
Server → "Price changed to ₹999" → all connected users simultaneously
→ 1 write event, 10 million users updated
→ Zero requests from users
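A minimal sketch of that push model, assuming the third-party `websockets` package: clients keep one connection open, and a single price change is fanned out to every connected client with no polling.

```python
import asyncio
import json
import websockets  # assumes the third-party `websockets` package

CONNECTED = set()

async def handler(ws):
    # Each flash-sale viewer holds one open WebSocket; there is no polling loop.
    CONNECTED.add(ws)
    try:
        await ws.wait_closed()
    finally:
        CONNECTED.discard(ws)

async def push_price(new_price: int) -> None:
    # One price-change event fans out to every connected client.
    message = json.dumps({"event": "price_update", "price": new_price})
    for ws in set(CONNECTED):
        await ws.send(message)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # serve forever; push_price() runs whenever the price changes

# asyncio.run(main())
```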
Complete flash sale architecture:
Price changes every few seconds
↓
Write to Redis (primary store for price)
Write to Message Queue (for async DB sync)
↓
10M users connected via WebSocket
↓
Price change event pushed to all users simultaneously
↓
No polling. No DB flood.
Redis handles reads. DB updated from queue in background.
The mindset shift: for high-churn data with massive reads, Redis is not the cache — Redis is the source of truth. The DB becomes the async backup.
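A compressed sketch of the write path under that mindset. `queue` stands in for whatever message broker handles the async DB sync, and the pub/sub channel feeds the WebSocket fan-out sketched earlier; all names here are assumptions.

```python
import json
import time
import redis

r = redis.Redis(decode_responses=True)

def update_flash_price(product_id: str, new_price: int, queue) -> None:
    key = f"flash:price:{product_id}"

    # 1. Redis is the primary store for the price: reads never touch the DB.
    r.set(key, new_price)

    # 2. Publish the change so WebSocket servers can push it to connected users.
    r.publish("price-events", json.dumps({"product_id": product_id, "price": new_price}))

    # 3. Hand the write to a queue; the DB is updated asynchronously in the background.
    queue.publish("price-sync", {"product_id": product_id, "price": new_price, "ts": time.time()})
```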
Key Takeaways
- Load balancers enable horizontal scaling, but they also need redundancy.
- Layer 7 routing is usually the right fit for microservice request routing.
- Least Connections is strong for uneven request durations.
- Caching is a business trade-off between freshness and performance.
- TTL, invalidation, and failure-mode handling matter more than just adding Redis.
Part of the system design series. Next: SQL vs NoSQL, message queues, and CDN decision patterns.