System Design Distributed Trade-offs: CAP and Consistency

Published: at 08:00 AM

Introduction

When a system moves from one server to many, failures become normal. The CAP theorem gives you the core decision model for those failures, and consistency models help you apply it feature by feature.

Lesson 3: CAP Theorem

The Core Idea

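CAP stands for:

  • Consistency: every read sees the most recent write
  • Availability: every request gets a response, even if it is not the latest data
  • Partition tolerance: the system keeps working when the network between nodes fails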

The practical theorem:

When a network partition happens, you must choose between Consistency and Availability.

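Since partitions are unavoidable in distributed systems, the real choice is:

  • CP: stay consistent and refuse some requests during the partition
  • AP: stay available and accept that some responses may be stale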

Banking Example

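Two replicas: Mumbai and Delhi. Mumbai accepts a transfer, then the network between the regions fails.

Delhi still has the old balance. Now you choose:

  • CP: Delhi refuses to serve the balance until it can sync with Mumbai
  • AP: Delhi keeps responding with the old balance, which may be wrong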

This is a business decision, not just a technical one.
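To make the choice concrete, here is a minimal Python sketch of the Delhi replica during the partition. The `Replica` class, its `mode` flag, and the balance of 1000 are hypothetical names and values invented for illustration; real databases usually expose this choice through settings such as quorum sizes rather than a single flag.

```python
class PartitionError(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, name, balance, mode):
        self.name = name
        self.balance = balance      # last value this replica has seen
        self.mode = mode            # "CP" or "AP" (hypothetical flag)
        self.partitioned = False    # True when cut off from its peer

    def read_balance(self):
        if self.partitioned and self.mode == "CP":
            # CP: no answer is better than a possibly wrong answer.
            raise PartitionError(f"{self.name} cannot confirm the latest balance")
        # AP (or a healthy network): answer with what we have, even if stale.
        return self.balance


def demo():
    results = {}
    for mode in ("CP", "AP"):
        delhi = Replica("Delhi", balance=1000, mode=mode)
        delhi.partitioned = True    # Mumbai's transfer never arrives
        try:
            results[mode] = delhi.read_balance()
        except PartitionError:
            results[mode] = "unavailable"
    return results


print(demo())
```

The same request produces an error under CP and a possibly stale balance under AP. Neither outcome is free, which is why the decision belongs to the business as much as to engineering.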

Where Each Model Fits

CP (correct data or no data): money, inventory, booking.

AP (always respond, slight staleness accepted): feeds, counters, non-critical read paths.

Eventual Consistency

AP systems often use eventual consistency:

Data may be stale now, but replicas converge over time.

This is why many large products stay responsive during failures without requiring strict synchronization for every read.
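One way replicas can converge is a last-write-wins merge, sketched below in Python. This is only one possible strategy, not necessarily what any particular product uses. The `merge` function, the `likes` key, and the integer timestamps are assumptions made for the example. Last-write-wins can also silently discard one of two concurrent updates, so it suits data where that loss is acceptable.

```python
def merge(local, remote):
    """Last-write-wins: for each key keep the (timestamp, value) pair
    with the newer timestamp."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Each replica maps key -> (timestamp, value). Timestamps here are
# simple counters chosen for the example.
mumbai = {"likes": (2, 11)}   # saw the newer write
delhi = {"likes": (1, 10)}    # still stale during the partition

# After the partition heals, the replicas exchange state in both directions.
mumbai, delhi = merge(mumbai, delhi), merge(delhi, mumbai)

print(mumbai == delhi)
```

Until the exchange runs, a read from Delhi returns the older count. After it, both replicas hold the same value.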

Decision Framework

Ask one question per feature:

What is the cost of stale or wrong data?

Exercise: Food Delivery Features

Choose CP or AP for each:

  1. Restaurant menu and prices
  2. Order placement and payment
  3. Delivery person live location
  4. Likes on restaurant review

Reference answer:

Lesson 4: Availability vs Consistency (Deep Dive)

CAP gives the trade-off. This lesson makes it measurable and implementable.

Availability Is Measured in “Nines”

Availability | Downtime/year | Downtime/month
99%          | 3.65 days     | 7.3 hours
99.9%        | 8.7 hours     | 43 minutes
99.99%       | 52 minutes    | 4.3 minutes
99.999%      | 5 minutes     | 26 seconds

One extra nine looks small but dramatically changes architecture cost and complexity.
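The downtime figures follow directly from the availability percentage. The short Python sketch below reproduces the arithmetic, assuming a 365-day year and a month equal to one twelfth of a year; the function name `downtime_hours` is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Allowed downtime, in hours, for a given availability over a period."""
    return period_hours * (1 - availability_pct / 100)


for pct in (99, 99.9, 99.99, 99.999):
    yearly = downtime_hours(pct)
    monthly = downtime_hours(pct, HOURS_PER_YEAR / 12)
    print(f"{pct}%: {yearly:.2f} hours/year, {monthly * 60:.1f} minutes/month")
```

Each extra nine cuts the downtime budget by a factor of ten, which is where the added cost and complexity come from.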

Consistency Is a Spectrum

Not all systems need strong consistency everywhere.

  1. Strong consistency

    • Every read returns the latest write
    • Use for money, inventory, booking
  2. Eventual consistency

    • Replicas converge later
    • Use for feeds, counters, non-critical read paths
  3. Read-your-own-writes

    • A user sees their own update immediately
    • Great for comments, profile edits, content creation UX
  4. Causal consistency

    • Related events preserve logical order
    • Useful in collaboration/chat/threaded interactions

Consistency Zones in One Product

A single product usually uses multiple models, for example strong consistency for money and eventual consistency for feeds and counters.

Do not choose one global model for all features.
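In code, per-feature zones can be as simple as a lookup table that the data layer consults. The Python sketch below is illustrative only: the feature names are hypothetical, the assignments follow the four models listed earlier, and defaulting unclassified features to strong consistency is an assumption rather than a rule.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"
    READ_YOUR_WRITES = "read-your-own-writes"
    CAUSAL = "causal"


# Hypothetical feature names mapped to the model each one needs.
FEATURE_POLICY = {
    "payment": Consistency.STRONG,
    "inventory": Consistency.STRONG,
    "feed": Consistency.EVENTUAL,
    "like_counter": Consistency.EVENTUAL,
    "profile_edit": Consistency.READ_YOUR_WRITES,
    "chat_thread": Consistency.CAUSAL,
}


def policy_for(feature):
    # Assumption: an unclassified feature gets the strictest model
    # until someone decides it can tolerate staleness.
    return FEATURE_POLICY.get(feature, Consistency.STRONG)


print(policy_for("feed").value, policy_for("payment").value)
```

Keeping the table explicit makes the trade-off reviewable: relaxing a feature's model becomes a visible change instead of an accident of which database it happens to use.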

Exercise: Google Docs Style Editor

Scenario: multiple users edit the same document in near real time.

Questions:

  1. Latency-sensitive or throughput-sensitive?
  2. Main bottleneck at scale?
  3. CP or AP?
  4. Which consistency model for document updates?

Reference answer (practical):

Key Takeaways

  1. CAP applies during a network partition; at that point, C vs A is a forced choice.
  2. Partition tolerance is non-negotiable in real distributed systems.
  3. Pick CP/AP per feature based on business cost of wrong data.
  4. Availability targets (“nines”) must be explicit before architecture decisions.
  5. Consistency is a spectrum; strong consistency everywhere is often unnecessary and expensive.

Part of the system design fundamentals series. Next: load balancers, caching layers, databases, queues, and CDN placement decisions.