If you give only vague scale answers in system design interviews, your design sounds generic. This reference gives practical ballpark numbers you can use to make concrete decisions.
Important: these are estimation ranges, not vendor guarantees. Real numbers depend on hardware, query shape, payload size, indexing, replication, and tuning.
Quick Reference Table
| Component | Ballpark Throughput | Latency (typical) | Notes |
|---|---|---|---|
| App server (4vCPU) | 1k-5k RPS (with DB calls), 10k-50k pure API | 1-10 ms | CPU vs IO bound changes this a lot |
| PostgreSQL | 10k-50k simple reads/s, 5k-10k writes/s | 1-10 ms | Complex joins/aggregations can drop to 100-1k QPS |
| Cassandra (cluster) | 60k-150k reads/s and writes/s (3-node) | 1-5 ms p99 | Partition-key lookups only; scales near linearly |
| Redis (single node) | 100k-1M ops/s | <1 ms | RAM-bound, not disk-bound |
| Kafka (cluster) | 1M+ msgs/s (depends on partitioning/msg size) | 5-15 ms e2e | The consumer side is usually the bottleneck |
| RabbitMQ | 20k-50k msgs/s (less with durability) | 1-10 ms | Great for task queues at moderate scale |
| Elasticsearch | 1k-10k search QPS, 1k-10k index writes/s | 5-50 ms | Heavy aggregations slower (100-500 ms) |
| CDN | Millions RPS globally | 5-50 ms edge | Origin limits before CDN limits |
| S3/Object storage | 5.5k GET/s and 3.5k PUT/s per prefix | 100-200 ms | Use multiple prefixes + CDN fronting |
Per-Component Notes
App Server
- With database/network round-trips: usually 1k-5k RPS per instance.
- Pure in-memory lightweight APIs can go much higher (10k-50k RPS).
- For long-running workloads (uploads/processing), use async workers and queues.
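For example, a minimal sketch of pushing a slow job off the request path, assuming Celery with a Redis broker (the broker URL, task name, and file key are placeholders):

```python
# Minimal sketch: move long-running work off the request path.
# Assumes Celery with a Redis broker; URL and task body are placeholders.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_upload(file_key: str) -> None:
    # Placeholder for the slow work: transcoding, resizing, scanning, etc.
    ...

# In the request handler: enqueue and return immediately,
# instead of holding an app-server worker for the full job duration.
process_upload.delay("uploads/2024/raw/video.mp4")
```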
PostgreSQL
- Strong transactional system, best for correctness-heavy workloads.
- Ballpark:
  - Simple indexed reads: 10k-50k QPS
  - Writes: 5k-10k writes/s
  - ACID business transactions: 1k-5k TPS
- Practical hints:
  - Keep active DB connections low (use PgBouncer); see the pooling sketch after this list.
  - Add read replicas before sharding.
  - Start thinking about sharding only under sustained heavy write pressure or very large datasets.
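As a sketch of the pooling hint, a fixed-size client-side pool with psycopg2 (DSN and pool sizes are illustrative; PgBouncer gives you the same cap at the infrastructure layer):

```python
# Minimal sketch: cap active Postgres connections with a client-side pool.
# Assumes psycopg2 is installed; the DSN below is a placeholder.
from psycopg2 import pool

# A small, fixed-size pool keeps the server's active connection count low.
pg_pool = pool.SimpleConnectionPool(
    minconn=2,
    maxconn=20,  # well below Postgres's max_connections
    dsn="postgresql://app:secret@db-host:5432/appdb",  # placeholder DSN
)

conn = pg_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id, email FROM users WHERE id = %s", (42,))
        row = cur.fetchone()
finally:
    pg_pool.putconn(conn)  # always return the connection to the pool
```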
Cassandra
- Designed for high write throughput and horizontal scale.
- Ballpark:
  - Single node: 20k-50k reads/s or writes/s
  - 3-node cluster: 60k-150k reads/s and writes/s
- Works best when access patterns are known up front and every query hits a single partition key.
- Bad fit for ad hoc joins or arbitrary search.
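A minimal modeling sketch, assuming the DataStax cassandra-driver (contact point, keyspace, and schema are illustrative): design one table per query, keyed so every read hits a single partition.

```python
# Minimal sketch: one table per query, keyed so reads hit one partition.
# Assumes the DataStax cassandra-driver; contact point, keyspace, and
# schema are placeholders.
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("app_keyspace")  # keyspace name is a placeholder

# The query this table is built for: "messages for a user, newest first".
session.execute("""
    CREATE TABLE IF NOT EXISTS messages_by_user (
        user_id uuid,
        sent_at timeuuid,
        body    text,
        PRIMARY KEY ((user_id), sent_at)
    ) WITH CLUSTERING ORDER BY (sent_at DESC)
""")

# Fast: a partition-key lookup touches exactly one partition.
user_id = uuid.uuid4()  # placeholder; normally comes from the request
rows = session.execute(
    "SELECT sent_at, body FROM messages_by_user WHERE user_id = %s LIMIT 50",
    (user_id,),
)
# There is no efficient "messages containing word X" here -- that is
# a search-engine job, not a Cassandra job.
```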
Redis
- In-memory speed tier for caching, counters, locks, rate limits, dedupe.
- Ballpark:
  - GET/SET: 100k-1M ops/s on a healthy node
  - Complex commands (Lua scripts, large range reads) run well below simple-op rates
- The practical limit is memory, not CPU. Use Redis Cluster to shard when the dataset outgrows one node.
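As one concrete use, a fixed-window rate limiter sketch assuming the redis-py client (host, limits, and key format are placeholders):

```python
# Minimal sketch: fixed-window rate limiter on Redis.
# Assumes the redis-py client; host and limits are placeholders.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(user_id: str, limit: int = 100, window_s: int = 60) -> bool:
    """Allow up to `limit` requests per `window_s` seconds per user."""
    key = f"rate:{user_id}:{int(time.time() // window_s)}"
    count = r.incr(key)          # single round-trip, well under 1 ms in-DC
    if count == 1:
        r.expire(key, window_s)  # first hit in the window sets the TTL
    return count <= limit

if allow_request("user-42"):
    pass  # serve the request
```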
Kafka
- Durable event log, excellent for decoupling and burst absorption.
- Ballpark:
  - 100k-1M msgs/s per broker depending on message size and acks
  - Multi-broker clusters can handle millions/sec
- Scale levers:
  - More partitions -> more parallel consumers
  - Keep messages small (often under 1 MB)
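A minimal producer sketch showing the partitioning lever, assuming kafka-python (broker address and topic are placeholders). Messages sharing a key land on the same partition, so per-key ordering survives while consumers scale out:

```python
# Minimal sketch: keyed Kafka producer so consumers can scale by partition.
# Assumes kafka-python; broker address and topic name are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",  # durability lever: "1" or "0" trade safety for throughput
)

# Same key -> same partition -> per-user ordering is preserved, while
# different users spread across partitions for parallel consumption.
producer.send("user-events", key="user-42", value={"type": "click", "item": 7})
producer.flush()
```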
RabbitMQ
- Great for queue semantics and task distribution.
- Ballpark:
  - 20k-50k msgs/s (lower with durable/persistent mode)
- Usually easier to operate than Kafka at small-to-medium scale.
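A minimal durable-publish sketch assuming pika against a local broker (queue name and payload are placeholders); durability is exactly the throughput trade-off noted above:

```python
# Minimal sketch: durable task publish with pika.
# Assumes a local RabbitMQ; queue name and message are placeholders.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# Durable queue + persistent messages survive a broker restart,
# at the cost of throughput (the "less with durability" trade-off).
channel.queue_declare(queue="task_queue", durable=True)
channel.basic_publish(
    exchange="",
    routing_key="task_queue",
    body=b'{"job": "resize", "image_id": 123}',
    properties=pika.BasicProperties(delivery_mode=2),  # 2 = persistent
)
conn.close()
```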
Elasticsearch
- Search engine, not a source-of-truth OLTP database.
- Ballpark:
  - Query: 1k-10k QPS
  - Indexing: 1k-10k docs/s
- Keep indexing asynchronous from the primary DB (event-driven indexing).
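A sketch of that pattern, assuming the official elasticsearch-py 8.x client (URL, index name, and event shape are placeholders): the indexing call lives in an event consumer, not in the request handler.

```python
# Minimal sketch: index into Elasticsearch from an event consumer,
# not from the request path. Assumes elasticsearch-py 8.x; URL,
# index name, and event shape are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def on_product_updated(event: dict) -> None:
    """Consumer callback: mirror a primary-DB change into the search index."""
    es.index(
        index="products",
        id=event["product_id"],  # idempotent: re-delivery just overwrites
        document={
            "name": event["name"],
            "price": event["price"],
        },
    )
```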
CDN and Object Storage
- CDN should serve user traffic; origin should serve CDN, not end users.
- S3 is cheap and durable for blobs, but not low-latency enough alone for hot delivery.
- Standard pattern: S3 + CDN for media/static.
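A minimal sketch of that pattern with boto3 (bucket, key, and file path are placeholders): upload with long cache headers so the CDN, not the origin, absorbs repeat reads.

```python
# Minimal sketch: put media in S3 with long cache headers so the CDN
# serves repeat traffic. Assumes boto3; bucket, key, and path are
# placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "render/video-720p.mp4",
    "my-media-bucket",
    "media/2024/video-720p.mp4",
    ExtraArgs={
        "ContentType": "video/mp4",
        "CacheControl": "public, max-age=31536000, immutable",  # cache for a year
    },
)
# Users fetch via the CDN URL in front of the bucket; the CDN only
# hits S3 on a cache miss.
```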
Interview Decision Thresholds (Rule of Thumb)
Use these as rough triggers, not hard laws:
- Up to ~1k RPS: single Postgres + Redis cache is often enough.
- 1k-10k RPS: Postgres + replicas + caching + queue for async side effects.
- 10k-100k RPS sustained: sharding, stronger event pipeline, heavier Redis/CDN usage.
- 100k+ RPS and heavy writes: Cassandra/Kafka style architecture becomes common.
The key is not “use distributed tech early.” The key is “introduce complexity only when numbers justify it.”
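Those triggers, encoded as a rough lookup (a sketch, not a sizing tool; real decisions also weigh read/write mix, payload size, and consistency needs):

```python
def suggested_stack(sustained_rps: int, write_heavy: bool = False) -> str:
    """Map sustained RPS to the rule-of-thumb tiers above."""
    if sustained_rps <= 1_000:
        return "single Postgres + Redis cache"
    if sustained_rps <= 10_000:
        return "Postgres + replicas + caching + queue for async side effects"
    if sustained_rps <= 100_000:
        return "sharding + stronger event pipeline + heavier Redis/CDN usage"
    if write_heavy:
        return "Cassandra/Kafka-style architecture"
    return "scale the read path (CDN, caching, replicas) before re-platforming"

print(suggested_stack(3_600))  # -> Postgres + replicas + caching + queue ...
```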
How to Use These Numbers in an Interview
Step 1 - Estimate load
Example:
- 10M users x 10 requests/user/day = 100M requests/day
- 100M / 86,400 seconds/day = ~1,200 RPS average
- Peak factor 3x -> ~3,600 RPS peak
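The same back-of-envelope math in code form (all inputs are assumptions):

```python
users = 10_000_000
requests_per_user_per_day = 10
seconds_per_day = 86_400

daily_requests = users * requests_per_user_per_day   # 100,000,000/day
avg_rps = daily_requests / seconds_per_day           # ~1,157 -> round to ~1,200
peak_rps = 3 * 1_200                                 # 3x peak factor -> ~3,600

print(f"avg ~{avg_rps:,.0f} RPS, peak ~{peak_rps:,} RPS")
```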
Step 2 - Map load to components
- 3,600 RPS at the app tier: a single instance might cope, but offers no resiliency -> use 3-5 app instances behind a load balancer.
- If each request does one DB read: ~3,600 QPS to the DB -> fits comfortably in Postgres read capacity (with indexing).
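And the same mapping as a quick calculation (the per-instance figure is the conservative end of the app-server ballpark above, not a measurement):

```python
import math

peak_rps = 3_600
per_instance_rps = 1_500  # conservative end of the 1k-5k app-server range

# Capacity says 3 instances; resiliency says never fewer than 3 anyway.
instances = max(3, math.ceil(peak_rps / per_instance_rps))

db_read_qps = peak_rps  # assumption from above: one indexed read per request
# 3,600 simple-read QPS sits comfortably inside the 10k-50k Postgres
# ballpark, so replicas here are a resiliency choice, not a capacity need.
```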
Step 3 - Find bottlenecks
- If writes grow to 15k/s sustained: likely beyond a comfortable single-Postgres write profile -> shard, queue, or move specific workloads to Cassandra.
Step 4 - State migration path
Interviewers like this language:
“At current scale I’d keep Postgres for simplicity and consistency. If sustained writes cross 10k+/s, I would introduce write partitioning and event buffering, then move high-throughput append workloads to Cassandra while retaining Postgres for transactional correctness.”
Caveats You Should Always Say Out Loud
- These are ballparks on decent cloud hardware, not absolutes.
- Query shape matters more than database brand.
- p95/p99 latency and tail behavior matter more than average throughput.
- Capacity planning is iterative: benchmark, observe, tune, then scale.
If you pair these caveats with concrete ranges, your answer sounds practical and senior.