If you give only vague scale answers in system design interviews, your design sounds generic. This reference gives practical ballpark numbers you can use to make concrete decisions.
Important: these are estimation ranges, not vendor guarantees. Real numbers depend on hardware, query shape, payload size, indexing, replication, and tuning.
Quick Reference Table
| Component | Ballpark Throughput | Latency (typical) | Notes |
|---|---|---|---|
| App server (4vCPU) | 1k-5k RPS (with DB calls), 10k-50k pure API | 1-10 ms | CPU vs IO bound changes this a lot |
| PostgreSQL | 10k-50k simple reads/s, 5k-10k writes/s | 1-10 ms | Complex joins/aggregations can drop to 100-1k QPS |
| Cassandra (cluster) | 60k-150k reads/s and writes/s (3-node) | 1-5 ms p99 | Partition-key lookups only; scales near linearly |
| Redis (single node) | 100k-1M ops/s | <1 ms | RAM-bound, not disk-bound |
| Kafka (cluster) | 1M+ msgs/s (depends on partitioning/msg size) | 5-15 ms e2e | The consumer side is usually the bottleneck |
| RabbitMQ | 20k-50k msgs/s (less with durability) | 1-10 ms | Great for task queues at moderate scale |
| Elasticsearch | 1k-10k search QPS, 1k-10k index writes/s | 5-50 ms | Heavy aggregations slower (100-500 ms) |
| CDN | Millions RPS globally | 5-50 ms edge | Origin limits before CDN limits |
| S3/Object storage | 5.5k GET/s and 3.5k PUT/s per prefix | 100-200 ms | Use multiple prefixes + CDN fronting |
Per-Component Notes
App Server
- With database/network round-trips: usually 1k-5k RPS per instance.
- Pure in-memory lightweight APIs can go much higher (10k-50k RPS).
- For long-running workloads (uploads/processing), use async workers and queues.
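For example, a minimal sketch of pushing a slow job off the request path, assuming Celery with a Redis broker (the broker URL, task name, and file key are placeholders):

```python
# Minimal sketch: move long-running work off the request path.
# Assumes Celery with a Redis broker; URL and task body are placeholders.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_upload(file_key: str) -> None:
    # Placeholder for the slow work: transcoding, resizing, scanning, etc.
    ...

# In the request handler: enqueue and return immediately,
# instead of holding an app-server worker for the full job duration.
process_upload.delay("uploads/2024/raw/video.mp4")
```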
PostgreSQL
- Strong transactional system, best for correctness-heavy workloads.
- Ballpark:
  - Simple indexed reads: 10k-50k QPS
  - Writes: 5k-10k writes/s
  - ACID business transactions: 1k-5k TPS
- Practical hints:
  - Keep active DB connections low (use PgBouncer); see the pooling sketch after this list.
  - Add read replicas before sharding.
  - Start thinking about sharding only under sustained heavy write pressure or very large datasets.
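As a sketch of the pooling hint, a fixed-size client-side pool with psycopg2 (DSN and pool sizes are illustrative; PgBouncer gives you the same cap at the infrastructure layer):

```python
# Minimal sketch: cap active Postgres connections with a client-side pool.
# Assumes psycopg2 is installed; the DSN below is a placeholder.
from psycopg2 import pool

# A small, fixed-size pool keeps the server's active connection count low.
pg_pool = pool.SimpleConnectionPool(
    minconn=2,
    maxconn=20,  # well below Postgres's max_connections
    dsn="postgresql://app:secret@db-host:5432/appdb",  # placeholder DSN
)

conn = pg_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id, email FROM users WHERE id = %s", (42,))
        row = cur.fetchone()
finally:
    pg_pool.putconn(conn)  # always return the connection to the pool
```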
Cassandra
- Designed for high write throughput and horizontal scale.
- Ballpark:
  - Single node: 20k-50k reads/s or writes/s
  - 3-node cluster: 60k-150k reads/s and writes/s
- Works best when access patterns are known up front and every query hits a single partition key.
- Bad fit for ad hoc joins or arbitrary search.
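A minimal modeling sketch, assuming the DataStax cassandra-driver (contact point, keyspace, and schema are illustrative): design one table per query, keyed so every read hits a single partition.

```python
# Minimal sketch: one table per query, keyed so reads hit one partition.
# Assumes the DataStax cassandra-driver; contact point, keyspace, and
# schema are placeholders.
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("app_keyspace")  # keyspace name is a placeholder

# The query this table is built for: "messages for a user, newest first".
session.execute("""
    CREATE TABLE IF NOT EXISTS messages_by_user (
        user_id uuid,
        sent_at timeuuid,
        body    text,
        PRIMARY KEY ((user_id), sent_at)
    ) WITH CLUSTERING ORDER BY (sent_at DESC)
""")

# Fast: a partition-key lookup touches exactly one partition.
user_id = uuid.uuid4()  # placeholder; normally comes from the request
rows = session.execute(
    "SELECT sent_at, body FROM messages_by_user WHERE user_id = %s LIMIT 50",
    (user_id,),
)
# There is no efficient "messages containing word X" here -- that is
# a search-engine job, not a Cassandra job.
```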
Redis
- In-memory speed tier for caching, counters, locks, rate limits, dedupe.
- Ballpark:
  - GET/SET: 100k-1M ops/s on a healthy node
  - Complex commands (Lua scripts, large range reads) run well below simple-op rates
- The practical limit is memory, not CPU. Use Redis Cluster to shard when the dataset outgrows one node.
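As one concrete use, a fixed-window rate limiter sketch assuming the redis-py client (host, limits, and key format are placeholders):

```python
# Minimal sketch: fixed-window rate limiter on Redis.
# Assumes the redis-py client; host and limits are placeholders.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(user_id: str, limit: int = 100, window_s: int = 60) -> bool:
    """Allow up to `limit` requests per `window_s` seconds per user."""
    key = f"rate:{user_id}:{int(time.time() // window_s)}"
    count = r.incr(key)          # single round-trip, well under 1 ms in-DC
    if count == 1:
        r.expire(key, window_s)  # first hit in the window sets the TTL
    return count <= limit

if allow_request("user-42"):
    pass  # serve the request
```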
Kafka
- Durable event log, excellent for decoupling and burst absorption.
- Ballpark:
  - 100k-1M msgs/s per broker depending on message size and acks
  - Multi-broker clusters can handle millions/sec
- Scale levers:
  - More partitions -> more parallel consumers
  - Keep messages small (often under 1 MB)
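A minimal producer sketch showing the partitioning lever, assuming kafka-python (broker address and topic are placeholders). Messages sharing a key land on the same partition, so per-key ordering survives while consumers scale out:

```python
# Minimal sketch: keyed Kafka producer so consumers can scale by partition.
# Assumes kafka-python; broker address and topic name are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",  # durability lever: "1" or "0" trade safety for throughput
)

# Same key -> same partition -> per-user ordering is preserved, while
# different users spread across partitions for parallel consumption.
producer.send("user-events", key="user-42", value={"type": "click", "item": 7})
producer.flush()
```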
RabbitMQ
- Great for queue semantics and task distribution.
- Ballpark:
  - 20k-50k msgs/s (lower with durable/persistent mode)
- Usually easier to operate than Kafka at small-to-medium scale.
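A minimal durable-publish sketch assuming pika against a local broker (queue name and payload are placeholders); durability is exactly the throughput trade-off noted above:

```python
# Minimal sketch: durable task publish with pika.
# Assumes a local RabbitMQ; queue name and message are placeholders.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# Durable queue + persistent messages survive a broker restart,
# at the cost of throughput (the "less with durability" trade-off).
channel.queue_declare(queue="task_queue", durable=True)
channel.basic_publish(
    exchange="",
    routing_key="task_queue",
    body=b'{"job": "resize", "image_id": 123}',
    properties=pika.BasicProperties(delivery_mode=2),  # 2 = persistent
)
conn.close()
```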
Elasticsearch
- Search engine, not a source-of-truth OLTP database.
- Ballpark:
  - Query: 1k-10k QPS
  - Indexing: 1k-10k docs/s
- Keep indexing asynchronous from the primary DB (event-driven indexing).
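A sketch of that pattern, assuming the official elasticsearch-py 8.x client (URL, index name, and event shape are placeholders): the indexing call lives in an event consumer, not in the request handler.

```python
# Minimal sketch: index into Elasticsearch from an event consumer,
# not from the request path. Assumes elasticsearch-py 8.x; URL,
# index name, and event shape are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def on_product_updated(event: dict) -> None:
    """Consumer callback: mirror a primary-DB change into the search index."""
    es.index(
        index="products",
        id=event["product_id"],  # idempotent: re-delivery just overwrites
        document={
            "name": event["name"],
            "price": event["price"],
        },
    )
```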
CDN and Object Storage
- CDN should serve user traffic; origin should serve CDN, not end users.
- S3 is cheap and durable for blobs, but not low-latency enough alone for hot delivery.
- Standard pattern: S3 + CDN for media/static.
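A minimal sketch of that pattern with boto3 (bucket, key, and file path are placeholders): upload with long cache headers so the CDN, not the origin, absorbs repeat reads.

```python
# Minimal sketch: put media in S3 with long cache headers so the CDN
# serves repeat traffic. Assumes boto3; bucket, key, and path are
# placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "render/video-720p.mp4",
    "my-media-bucket",
    "media/2024/video-720p.mp4",
    ExtraArgs={
        "ContentType": "video/mp4",
        "CacheControl": "public, max-age=31536000, immutable",  # cache for a year
    },
)
# Users fetch via the CDN URL in front of the bucket; the CDN only
# hits S3 on a cache miss.
```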
Interview Decision Thresholds (Rule of Thumb)
Use these as rough triggers, not hard laws:
- Up to ~1k RPS: single Postgres + Redis cache is often enough.
- 1k-10k RPS: Postgres + replicas + caching + queue for async side effects.
- 10k-100k RPS sustained: sharding, stronger event pipeline, heavier Redis/CDN usage.
- 100k+ RPS and heavy writes: Cassandra/Kafka style architecture becomes common.
The key is not “use distributed tech early.” The key is “introduce complexity only when numbers justify it.”
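Those triggers, encoded as a rough lookup (a sketch, not a sizing tool; real decisions also weigh read/write mix, payload size, and consistency needs):

```python
def suggested_stack(sustained_rps: int, write_heavy: bool = False) -> str:
    """Map sustained RPS to the rule-of-thumb tiers above."""
    if sustained_rps <= 1_000:
        return "single Postgres + Redis cache"
    if sustained_rps <= 10_000:
        return "Postgres + replicas + caching + queue for async side effects"
    if sustained_rps <= 100_000:
        return "sharding + stronger event pipeline + heavier Redis/CDN usage"
    if write_heavy:
        return "Cassandra/Kafka-style architecture"
    return "scale the read path (CDN, caching, replicas) before re-platforming"

print(suggested_stack(3_600))  # -> Postgres + replicas + caching + queue ...
```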
How to Use These Numbers in an Interview
Step 1 - Estimate load
Example:
- 10M users x 10 requests/user/day = 100M requests/day
- 100M / 86,400 seconds/day = ~1,200 RPS average
- Peak factor 3x -> ~3,600 RPS peak
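The same back-of-envelope math in code form (all inputs are assumptions):

```python
users = 10_000_000
requests_per_user_per_day = 10
seconds_per_day = 86_400

daily_requests = users * requests_per_user_per_day   # 100,000,000/day
avg_rps = daily_requests / seconds_per_day           # ~1,157 -> round to ~1,200
peak_rps = 3 * 1_200                                 # 3x peak factor -> ~3,600

print(f"avg ~{avg_rps:,.0f} RPS, peak ~{peak_rps:,} RPS")
```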
Step 2 - Map load to components
- 3,600 RPS at the app tier: a single instance might cope, but offers no resiliency -> use 3-5 app instances behind a load balancer.
- If each request does one DB read: ~3,600 QPS to the DB -> fits comfortably in Postgres read capacity (with indexing).
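And the same mapping as a quick calculation (the per-instance figure is the conservative end of the app-server ballpark above, not a measurement):

```python
import math

peak_rps = 3_600
per_instance_rps = 1_500  # conservative end of the 1k-5k app-server range

# Capacity says 3 instances; resiliency says never fewer than 3 anyway.
instances = max(3, math.ceil(peak_rps / per_instance_rps))

db_read_qps = peak_rps  # assumption from above: one indexed read per request
# 3,600 simple-read QPS sits comfortably inside the 10k-50k Postgres
# ballpark, so replicas here are a resiliency choice, not a capacity need.
```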
Step 3 - Find bottlenecks
- If writes grow to 15k/s sustained: likely beyond a comfortable single-Postgres write profile -> shard, queue, or move specific workloads to Cassandra.
Step 4 - State migration path
Interviewers like this language:
“At current scale I’d keep Postgres for simplicity and consistency. If sustained writes cross 10k+/s, I would introduce write partitioning and event buffering, then move high-throughput append workloads to Cassandra while retaining Postgres for transactional correctness.”
Caveats You Should Always Say Out Loud
- These are ballparks on decent cloud hardware, not absolutes.
- Query shape matters more than database brand.
- p95/p99 latency and tail behavior matter more than average throughput.
- Capacity planning is iterative: benchmark, observe, tune, then scale.
If you pair these caveats with concrete ranges, your answer sounds practical and senior.