Key Topics Overview
Core concepts you need to know for system design interviews.
CAP Theorem
The CAP theorem states that a distributed system can only provide two of the following three guarantees:
Consistency
- Every read receives the most recent write
- All nodes see the same data at the same time
Availability
- Every request receives a response
- System remains operational even with node failures
Partition Tolerance
- System continues to operate despite network failures
- Must handle network partitions between nodes
Trade-offs
- CA: RDBMS (MySQL, PostgreSQL)
- CP: MongoDB, Redis
- AP: Cassandra, DynamoDB
Load Balancing
Load balancers distribute incoming traffic across multiple servers to ensure:
Key Features
- High Availability
- Fault Tolerance
- Scalability
Common Algorithms
- Round Robin
- Least Connections
- Weighted Round Robin
- IP Hash
- Least Response Time
Caching
Caching improves system performance by storing frequently accessed data in faster memory.
Caching Strategies
-
Cache-Aside (Lazy Loading)
- Load data into cache only when needed
- Good for read-heavy workloads
-
Write-Through
- Update cache and DB simultaneously
- Ensures consistency
-
Write-Behind
- Update cache first, then DB asynchronously
- Better write performance
Cache Eviction Policies
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO (First In First Out)
Content Delivery Networks (CDN)
CDNs distribute content to geographically dispersed servers to:
Benefits
- Reduce Latency
- Decrease Server Load
- Improve Availability
- Handle Traffic Spikes
Use Cases
- Static Content
- Media Files
- API Caching
- Dynamic Content
Database Architecture
Master-Slave Replication
Master (Primary)
- Handles write operations
- Maintains authoritative copy
- Replicates changes to slaves
Slaves (Replicas)
- Handle read operations
- Provide redundancy
- Scale read capacity
When to Use
- Read-heavy workloads
- Need for data redundancy
- Geographic distribution
Scaling Strategies
Vertical Scaling (Scale Up)
- Add more power to existing machines
- Limits: Hardware capacity
- Simple but expensive
Horizontal Scaling (Scale Out)
- Add more machines
- Better fault tolerance
- More complex architecture
Database Sharding
Horizontal Sharding
- Split data across multiple databases
- Based on partition key
- Example: User IDs 1-1M on Shard 1, 1M-2M on Shard 2
Vertical Sharding
- Split different features into separate databases
- Example: User profiles in one DB, user posts in another
Database Types
SQL (Relational)
- Structured data
- ACID compliance
- Complex queries
- Examples: MySQL, PostgreSQL
NoSQL
-
Document (MongoDB)
- Flexible schema
- Nested data
- Good for content management
-
Key-Value (Redis)
- Simple structure
- High performance
- Caching
-
Column-Family (Cassandra)
- High scalability
- Good for time-series data
-
Graph (Neo4j)
- Relationship-focused
- Social networks
- Recommendation engines
API Design
REST Principles
- Stateless
- Resource-based
- Standard HTTP methods
- HATEOAS
Best Practices
- Use proper HTTP methods
- Version your APIs
- Use proper status codes
- Implement pagination
- Support filtering and sorting
Synchronous vs Asynchronous
Synchronous
- Blocking operations
- Immediate response
- Simpler to implement
- Higher latency
Asynchronous
- Non-blocking
- Better scalability
- Message queues
- Event-driven architecture
When to Use Each
- Sync: CRUD operations, simple requests
- Async: Long-running tasks, notifications
Idempotency
Definition
- Multiple identical requests should have same effect as single request
- Critical for distributed systems
Implementation
- Use idempotency keys
- Store request status
- Check for duplicates
Idempotent HTTP Methods
- GET
- PUT
- DELETE
- HEAD
Non-Idempotent Methods
- POST
- PATCH