Core theoretical principles of system design include:
- Consistency
- Availability
- Partition tolerance
- CAP theorem
- PACELC theorem
These are usually grouped under Distributed System Principles.
1. Availability
Every request receives a response (success or failure), without a guarantee that the response reflects the latest data.
Availability refers to the ability of a system to provide its services to clients even in the presence of failures.
Formula for Availability
Availability = Uptime / (Uptime + Downtime)
Where:
- Uptime → the duration when the system is working and accessible
- Downtime → the duration when the system is not reachable (due to failures, upgrades, network issues, etc.)
Availability is often expressed using a number of 9s, showing how reliable a system is.
If a service has:
- 99.00% availability → it has 2 nines
- 99.9% availability → it has 3 nines
- 99.99% availability → it has 4 nines
More nines = higher reliability and much less downtime.
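The formula and the "nines" levels above can be sketched in a few lines of Python; the function names are illustrative, and a 365-day year is assumed.

```python
# Sketch: Availability = Uptime / (Uptime + Downtime), and the downtime
# budget each "nines" level permits over one 365-day year.

def availability(uptime_hours, downtime_hours):
    """Fraction of time the system was reachable."""
    return uptime_hours / (uptime_hours + downtime_hours)

def max_downtime_per_year(availability_pct):
    """Hours of downtime per year allowed at a given availability level."""
    hours_per_year = 365 * 24  # 8760
    return hours_per_year * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% -> {max_downtime_per_year(pct):.2f} hours of downtime/year")
```

Running this makes the jump between levels concrete: each extra nine cuts the allowed downtime by a factor of ten, from roughly 87.6 hours/year at two nines to under an hour at four nines.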
2. Consistency
Consistency refers to the system's ability to ensure that all users see the same data, regardless of where or when they access it. In a consistent system, any update to the data is immediately visible to all users, and there are no conflicting or outdated versions of the data.
In distributed systems: all nodes see the same data at the same time.
Types of Consistency Models
- Strong Consistency - After an update is made to the data, it is immediately visible to any subsequent read operation. Put simply: reads always reflect the latest write.
Ex: An example of strong consistency
is a financial system where users can transfer money between accounts. The
system is designed for high data integrity, so the data is stored in a single
location and updates to that data are immediately propagated to all other
locations. This ensures that all users and applications are working with the
same, accurate data. For instance, when a user initiates a transfer of funds
from one account to another, the system immediately updates the balance of both
accounts and all other system components are immediately aware of the change.
This ensures that all users can see the updated balance of both accounts and
prevents any discrepancies.
- Weak Consistency - After an update is made to the data, there is no guarantee that any subsequent read operation will immediately reflect the change. A read may or may not see the recent write.
Ex: An example of weak consistency is a gaming platform where users play online multiplayer games. When a user plays, their actions are immediately visible to other players in the same data center, but if there is lag or a temporary connection loss, the actions may not be seen by some of the users and the game will continue. This can lead to inconsistencies between different versions of the game state, but it also allows for a high level of availability and low latency.
- Eventual Consistency - A form of weak consistency. After an update is made to the data, it will eventually become visible to subsequent read operations. The data is replicated asynchronously, ensuring that all copies of the data are eventually updated.
Ex: An example of eventual
consistency is a social media platform where users can post updates, comments,
and messages. The platform is designed for high availability and low latency,
so the data is stored in multiple data centers around the world. When a user
posts an update, the update is immediately visible to other users in the same
data center, but it may take some time for the update to propagate to other
data centers. This means that some users may see the update while others may
not, depending on which data center they are connected to. This can lead to
inconsistencies between different versions of the data, but it also allows for
a high level of availability and low latency.
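The social-media scenario above can be sketched as a tiny primary/follower pair with asynchronous replication. All the names here are illustrative; in a real system replication runs in the background, while here it is triggered by hand to make the stale-read window visible.

```python
# Sketch of eventual consistency: writes land on a primary replica and
# are shipped to a follower later, so a read from the follower can be
# stale until replication catches up.

class Replica:
    def __init__(self):
        self.data = {}

primary = Replica()
follower = Replica()
replication_log = []  # updates not yet applied to the follower

def write(key, value):
    primary.data[key] = value
    replication_log.append((key, value))  # replicated asynchronously

def read_from_follower(key):
    return follower.data.get(key)

def replicate():
    # A real system would do this continuously in the background.
    while replication_log:
        key, value = replication_log.pop(0)
        follower.data[key] = value

write("post:1", "hello")
print(read_from_follower("post:1"))  # None - update has not propagated yet
replicate()
print(read_from_follower("post:1"))  # "hello" - replicas have converged
```

The gap between the two reads is exactly the "some users see the update, others do not" window described above; eventual consistency only promises that the gap closes once replication completes.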
Strong consistency usually increases latency and reduces availability.
3. CAP Theorem
The CAP theorem states that in a distributed system, during a partition (network failure), you can guarantee only two of the following three:
1. Consistency (C)
2. Availability (A)
3. Partition Tolerance (P)
Because partitions are unavoidable, systems must choose between:
- CP → Consistent & Partition-Tolerant (sacrifice availability)
- AP → Available & Partition-Tolerant (sacrifice consistency)
CP Systems (prioritize consistency)
- Reads/writes may fail during a partition
- Always return up-to-date data
Examples:
- ZooKeeper
- MongoDB with majority writes
- HBase
- Spanner (via Paxos/TrueTime)
AP Systems (prioritize availability)
- Always return a response
- Data may be stale or eventually consistent
Examples:
- Cassandra
- DynamoDB
- Riak
- CouchDB
Key Insight
You cannot avoid partition tolerance in real systems, so CAP is really about choosing C or A during a partition.
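The CP-versus-AP choice during a partition can be sketched with a single node that has lost contact with its quorum. The class and its behavior are illustrative, not modeled on any particular database.

```python
# Sketch of CP vs AP behavior on one node during a network partition.
# A CP-style read refuses to answer without a quorum; an AP-style read
# always answers from local (possibly stale) data.

class Node:
    def __init__(self):
        self.local_value = "v1"   # last value this node replicated
        self.partitioned = False  # True when cut off from the quorum

    def read_cp(self):
        # CP: fail rather than risk returning stale data.
        if self.partitioned:
            raise TimeoutError("no quorum: read rejected")
        return self.local_value

    def read_ap(self):
        # AP: always respond, even if the value may be outdated.
        return self.local_value

node = Node()
node.partitioned = True
print(node.read_ap())      # "v1" (possibly stale, but the request succeeds)
try:
    node.read_cp()
except TimeoutError as e:
    print(e)               # the CP read sacrifices availability
```

Both reads see the same local state; the only difference is what each strategy does when it cannot confirm that state is current, which is the entire content of the C-versus-A choice.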
4. PACELC Theorem (an Advanced Version of CAP)
CAP only considers trade-offs during a partition. PACELC expands this:
If there is a Partition (P), choose Availability or Consistency.
Else (E), even without a partition, choose between Latency and Consistency.
Interpretation
- PA/EL → during a partition, choose Availability; otherwise, choose Low Latency
- PC/EC → during a partition, choose Consistency; otherwise, still choose Consistency (always prioritize C)
The PACELC theorem was developed to address a key limitation of the CAP theorem: it makes no provision for performance or latency.
For example, according to the CAP theorem, a database can be considered available if a query returns a response after 30 days. Obviously, such latency would be unacceptable for any real-world application.
Why PACELC Was Introduced (Simple Explanation)
Problem with CAP
The CAP theorem says: during a partition, you must choose Consistency (C) or Availability (A).
But CAP does not answer an important question:
🟡 What trade-off does the system make when the network is healthy, i.e., when there is NO partition?
Modern distributed databases need additional trade-offs even when everything is working normally.
Example questions CAP doesn't answer:
- Should the system prioritize low latency?
- Or should it prioritize strong consistency for every read/write?
This gap led to PACELC.
What PACELC Adds
PACELC expands CAP by adding the Else (E) part:
If a Partition happens (P), choose Availability (A) or Consistency (C).
Else (E), when there is no partition, choose between Latency (L) and Consistency (C).
This clarifies two trade-offs:
1. During a partition:
- Pick A or C, same as CAP.
2. During normal operation:
- Pick L (low latency) or C (strong consistency).
Why is this needed? (Practical view)
Modern distributed databases (Cassandra, DynamoDB, Spanner, CockroachDB, MongoDB, etc.) operate across:
- many regions
- multiple data centers
- hundreds of nodes
These systems must make latency vs consistency decisions all the time, even without any failures.
CAP says nothing about this.
PACELC explains:
- Why some systems are always fast (latency-first)
- Why some systems always enforce strong consistency (consistency-first)
Concrete Example
Cassandra / DynamoDB (PA/EL)
- If partition → pick Availability
- Else (normal operation) → pick Low Latency
→ gives fast, eventually consistent writes.
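One way the EL (latency-first) choice shows up in Dynamo-style stores is through quorum sizes: with N replicas, a read quorum R, and a write quorum W, a read is guaranteed to overlap the latest write only when R + W > N. The sketch below checks that rule; the function name is illustrative, and the configurations shown are typical examples rather than any database's actual defaults.

```python
# Sketch of the quorum overlap rule in Dynamo-style replication:
# reads intersect the latest write only when R + W > N. Choosing
# smaller R and W trades that guarantee for lower latency (PA/EL).

def overlaps_latest_write(n, r, w):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n

# Latency-first: acknowledge after one replica, read from one replica.
print(overlaps_latest_write(n=3, r=1, w=1))  # False -> eventually consistent

# Consistency-first: quorum reads and quorum writes.
print(overlaps_latest_write(n=3, r=2, w=2))  # True -> reads see latest write
```

The same cluster can sit at either end of the trade-off just by tuning R and W per request, which is why these systems are usually described as PA/EL with optional stronger settings.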