In system design, scalability is the capacity of a system to adapt its performance and cost to changes in application and system processing demands.
The architecture used to build services, networks, and processes is scalable under these two conditions:
- Resources can be added easily when demand/workload increases.
- Resources can be removed easily when demand/workload decreases.
Scalability is achieved in systems via two methods:
- Vertical scaling
- Horizontal scaling
Vertical scaling expands the scale of a system by adding more configuration or hardware for better computing or storage to a single machine. In practice, this would include upgrading the processors, raising the RAM, or making other power-increasing changes. Multi-core scaling is used in this case: the system scales by distributing the load among the machine's CPU cores and RAM.
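To make multi-core scaling concrete, here is a minimal, illustrative Python sketch (the crunch workload and job sizes are made up) that spreads a CPU-bound task across the cores of a single machine:

```python
# Illustrative only: vertical scaling puts more cores/RAM on one machine to work.
# This sketch spreads a CPU-bound task across all available cores with multiprocessing.
from multiprocessing import Pool, cpu_count

def crunch(n: int) -> int:
    # Stand-in for a CPU-heavy computation (hypothetical workload).
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    workloads = [1_000_000] * 8             # eight equally sized jobs
    with Pool(processes=cpu_count()) as pool:   # one worker per core
        results = pool.map(crunch, workloads)   # jobs run in parallel across cores
    print(len(results), "jobs finished")
```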
Pros Of Scaling Up Vertically
- It uses less energy than maintaining multiple servers.
- Requires less administrative work because only one machine must be managed.
- Has lower cooling costs.
- Lower software costs.
- Simpler to implement.
- Preserves application compatibility.
Cons Of Scaling Up Vertically
- There is a high chance of hardware failure, which could result in more serious issues.
- There is little room for system upgrades, and the single machine may become a single point of failure (SPOF).
- There is a limit to how much RAM and memory/storage can be added to a machine at once.
A system can be scaled horizontally by adding new machines: several machines are gathered and connected so that together they can handle more system requests.
Pros Of Scaling Out Horizontally
- It is less expensive than scaling up and makes use of smaller systems.
- Simple to upgrade.
- The existence of discrete, multiple systems improves resilience.
- Fault tolerance is simple to manage.
- It supports linear increases in capacity.
Cons Of Scaling Out Horizontally
- The license costs are higher.
- It has a larger footprint inside the data center, which increases the cost of utilities like cooling and energy.
- It necessitates more networking hardware.
Vertical Scaling vs. Horizontal Scaling
Now that we have looked into the details of each type of scaling, let us compare them with respect to different parameters:

| Parameter | Horizontal Scaling | Vertical Scaling |
| --- | --- | --- |
| Database | Data is partitioned (sharded) across multiple machines. | Data resides on a single machine and scaling is done across multiple cores, so the load is divided between CPU and RAM. |
| Downtime | Adding machines to the pool results in less downtime. | Relying on a single machine means upgrades and failures cause more downtime. |
| Data Sharing | Because of the distributed network structure, data sharing via message passing becomes quite complex. | Working on a single machine makes message passing and data sharing much easier. |
| Example/s | Cassandra, MongoDB | SQL databases such as MySQL |
How to avoid failure during Scalability?
As studied above, while designing the architecture of a system we should not design at either extreme: neither over-provisioning (more resources than needed) nor under-provisioning (fewer resources than needed) relative to the requirements gathered and analyzed.
There is a catch here: even in a perfectly designed system, failures still arise (as discussed above in Architect Principle Rules for Designing). Failures do exist even in the best-designed system, but we can prevent them from hampering the system globally. We do this by keeping the system redundant and the data replicated so that it is retained.
Let us now understand these terms in greater depth:
- Redundancy
- Replication
Redundancy is nothing more than the duplication of nodes or components so that, in the event of a node or component failure, the backup node can continue to provide services to consumers. Redundancy helps sustain availability, failure recovery, and failure management. The goal of redundancy is to create quick, effective, and accessible backup channels.
It is of two types:
- Active redundancy (all duplicated nodes serve traffic in parallel)
- Standby or Passive redundancy (a backup node takes over only when the active node fails)
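As a rough illustration of standby (passive) redundancy, the sketch below routes requests to a backup node only when the primary fails a health check; the node names and the is_healthy probe are hypothetical stand-ins for real infrastructure:

```python
# A minimal sketch of standby (passive) redundancy, assuming a hypothetical
# is_healthy() probe. The backup only serves traffic when the primary fails.
import random

def is_healthy(node: str) -> bool:
    # Stand-in for a real health check (ping, heartbeat, etc.).
    return random.random() > 0.2

def route_request(request: str, primary: str = "node-a", standby: str = "node-b") -> str:
    node = primary if is_healthy(primary) else standby  # fail over on health-check failure
    return f"{request} handled by {node}"

print(route_request("GET /index"))
```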
Replication is the management of data storage in which each piece of data is kept in multiple copies hosted on different servers. It is simply the copying of data between many devices and involves keeping those machines synchronized. Replication contributes to increased fault tolerance and reliability by ensuring consistency amongst redundant resources.
Also, it is of two types:
- Active replication (every replica processes each request)
- Passive replication (a primary processes requests and propagates the resulting state to the backups)
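Here is a minimal sketch of passive (primary-backup) replication; the node names are hypothetical and plain in-memory dictionaries stand in for real data stores. The primary applies each write and then copies it to every replica so all copies stay in sync:

```python
# A minimal sketch of passive (primary-backup) replication: the primary applies
# each write and then copies it to every replica so all copies stay consistent.
class Node:
    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, str] = {}

primary = Node("primary")
replicas = [Node("replica-1"), Node("replica-2")]

def write(key: str, value: str) -> None:
    primary.store[key] = value          # apply on the primary first
    for replica in replicas:            # then propagate to each redundant copy
        replica.store[key] = value

write("user:42", "alice")
print(all(r.store == primary.store for r in replicas))  # True: copies are in sync
```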
Scalability Design Principles
Whenever a system is designed, the following principles should be kept in mind to tackle scalability issues:
- Scalability vs Performance: While building a scalable system, performance should remain directly proportional to scale: when the system is scaled up, performance should improve, and when performance requirements are low, the system should be scaled down.
- Asynchronous Communication: Communication between the various components of the system should be asynchronous wherever possible, so that a slow or failing component does not block the rest of the system.
- Concurrency: This is the same concept as in programming: if the controller needs to send multiple queries to serve a user, they are launched concurrently, which drastically reduces the response time (see the asyncio sketch after this list).
- Databases: Queries should be designed so that, even when they are fired one after another, the overall latency does not pile up and the database does not start sweating under the load.
- Eventual Consistency: Eventual consistency is a consistency model used in distributed computing to achieve high availability; it informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
- Denormalization: Strict 3rd normal form makes reads depend on joins and computation, which are more expensive than HDD space; sticking to it costs not only compute and electricity but also higher latency, so denormalize where read latency matters.
- Caching: Caching is an important pillar of scalability; aim for a high cache hit (versus miss) ratio and use an eviction policy such as an LRU cache (caching is covered in depth later in this article).
- Failures: Not everything in a system can be kept under control; failures occur, especially when the system is pushed to its performance threshold. But failures do occur, so we practice isolating issues to prevent them from spreading globally.
- Monitoring: Some bugs are very hard to reproduce, which is the worst situation because we do not have adequate evidence of why they occur; with monitoring in place we are, in effect, constantly retrospecting on incidents as they happen.
- Capacity balancing: Suppose the load increases tremendously and we receive 1000 requests that 20 workers, with an average request time of say 100 ms, previously handled comfortably. With a circuit-breaker timeout of 500 ms, each worker can complete only 500/100 = 5 requests before the breaker trips, so the 20 workers serve roughly 100 requests and the remaining 900 fail. That is why circuit-breaker settings are sometimes adjusted in order to balance capacities in the real world.
- Servers: Small-capacity servers are a good fit for smooth, predictable load curves, whereas big servers suit heavy computations but call for careful monitoring, latency management, and load balancing.
- Deployment: An older version of the code should always be kept available and maintained so that massive, irreversible changes that result in downtime can be rolled back. If that is not possible, break the change apart into smaller steps. These practices must be followed while deploying when the system architecture is scaling.
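To make the Concurrency principle above concrete, here is a minimal, illustrative asyncio sketch; fetch_one is a made-up stand-in for a real database or API call:

```python
# A minimal sketch of the "Concurrency" principle: gathering the calls concurrently
# makes the total response time close to the slowest call instead of the sum.
import asyncio

async def fetch_one(query: str) -> str:
    await asyncio.sleep(0.1)            # simulate 100 ms of I/O latency
    return f"result of {query}"

async def handle_request() -> list[str]:
    queries = ["profile", "orders", "recommendations"]
    # Launch all queries concurrently instead of one after another.
    return await asyncio.gather(*(fetch_one(q) for q in queries))

print(asyncio.run(handle_request()))    # ~0.1 s total, not ~0.3 s
```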
How to handle SPOF during Scalability?
To make systems efficiently scalable through replication, where multiple copies are stored across servers and SPOF is handled well, we need to learn the two concepts listed below. They help us achieve efficient, scalable systems globally, even across huge distributed architectures.
- Load Balancing
- Caching
Let us now cover load balancing in greater depth, followed by caching, to complete our understanding of scalability:
What is Load Balancing?
Load balancing is a technique for effectively distributing application or network traffic among all the nodes in a distributed system. Load balancers are the tools used to perform load balancing.
Load Balancer roles
- Ensure each node receives an equal share of the workload.
- Keep track of which nodes are unavailable or not in use.
- Effectively manage/distribute work to ensure that it is finished on time.
- Distribute work so as to maximize speed and use all available capacity.
- Guarantee high scaling, high throughput, and high availability.
Let us also make clear the ideal conditions under which a load balancer should be used. They are as follows:
- Load balancers can be used for load management when the application has several instances or servers.
- Application traffic is split amongst several servers or nodes.
- Load balancers are crucial for maintaining scalability, availability, and latency in a heavy-traffic environment.
Benefits of Load Balancing
- Optimization: In a heavy-traffic environment, load balancers help to better utilize resources and reduce response times, which optimizes the system.
- Improved User Experience: Load balancers assist in lowering latency and raising availability, resulting in smooth and error-free handling of user requests.
- Prevents Downtime: By keeping track of servers that aren't working and allocating traffic accordingly, load balancers provide security and avoid downtime, which also boosts revenue and productivity.
- Flexibility: To ensure efficiency, load balancers can reroute traffic in the event of a failure and allow server maintenance to proceed.
- Scalability: Load balancers can use real or virtual servers to deliver responses without any interruption when a web application’s traffic suddenly surges.
Challenges to Load Balancing
We have already discussed the constraint of SPOF while developing systems, and the same applies here. A load balancer failure or breakdown may cause the entire system to be suspended and unavailable for a while, which negatively affects the user experience. Client and server communication would be disrupted in the event of a load balancer malfunction. We can employ redundancy to resolve this problem: both an active and a passive load balancer may be present in the system, and the passive load balancer can take over as the active load balancer if the active one fails.
For better understanding, we will dive into load balancing algorithms, which are as follows:
Load Balancing Algorithms
For the effective distribution of load over various nodes or servers, various algorithms can be used. The algorithm should be chosen based on the kind of application the load balancer is used for.
A few load-balancing algorithms are listed below:
- Round Robin Algorithm
- Weighted Round Robin Algorithm
- IP Hash Algorithm
- Least Connection Algorithm
- Least Response Time Algorithm
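As a rough illustration, the sketch below implements the Round Robin algorithm from the list above; the server names are hypothetical:

```python
# A minimal sketch of the Round Robin algorithm: requests are handed
# to servers in a fixed rotation.
from itertools import cycle

servers = cycle(["server-1", "server-2", "server-3"])

def round_robin(request_id: int) -> str:
    return f"request {request_id} -> {next(servers)}"

for i in range(6):
    print(round_robin(i))   # server-1, server-2, server-3, server-1, ...
```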
What is Caching?
A cache is a high-speed data storage layer that temporarily stores a portion of the data, allowing subsequent requests for that data to be fulfilled more quickly than if the data were accessed directly from its original storage location.
Caching is the process of reusing data that has already been fetched or computed. By producing a local instance of relatively static data, caching reduces the number of read calls, API calls, and network I/O calls.
Types Of Cache:
There are basically three types of caches, as follows:
- Local cache: Used for a single system when the cache must be kept in local memory. It is also known as the L1 cache. Example: Memcache and Google Guava Cache.
- External cache: Shared across multiple systems, also known as a distributed cache or L2 cache. It is employed when the cache must be shared by several systems, so the cache is stored in a distributed manner that all servers may access. Example: Redis.
- Specialized cache: A special type of memory developed to improve the performance of the local and external caches above. It is also known as the L3 cache.
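To illustrate how a local (L1) cache can sit in front of an external (L2) cache, here is a minimal sketch; plain dictionaries stand in for the real layers (the external one would typically be something like Redis), and load_from_database is a hypothetical slow source of truth:

```python
# A minimal sketch of layering a local (L1) cache in front of an external (L2) cache.
local_cache: dict[str, str] = {}      # per-process, fastest
external_cache: dict[str, str] = {}   # stand-in for a shared cache such as Redis

def load_from_database(key: str) -> str:
    return f"value-for-{key}"          # pretend this is an expensive query

def get(key: str) -> str:
    if key in local_cache:             # L1 hit
        return local_cache[key]
    if key in external_cache:          # L2 hit, promote to L1
        local_cache[key] = external_cache[key]
        return local_cache[key]
    value = load_from_database(key)    # miss: go to the source of truth
    external_cache[key] = value
    local_cache[key] = value
    return value

print(get("user:42"))   # miss on both layers, then cached
print(get("user:42"))   # served from the local cache
```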
How does Caching work?
The information stored in a cache is typically saved in hardware that provides quick access, such as RAM (random-access memory), but it can also be used by a software component. The main objective is to increase data retrieval performance by avoiding contact with the slower storage layer below.
Note: Applications of caching are:
- CDN (Content Delivery Network)
- Application Server Cache
Benefits of Caching
- Improves performance of the application
- Lowers database expenses
- Lessens the backend's load
- Dependable results
- Gets rid of hotspots in databases
- Boosts read throughput (IOPS)
Disadvantages of Caching
- Cache memory is costly and has a finite amount of space.
- The page becomes heavy as information is stored in the cache.
- Sometimes updated information is not displayed because the cache has not been refreshed.
Application of Caching
- Caching can help reduce latency and increase IOPS for many read-intensive application workloads, including gaming, media sharing, social networking, and Q&A portals.
- Examples of cached data include database query results, computationally challenging calculations, API calls and responses, and web artifacts like HTML, JavaScript, and image files.
- Compute-intensive applications that manipulate large data sets, such as recommendation engines and simulations for high-performance computing, benefit from an in-memory data layer acting as a cache.
- In these applications, massive data sets must be retrieved in real time across clusters of servers that can include hundreds of nodes. Due to the speed of the underlying hardware, many programs are severely constrained in their ability to manipulate this data in a disk-based store.
Remember: When and where to use caching?
Case 1: Static Data: If the data is not changing too regularly, caching would be beneficial. We can save the data and use it right away. Caching wouldn't do much good if the data were changing quickly.
Case 2: Application type: Applications can either be read-intensive or write-intensive. An application that requires a lot of reading would benefit more from caching. Data changes quickly in a write-intensive application, hence caching shouldn't be used.
Lastly, let us discuss caching strategies to wrap up the concept of caching:
Caching Strategies
Caching patterns are what designers use to incorporate a cache into a system. Three common write strategies are:
- write-around
- write-through
- write-back
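As a rough sketch of two of the write strategies above, the snippet below contrasts write-through (update the cache and the database together) with write-back (update the cache now, flush to the database later); plain dictionaries stand in for the real storage layers:

```python
# A minimal, illustrative sketch of write-through vs. write-back.
cache: dict[str, str] = {}
database: dict[str, str] = {}
dirty_keys: set[str] = set()   # keys written to the cache but not yet to the database

def write_through(key: str, value: str) -> None:
    # Write-through: update the cache and the database in the same operation.
    cache[key] = value
    database[key] = value

def write_back(key: str, value: str) -> None:
    # Write-back: update only the cache now; flush to the database later.
    cache[key] = value
    dirty_keys.add(key)

def flush() -> None:
    for key in dirty_keys:
        database[key] = cache[key]
    dirty_keys.clear()

write_through("a", "1")
write_back("b", "2")
flush()
print(database)   # {'a': '1', 'b': '2'}
```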
Cache Eviction Strategies
The eviction policy of the cache determines the order in which items are removed from a full cache. It is done to clear some room so that more entries can be added. These policies are listed below:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO (First In First Out)
- LIFO (Last In First Out)
- MRU (Most Recently Used)
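To make the LRU policy concrete, here is a minimal, illustrative LRU cache built on OrderedDict: when the cache exceeds its capacity, the entry that was used longest ago is evicted.

```python
# A minimal LRU (Least Recently Used) eviction sketch using OrderedDict.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict[str, str] = OrderedDict()

    def get(self, key: str):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: str) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:  # evict the least recently used entry
            self.items.popitem(last=False)

cache = LRUCache(2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")            # "a" becomes most recently used
cache.put("c", "3")       # evicts "b"
print(list(cache.items))  # ['a', 'c']
```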