Load Balancers – Introduction
Author: Bartek Zabek
A Load Balancer (LB) distributes incoming traffic across multiple servers. This helps systems handle more requests, improves availability, and prevents individual instances from being overloaded.
In this article, we’ll briefly walk through the most important aspects to consider when designing load balancers. Detailed articles are coming!
Load Balancer Use Cases
1. Global Load Balancing
Traffic is distributed across regions. Key concepts include:
- DNS-based Global LB
- Anycast
Examples: Route 53, Cloudflare DNS. Many modern solutions build on these principles.
2. Application-Level Load Balancing
Traffic is routed based on request content such as paths, headers, or cookies.
Examples: Nginx, HAProxy in HTTP mode, AWS ALB
3. Internal Services
Traffic between services inside a system is balanced without leaving the private network.
Examples: Kubernetes Services (kube-proxy), internal AWS NLB, Envoy in a service mesh
OSI Layers for Load Balancing
Load balancers can operate at different layers of the OSI model:
- Network Layer (L3) – routing decisions based on IP addresses
- Transport Layer (L4) – routing decisions based on TCP/UDP connections
- Application Layer (L7) – routing decisions based on request content, such as HTTP paths and headers
Load Balancing Algorithms
1️⃣ Round Robin
- How it works: Requests are distributed sequentially across backends in a loop.
- Pros: Simple, evenly distributes load for similar instances.
- Cons: Does not account for server load or capacity.
- Use case: Suitable for small systems with similar machines.
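As a minimal sketch (illustrative Python, not tied to any particular LB product), round robin is just a running counter mapped onto the backend list:

```python
from itertools import count

class RoundRobin:
    """Cycles through backends in order, one request at a time."""

    def __init__(self, backends):
        self.backends = backends
        self._counter = count()  # monotonically increasing request counter

    def pick(self):
        # Modulo wraps the running counter back onto the backend list.
        return self.backends[next(self._counter) % len(self.backends)]

lb = RoundRobin(["s1", "s2", "s3"])
print([lb.pick() for _ in range(5)])  # ['s1', 's2', 's3', 's1', 's2']
```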
2️⃣ Weighted Round Robin (WRR)
- How it works: Each backend gets a weight → higher weight = more requests.
- Pros: Can balance servers with different capacities.
- Cons: Still does not consider current load.
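A naive way to sketch WRR is to repeat each backend in the rotation proportionally to its weight. (Production LBs such as Nginx interleave picks more smoothly, but the resulting proportions are the same.)

```python
class WeightedRoundRobin:
    """Naive WRR: each backend appears `weight` times in the rotation."""

    def __init__(self, weighted_backends):
        # weighted_backends: list of (name, weight) pairs
        self.pool = [name for name, w in weighted_backends for _ in range(w)]
        self.i = 0

    def pick(self):
        backend = self.pool[self.i % len(self.pool)]
        self.i += 1
        return backend

lb = WeightedRoundRobin([("big", 3), ("small", 1)])
# "big" receives 3 of every 4 requests
```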
3️⃣ Least Connections / Connection-Based
- How it works: Requests go to the server with the fewest active connections.
- Pros: Good for long sessions or uneven request sizes.
- Cons: Requires tracking active connections.
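The connection tracking mentioned above can be sketched as a simple counter per backend (illustrative only; a real LB decrements the counter when the connection closes):

```python
class LeastConnections:
    """Tracks active connections per backend and picks the least loaded."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        # Pick the backend with the fewest open connections right now.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Called when the connection (or long session) finishes.
        self.active[backend] -= 1
```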
4️⃣ Random
- How it works: Requests are randomly sent to a backend.
- Pros: Simple, works in large backend pools.
- Cons: May cause uneven load distribution in short intervals.
5️⃣ Hash-Based / Consistent Hashing
- How it works: Requests are routed based on a hash (e.g., client IP, session ID).
- Pros:
- Sticky sessions without LB storing state
- Stable when backend count changes (used in cache/CDN)
- Cons: Uneven hash distribution can cause hotspots.
- Example Use: Apache Cassandra
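A hash ring with virtual nodes can be sketched as follows (illustrative Python; MD5 is used here only as a stable, well-spread hash, not for security). The key property is that removing one backend only remaps the keys that pointed at it:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash of a string onto a large integer space.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Hash ring with virtual nodes for smoother key distribution."""

    def __init__(self, backends, vnodes=100):
        self.ring = {}   # ring position -> backend
        self.keys = []   # sorted ring positions
        for b in backends:
            for v in range(vnodes):
                pos = _hash(f"{b}#{v}")
                self.ring[pos] = b
                bisect.insort(self.keys, pos)

    def pick(self, request_key: str) -> str:
        pos = _hash(request_key)
        # First ring position clockwise from the key's hash (wraps around).
        i = bisect.bisect(self.keys, pos) % len(self.keys)
        return self.ring[self.keys[i]]
```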
6️⃣ Least Response Time / Weighted Least Response Time
- How it works: Requests go to the backend with the lowest average response time (optionally weighted).
- Pros: Dynamically adapts to backend load.
- Cons: Requires backend performance monitoring.
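One common way to keep that per-backend performance data is an exponentially weighted moving average (EWMA) of observed response times, sketched here in illustrative Python:

```python
class LeastResponseTime:
    """Picks the backend with the lowest EWMA of response times."""

    def __init__(self, backends, alpha=0.2):
        self.avg = {b: 0.0 for b in backends}
        self.alpha = alpha  # weight given to the newest observation

    def observe(self, backend, response_time):
        # EWMA: the average leans `alpha` toward the latest sample.
        self.avg[backend] = ((1 - self.alpha) * self.avg[backend]
                             + self.alpha * response_time)

    def pick(self):
        return min(self.avg, key=self.avg.get)
```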
Handling Requests Across Multiple Servers
In distributed systems, user requests are often routed through a load balancer to multiple backend servers. This creates a challenge when applications rely on local session state—for example, shopping carts in e-commerce systems. If a user’s request is routed to a different server, their session data may be lost.
To solve this, several strategies are commonly used:
Sticky Session
The LB maps a session (e.g., via a cookie it sets) to a specific backend, so subsequent requests from that user return to the same server.
Stateless Backend + Token / JWT
Session is carried in a token → any LB can handle the request.
Shared Session Store
All backends access a common session store (e.g., Redis, Memcached).
Consistent Hashing
Instead of mapping every session, the LB redirects requests based on a computed hash → less state in the LB.
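To make the stateless-backend idea concrete, here is a minimal sketch using Python's standard-library `hmac` (a simplified stand-in for a full JWT library): the session payload travels in a signed token, so any backend behind any LB can verify it without shared session storage.

```python
import base64
import hashlib
import hmac

SECRET = b"shared-secret"  # in practice: distributed securely to all backends

def issue_token(session_data: str) -> str:
    """Sign the session payload so any backend can verify it later."""
    sig = hmac.new(SECRET, session_data.encode(), hashlib.sha256).hexdigest()
    payload = base64.urlsafe_b64encode(session_data.encode()).decode()
    return f"{payload}.{sig}"

def verify_token(token: str):
    """Return the session payload if the signature checks out, else None."""
    payload, sig = token.rsplit(".", 1)
    data = base64.urlsafe_b64decode(payload).decode()
    expected = hmac.new(SECRET, data.encode(), hashlib.sha256).hexdigest()
    return data if hmac.compare_digest(sig, expected) else None
```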
Health Checks
Load balancers monitor the health of backend servers and route traffic away from unhealthy ones. If multiple load balancer instances are used, they should also monitor each other.
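A typical health-check loop marks a backend unhealthy only after several consecutive failures, to avoid flapping on a single timeout. A sketch (illustrative Python; the `probe` callable stands in for a real check such as an HTTP GET to a health endpoint with a timeout):

```python
class HealthChecker:
    """Keeps a live set of backends; the LB routes only to healthy ones."""

    def __init__(self, backends, probe, fail_threshold=3):
        self.backends = backends
        self.probe = probe  # callable(backend) -> True if healthy
        self.fail_threshold = fail_threshold
        self.failures = {b: 0 for b in backends}

    def run_checks(self):
        for b in self.backends:
            if self.probe(b):
                self.failures[b] = 0  # one success resets the counter
            else:
                self.failures[b] += 1

    def healthy(self):
        # Route only to backends under the consecutive-failure threshold.
        return [b for b in self.backends
                if self.failures[b] < self.fail_threshold]
```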
Scaling Load Balancers
Horizontal Scaling
Running multiple LB instances in parallel and sharing traffic between them.
- Active-Passive: One LB is active, others are on standby.
- Active-Active: All LBs actively distribute traffic.
Global Scaling
In geo-distributed systems, a global LB (using Anycast/DNS) directs traffic to different regions. Each region can scale its local LB horizontally.
Summary
As you can see, there are many things to think about. We’ll go through each of them in the upcoming articles.
Thanks for reading!