What is a Load Balancer and How Does It Work?

A load balancer is a system that distributes incoming network traffic across multiple servers to optimize resource utilization, maximize throughput, and minimize response time. Without load balancing, a single server would handle all requests, creating bottlenecks and reducing reliability.

Why Load Balancers Matter in Cloud Architecture

Modern applications handle millions of concurrent users. A single server simply can't process that volume reliably. When Netflix serves video to more than 200 million subscribers, or when Amazon absorbs Black Friday traffic spikes, load balancers are working behind the scenes to spread that demand across many servers.

Load balancers solve three critical problems:

Availability: If one server fails, traffic automatically routes to healthy servers, preventing service outages.
Performance: Requests distribute evenly across servers, preventing any single machine from becoming overwhelmed.
Scalability: You can add or remove servers dynamically without disrupting service.

Think of a load balancer like a smart receptionist directing customers to available cashiers based on queue length. If one cashier gets swamped, new customers go elsewhere.

How Load Balancers Work

The basic process is straightforward: a client sends a request to a load balancer's public IP address. The load balancer receives the request, consults its algorithm to select a backend server, and forwards the request to that server. The server responds to the load balancer, which passes the response back to the client. In this proxy setup the client never communicates directly with backend servers; all traffic flows through the load balancer.

Here's a typical flow:

Client Request (example.com)
       ↓
   Load Balancer (checks health & applies algorithm)
       ↓
   ┌───────────────────────┐
   │  Backend Servers      │
   ├───────────────────────┤
   │  Server 1 (40% load)  │
   │  Server 2 (30% load)  │
   │  Server 3 (20% load)  │
   │  Server 4 (DOWN)      │
   └───────────────────────┘
       ↓
Server Response → Load Balancer → Client
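
The selection step in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the `Backend` class, the `select_backend` function, and the rule of picking the lowest reported load are assumptions chosen to match the figures above.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    load: float          # fraction of capacity in use (hypothetical metric)
    healthy: bool = True


def select_backend(backends: list[Backend]) -> Backend:
    """Drop unhealthy servers, then pick the least loaded of the rest."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    return min(candidates, key=lambda b: b.load)


pool = [
    Backend("Server 1", 0.40),
    Backend("Server 2", 0.30),
    Backend("Server 3", 0.20),
    Backend("Server 4", 0.00, healthy=False),  # DOWN, so never selected
]
print(select_backend(pool).name)  # Server 3
```

Server 4 reports no load at all, yet it is skipped because the health filter runs before the algorithm does.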

Most load balancers operate at Layer 4 (Transport Layer) or Layer 7 (Application Layer) of the OSI model. Layer 4 load balancers make decisions from network and transport data alone (source and destination IP addresses and TCP/UDP ports), while Layer 7 load balancers can inspect HTTP headers, cookies, and request content for more sophisticated routing.
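
A small Python sketch may help show what each layer can base a decision on. The pool names, ports, and path prefixes are hypothetical; real load balancers express these rules in their own configuration.

```python
# Hypothetical pools, keyed by the information each layer can see.
POOLS_BY_PORT = {443: ["web-1", "web-2"], 5432: ["db-proxy-1"]}
POOLS_BY_PATH = {"/api/": ["api-1", "api-2"], "/static/": ["static-1"]}
DEFAULT_POOL = ["web-1", "web-2"]


def pool_for_layer4(dst_port: int) -> list[str]:
    """Layer 4: only addresses and ports are visible."""
    return POOLS_BY_PORT.get(dst_port, DEFAULT_POOL)


def pool_for_layer7(path: str) -> list[str]:
    """Layer 7: the HTTP request is visible, so route on its path."""
    for prefix, pool in POOLS_BY_PATH.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

Two requests arriving on port 443 look identical at Layer 4, but a Layer 7 load balancer can send `/api/users` and `/static/logo.png` to different pools.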

Load Balancing Algorithms

Load balancers use different strategies to decide which server receives each request:

Round Robin

Requests distribute sequentially across servers. Server 1 gets request 1, Server 2 gets request 2, Server 3 gets request 3, then back to Server 1. It's simple but doesn't account for server capacity differences.
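
As a minimal sketch, round robin is just an endless cycle over the server list. The server names here are placeholders.

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]  # placeholder names
rotation = cycle(servers)


def next_server() -> str:
    """Return the next server in rotation, wrapping around at the end."""
    return next(rotation)


picks = [next_server() for _ in range(4)]
print(picks)  # ['server1', 'server2', 'server3', 'server1']
```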

Least Connections

New requests go to the server with the fewest active connections. This works well for applications where some requests take longer than others, ensuring no server gets overloaded with long-running tasks.
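
One way to sketch this in Python is to keep a counter per server, incremented when a request starts and decremented when it finishes. The class and method names are illustrative, not taken from any specific load balancer.

```python
class LeastConnections:
    def __init__(self, servers: list[str]):
        # Number of in-flight requests per server.
        self.active = {name: 0 for name in servers}

    def acquire(self) -> str:
        """Pick the server with the fewest active connections."""
        # On ties, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request completes."""
        self.active[server] -= 1
```

A server stuck on a long-running request keeps a higher count, so new requests flow to its peers until it finishes.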

Least Response Time

The load balancer sends requests to the server with the lowest average response time combined with the fewest active connections. This tends to give users the fastest responses, because each request goes to the server that is currently answering quickest.
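
The exact way latency and connection count are combined varies by product. The sketch below assumes one plausible scoring rule, an exponentially weighted average latency multiplied by the number of active connections plus one; the class name and the `alpha` smoothing factor are hypothetical.

```python
class LeastResponseTime:
    def __init__(self, servers: list[str], alpha: float = 0.2):
        self.alpha = alpha                      # weight given to the newest sample
        self.avg_ms: dict[str, float] = {}      # moving average latency per server
        self.active = {s: 0 for s in servers}   # in-flight requests per server

    def record(self, server: str, elapsed_ms: float) -> None:
        """Fold a finished request's latency into the moving average."""
        if server not in self.avg_ms:
            self.avg_ms[server] = elapsed_ms
        else:
            old = self.avg_ms[server]
            self.avg_ms[server] = (1 - self.alpha) * old + self.alpha * elapsed_ms

    def pick(self) -> str:
        """Lowest score wins; servers with no samples yet score 0 and get tried first."""
        return min(
            self.active,
            key=lambda s: self.avg_ms.get(s, 0.0) * (self.active[s] + 1),
        )
```

With this rule a fast server that is already busy can lose to a slower idle one, which is the balance the algorithm is meant to strike.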

IP Hash

The load balancer hashes the client's IP address to choose a server, so the same client keeps reaching the same server as long as the server pool and the client's address stay the same. This provides session persistence and is useful when session data isn't shared across servers.
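
A minimal Python sketch of the idea, assuming a SHA-256 hash reduced modulo the pool size (real implementations choose their own hash functions):

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server deterministically."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Because the index depends on `len(servers)`, adding or removing a server remaps many clients. Consistent hashing is a common way to limit that churn.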

Weighted Round Robin

Servers receive different traffic percentages based on assigned weights. A powerful server might get 40% of traffic while a less capable server gets 20%. Administrators manually assign weights based on hardware specs.

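# Example NGINX load balancing configuration (place inside the http block)
# Weights 5:3:2 send roughly 50%, 30%, and 20% of requests to each server
upstream backend {
    server server1.example.com weight=5;
    server server2.example.com weight=3;
    server server3.example.com weight=2;
}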

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
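
To show how weights turn into a request sequence, here is a Python sketch of a smooth weighted round-robin variant, which interleaves servers instead of sending bursts to the heaviest one. It is an illustration of the technique and makes no claim about the internals of any particular load balancer; the class name is hypothetical and the weights mirror the configuration above.

```python
class SmoothWeightedRoundRobin:
    def __init__(self, weights: dict[str, int]):
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}  # running score per server
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every server earns its weight each round...
        for name, weight in self.weights.items():
            self.current[name] += weight
        # ...and the highest score is chosen, then penalized by the total.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


wrr = SmoothWeightedRoundRobin({"server1": 5, "server2": 3, "server3": 2})
sequence = [wrr.pick() for _ in range(10)]
```

Over any 10 consecutive picks, `server1` is chosen 5 times, `server2` 3 times, and `server3` twice, matching the 5:3:2 weights.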

Types of Load Balancers

Hardware Load Balancers

Physical devices like F5 BIG-IP or Citrix NetScaler sit between the internet and your servers. They're expensive ($10,000–$100,000+) but handle massive traffic volumes and offer advanced features. Large enterprises with mission-critical applications typically use these.

Software Load Balancers

These are applications that run on standard servers. NGINX, HAProxy, and Apache HTTP Server (with mod_proxy_balancer) are popular open-source options. They're cost-effective and flexible but require more configuration and management. Many startups and mid-size companies use software load balancers.

Cloud Load Balancers

These are managed services provided by cloud platforms, such as AWS Elastic Load Balancing (ELB), Google Cloud Load Balancing, and Azure Load Balancer. They scale automatically, integrate with the rest of the platform's infrastructure, and you pay only for what you use. This is the common default for cloud-native applications.

Health Checks and Failover

Load balancers continuously monitor backend servers' health. They send periodic health check requests (usually HTTP GET requests to a specific endpoint) to each server. If a server doesn't respond or returns an error status code, the load balancer marks it as unhealthy and stops routing traffic to it.

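When a server recovers and passes enough consecutive health checks, the load balancer returns it to rotation. This failover is automatic and largely invisible to users: new requests go to functioning servers, although requests already in flight on the failed server may still return errors.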

Health check configuration typically includes:

Interval: How often to check (every 5-10 seconds)
Timeout: How long to wait for response (usually 3-5 seconds)
Healthy threshold: Consecutive successful checks before marking server as healthy (often 2-3)
Unhealthy threshold: Consecutive failed checks before marking server as down (often 2-3)
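
The two thresholds can be sketched as a small state machine in Python. The `HealthTracker` class and its default values are illustrative assumptions; the point is that a single failed or successful check never flips a server's state on its own.

```python
class HealthTracker:
    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the current state."""
        if check_passed == self.healthy:
            self.streak = 0  # result agrees with the current state
            return self.healthy
        self.streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self.streak >= limit:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy
```

A caller would invoke `record()` once per interval with the outcome of the probe, treating a timeout as a failed check.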

Session Persistence and Sticky Sessions

Some applications store session data locally on individual servers. When a user logs in on Server 1, their session lives only there. If the next request routes to Server 2, the session doesn't exist and the user appears logged out.

Load balancers solve this with sticky sessions (session persistence). Once a client connects to a server, all subsequent requests from that client route to the same server for the duration of their session. Methods include:

Cookie-based: Load balancer inserts a cookie indicating the preferred server
Source IP-based: Client IP determines server selection consistently
Application-based: Your app passes session identifiers the load balancer respects
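
The cookie-based method can be sketched as follows. The cookie name `lb_server` and the `route` function are hypothetical, and random choice stands in for whichever algorithm the load balancer normally uses.

```python
import random

COOKIE_NAME = "lb_server"  # hypothetical cookie name


def route(cookies: dict[str, str], healthy: set[str]) -> tuple[str, dict[str, str]]:
    """Return the chosen server and any cookies to set on the response."""
    pinned = cookies.get(COOKIE_NAME)
    if pinned in healthy:
        return pinned, {}  # honor the existing pin
    # No pin yet, or the pinned server is down: choose again and re-pin.
    chosen = random.choice(sorted(healthy))
    return chosen, {COOKIE_NAME: chosen}
```

The fallback branch matters: if the pinned server fails its health checks, the client is moved to another server and loses any session data stored only on the old one.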

Modern applications typically avoid sticky sessions by storing sessions in Redis or databases accessible by all servers. This enables true stateless server architecture and better resilience.

Load Balancing in Practice: AWS Example

AWS provides Application Load Balancer (ALB) for Layer 7 routing and Network Load Balancer (NLB) for Layer 4 traffic that needs very high throughput and low latency. Here's how you'd create a basic ALB:

# AWS CLI example - create ALB
aws elbv2 create-load-balancer \
    --name my-app-lb \
    --subnets subnet-12345 subnet-67890 \
    --security-groups sg-12345 \
    --scheme internet-facing \
    --type application

# Create target group
aws elbv2 create-target-group \
    --name my-app-targets \
    --protocol HTTP \
    --port 80 \
    --vpc-id vpc-12345

# Register instances with target group
aws elbv2 register-targets \
    --target-group-arn arn:aws:elasticloadbalancing:... \
    --targets Id=i-12345 Id=i-67890 Id=i-abcde

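# Configure health checks
aws elbv2 modify-target-group \
    --target-group-arn arn:aws:elasticloadbalancing:... \
    --health-check-path /health \
    --health-check-interval-seconds 30

# Create a listener so the ALB forwards incoming HTTP requests to the target group
aws elbv2 create-listener \
    --load-balancer-arn arn:aws:elasticloadbalancing:... \
    --protocol HTTP \
    --port 80 \
    --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...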

Behind the scenes, AWS distributes your traffic across the availability zones you enable, scales the load balancer's capacity, and patches the underlying infrastructure. You just pay for what you use.

Common Load Balancing Challenges

Connection draining: When removing a server, you can't instantly disconnect active requests. Connection draining waits for existing connections to complete (with a timeout limit) before fully removing the server from rotation.

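SSL/TLS termination: Handling TLS handshakes and encryption on every backend server adds CPU overhead and certificate management work. Many load balancers terminate TLS themselves, then communicate with backend servers over plain HTTP inside the private network. This is called SSL offloading. If traffic must stay encrypted end to end, the load balancer can re-encrypt it before forwarding.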

Distributed session state: Managing sessions across multiple servers requires external storage such as Redis, Memcached, or a database. That store becomes a single point of failure itself if it isn't properly replicated.

Cost: Cloud load balancers charge per hour plus data processing fees. A high-traffic application might spend $500+ monthly on load balancing alone.
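
Of the challenges above, connection draining is the easiest to show in code. The sketch below assumes a hypothetical `DrainingServer` object whose `active` counter is decremented elsewhere as requests finish; `drain` stops new traffic, then waits for that counter to reach zero or for the timeout to expire.

```python
import time


class DrainingServer:
    def __init__(self, name: str):
        self.name = name
        self.active = 0        # in-flight requests, updated by the request handler
        self.draining = False

    def accepts_new(self) -> bool:
        """The load balancer checks this before routing a new request."""
        return not self.draining


def drain(server: DrainingServer, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Stop new traffic, then wait for in-flight requests to finish.

    Returns True if the server emptied before the timeout, False otherwise.
    """
    server.draining = True
    deadline = time.monotonic() + timeout_s
    while server.active > 0 and time.monotonic() < deadline:
        time.sleep(poll_s)
    return server.active == 0
```

When `drain` returns False, the remaining connections are the ones that will be cut off when the server is removed, which is why the timeout is usually set longer than the slowest expected request.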

Load Balancer vs. Auto Scaling

People sometimes confuse these complementary tools. Load balancers distribute traffic across existing servers. Auto scaling automatically adds or removes servers based on demand. Together, they create resilient, scalable systems. Load balancers provide the distribution; auto scaling provides the flexibility.

When traffic to your website spikes, auto scaling might launch 10 new instances. The new instances register with the load balancer, which starts sending them traffic once they pass their health checks. When traffic drops, auto scaling removes instances and the load balancer stops routing to them.

Frequently Asked Questions

Can a load balancer itself fail?

Yes, which is why critical systems use multiple load balancers. They sit behind a virtual IP address that automatically fails over to a backup load balancer if the primary becomes unavailable. This is called High Availability (HA) load balancing. Cloud services like AWS ELB handle this automatically.

What's the difference between load balancing and reverse proxy?

A reverse proxy forwards client requests to servers and returns responses, sitting between clients and your infrastructure. A load balancer that proxies requests is a specialized reverse proxy whose job is to distribute traffic across multiple servers. A reverse proxy in front of a single server, used only for caching or TLS termination, is not doing any load balancing. Most load balancers act as reverse proxies, but not all reverse proxies are load balancers.

Do I need a load balancer for my small application?

Probably not initially. A single well-tuned server can often handle thousands of simple requests per second. As you grow, add load balancing when you need redundancy, better performance, or when one server can't handle your traffic. Many applications don't need load balancers until they reach significant scale.

How do load balancers handle WebSocket connections?

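WebSockets require persistent connections, making traditional load balancing tricky. A WebSocket starts as an HTTP request that is upgraded to a long-lived connection, so the load balancer must support the upgrade handshake and allow long idle timeouts. Once established, the connection stays on the server that accepted it. Sticky sessions or IP-based affinity matter when a client reconnects, or when the application falls back to HTTP polling and needs to reach the same server again.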

Conclusion

Load balancers are foundational infrastructure components that make modern, scalable applications possible. They eliminate single points of failure, distribute traffic intelligently, and enable adding capacity without service interruption. Whether you're running a startup on cloud platforms or managing enterprise infrastructure, understanding load balancing helps you build systems that stay online and respond quickly regardless of demand.

Start with cloud-provided load balancers to avoid operational complexity, then optimize as your scale and requirements evolve.