Tech · 7 min read

Load Balancers Deep Dive: L4 vs L7, Algorithms, and Real Trade-offs

Load balancers are everywhere but rarely understood deeply. Layer 4 vs Layer 7, algorithm choices, health checks, and the configurations that matter in production.

By Jarviix Engineering · Apr 19, 2026


Every modern internet service uses load balancers. They're often treated as a black box — "set it up once, it just works." But subtle choices in load balancer configuration produce dramatically different reliability, latency, and capacity outcomes, and misconfigured load balancers are behind a disproportionate share of production incidents.

This post covers what load balancers actually do at L4 and L7, the major algorithms and when each is appropriate, and the configurations that matter most in production.

What a load balancer does

At its core, a load balancer:

  1. Accepts incoming connections/requests
  2. Selects a backend server based on the configured algorithm
  3. Forwards the request to that backend
  4. Returns the response to the client

Modern load balancers add:

  • Health checks: detect unhealthy backends and stop sending traffic
  • Connection pooling: reuse connections to backends
  • TLS termination: decrypt incoming HTTPS so backends serve plain HTTP
  • Compression, caching, rate limiting
  • Routing based on URL/headers (L7 only)

Layer 4 vs Layer 7

Layer 4 (Transport)

Operates on TCP/UDP packets. Doesn't inspect payload — just forwards based on IP+port.

Characteristics:

  • Very fast: minimal CPU per packet
  • Protocol-agnostic: works for HTTP, gRPC, raw TCP, database protocols
  • Limited routing: can only balance based on connection-level info

Examples: AWS NLB, HAProxy in TCP mode, NGINX stream module, F5 BIG-IP LTM.

Use cases:

  • Database load balancing (Postgres read replicas)
  • Game servers
  • Raw TCP services (Kafka brokers, Redis clusters)
  • High-throughput HTTPS where TLS isn't terminated at the LB (TLS passthrough)

Layer 7 (Application)

Operates on application-layer requests (HTTP, gRPC). Parses the request, can route based on URL, headers, body.

Characteristics:

  • More CPU intensive (must parse requests)
  • Protocol-aware: can apply HTTP-specific features (compression, caching, header manipulation)
  • Sophisticated routing: by hostname, path, cookies, headers, query params

Examples: AWS ALB, NGINX (default mode), HAProxy in HTTP mode, Envoy, Traefik.

Use cases:

  • HTTP API gateways
  • Microservice routing by path
  • A/B testing with header-based routing
  • Sticky sessions via cookies
  • Per-request observability

For most modern HTTP-based applications, L7 is the right choice.

Load balancing algorithms

Round-robin

Each request goes to the next backend in rotation. Simple, fair under uniform conditions.

Pros: Simple, predictable, no state.
Cons: Doesn't account for varying request costs or backend capacity.
Use when: Stateless services with uniform request costs.
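As a sketch, round-robin selection is just a fixed rotation (backend names here are illustrative):

```python
import itertools

def round_robin(backends):
    """Yield backends in fixed rotation, ignoring load and capacity."""
    return itertools.cycle(backends)

picker = round_robin(["app-1", "app-2", "app-3"])
print([next(picker) for _ in range(5)])
# → ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```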

Weighted round-robin

Backends have weights; receive proportional traffic share. Useful when backends have different capacities.

Use when: Heterogeneous backend pool (e.g., gradual rollout to a new larger instance type).
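A minimal sketch of weighted selection by naive expansion (production implementations such as NGINX use a "smooth" variant that interleaves picks more evenly; names and weights are illustrative):

```python
def weighted_round_robin(weights):
    """Yield backends in proportion to their integer weights."""
    # Naive expansion: a weight-2 backend appears twice in the rotation.
    expanded = [b for b, w in weights.items() for _ in range(w)]
    while True:
        for backend in expanded:
            yield backend

picker = weighted_round_robin({"big": 2, "small": 1})
print([next(picker) for _ in range(6)])
# → ['big', 'big', 'small', 'big', 'big', 'small']
```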

Least connections

Routes to the backend with the fewest active connections. Adapts to varying request durations.

Pros: Naturally balances under variable workload.
Cons: Connection count isn't the same as load (long polls, websockets).
Use when: Request durations vary significantly (some short, some long).
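A sketch of the bookkeeping, assuming the balancer tracks in-flight requests per backend (names are illustrative):

```python
class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        # Ties break by insertion order; real balancers often randomize.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1

lb = LeastConnections(["app-1", "app-2"])
first = lb.acquire()   # app-1 (tie broken by order)
second = lb.acquire()  # app-2
lb.release(first)      # app-1's request finishes early
print(lb.acquire())    # → app-1: it now has fewer active connections
```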

Least response time

Routes to the backend with the fastest recent response time. Adapts to backend health and load.

Pros: Routes around slow backends automatically.
Cons: Requires response-time tracking, adds complexity.
Use when: Backend performance varies; you have monitoring infrastructure.
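One common way to track "recent response time" is an exponentially weighted moving average (EWMA) of observed latency; a sketch with illustrative names (a starting EWMA of 0.0 means untried backends are preferred first):

```python
class LeastResponseTime:
    """Track an EWMA of observed latency per backend; pick the fastest."""

    def __init__(self, backends, alpha=0.2):
        self.alpha = alpha
        self.ewma = {b: 0.0 for b in backends}  # 0.0 favors untried backends

    def observe(self, backend, latency_ms):
        # New samples move the average by a fraction `alpha`.
        prev = self.ewma[backend]
        self.ewma[backend] = (1 - self.alpha) * prev + self.alpha * latency_ms

    def pick(self):
        return min(self.ewma, key=self.ewma.get)

lrt = LeastResponseTime(["app-1", "app-2"])
lrt.observe("app-1", 120.0)
lrt.observe("app-2", 30.0)
print(lrt.pick())  # → app-2 (lower recent latency)
```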

IP hash / consistent hash

Hash of client IP determines backend. Same client always hits the same backend (rough sticky sessions).

Pros: Simple sticky sessions without cookies; cache locality benefits.
Cons: Uneven distribution (one IP can become hot); rebalancing on backend changes.
Use when: Need stickiness without cookie-based routing (e.g., for legacy WebSockets).
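A minimal consistent-hash ring with virtual nodes (the vnode count and MD5 hash are illustrative choices, not a standard):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: the same key always maps to the same backend,
    and removing a backend only remaps that backend's keys."""

    def __init__(self, backends, vnodes=100):
        # Each backend gets `vnodes` positions on the ring to even out load.
        self.ring = sorted(
            (self._hash(f"{b}#{i}"), b)
            for b in backends for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def pick(self, client_ip):
        # Walk clockwise to the first vnode at or after the key's hash.
        i = bisect.bisect(self.keys, self._hash(client_ip)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["app-1", "app-2", "app-3"])
backend = ring.pick("203.0.113.7")
print(backend == ring.pick("203.0.113.7"))  # → True: same client, same backend
```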

Power of two choices

Randomly pick 2 backends, route to whichever has fewer connections. Surprisingly close to optimal with minimal coordination.

Pros: Near-optimal performance, easy to implement, scales well.
Cons: Requires connection-count tracking.
Use when: Distributed load balancing where state synchronization is hard.

This is increasingly the default in modern service meshes.
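A sketch of the selection step, assuming a map of active connection counts is available (backend names and counts are illustrative):

```python
import random

def power_of_two(active, rng=random):
    """Sample two distinct backends; route to the one with fewer connections."""
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b

conns = {"app-1": 8, "app-2": 2, "app-3": 5}
# app-1 (most loaded) loses every pairing it appears in, so with these
# static counts it is never picked; app-2 and app-3 absorb the traffic.
print(power_of_two(conns))
```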

Maglev / consistent hashing with bounded loads

Variants of consistent hashing that maintain even load while preserving cache locality. Used in Google's load balancers.

Use when: Cache co-location matters and you can't afford full re-hashing on backend changes.
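As a rough illustration of the bounded-load idea (not Maglev's actual lookup-table construction), here's a hypothetical rendezvous-style pick that skips any backend above `c` times the average load:

```python
import hashlib
import math

def bounded_pick(key, load, c=1.25):
    """Consistent pick that caps any backend at roughly c x average load
    (a simplified 'consistent hashing with bounded loads' sketch)."""
    cap = math.ceil(c * (sum(load.values()) + 1) / len(load))
    # Rank backends by hash of (key, backend): stable for a given key,
    # and removing one backend only remaps that backend's keys.
    ranked = sorted(
        load, key=lambda b: hashlib.md5(f"{key}#{b}".encode()).hexdigest()
    )
    for backend in ranked:
        if load[backend] + 1 <= cap:
            return backend
    return ranked[0]  # everyone at capacity: fall back to the closest

load = {"app-1": 0, "app-2": 0, "app-3": 0}
for i in range(9):
    load[bounded_pick(f"user-{i}", load)] += 1
print(load)  # hash-driven placement, but no backend exceeds the cap
```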

Health checks: the critical configuration

Health checks determine which backends receive traffic. Misconfigured health checks cause cascading failures.

Active health checks

Load balancer periodically sends a request to each backend; if it fails, mark unhealthy.

Configuration knobs:

  • Endpoint: dedicated /health endpoint, NOT the main API (which may have side effects)
  • Interval: 5-30 seconds typical
  • Timeout: 1-5 seconds typical
  • Threshold: how many failures before marking unhealthy (typically 2-3)
  • Recovery threshold: how many successes before re-marking healthy
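The failure and recovery thresholds above amount to a small per-backend state machine; a sketch (parameter names `fall`/`rise` borrow HAProxy's terminology, values are illustrative):

```python
class HealthChecker:
    """Mark unhealthy after `fall` consecutive failures,
    healthy again after `rise` consecutive successes."""

    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.failures = self.successes = 0
        self.healthy = True

    def record(self, ok):
        if ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.rise:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.fall:
                self.healthy = False
        return self.healthy

hc = HealthChecker(fall=2, rise=2)
print(hc.record(False))  # → True:  one failure, below threshold
print(hc.record(False))  # → False: second failure trips it
print(hc.record(True))   # → False: one success, not yet recovered
print(hc.record(True))   # → True:  recovery threshold reached
```

Requiring consecutive successes before recovery is what prevents a backend from flapping in and out of the pool.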

Passive health checks

Monitor real traffic; if too many requests fail to a backend, mark unhealthy.

Often combined with active checks for robust detection.

Common health check mistakes

  • Health endpoint that checks downstream dependencies (cascading failure)
  • Too aggressive thresholds (instances flapping in and out)
  • Too slow detection (unhealthy instances serve errors for minutes)
  • Health checks that don't reflect actual service health (e.g., always returns 200)

TLS termination

Most production load balancers terminate TLS — they decrypt incoming HTTPS, forward plain HTTP to backends.

Pros:

  • Centralized cert management
  • Backends don't need TLS overhead
  • Easier debugging (plain HTTP traffic visible)

Cons:

  • Internal traffic is unencrypted (security concern)
  • Load balancer becomes critical for cert management

End-to-end TLS (encrypted from client all the way to backend) is increasingly common in zero-trust architectures, at the cost of backend CPU and operational complexity.

Sticky sessions

Some applications require subsequent requests from the same user to hit the same backend. Mechanisms:

  • Cookie-based: LB sets a cookie identifying the backend; subsequent requests routed there
  • Header-based: route based on a request header
  • IP-based: route based on source IP (unreliable behind NAT)

The right answer: avoid sticky sessions where possible. They limit horizontal scaling, complicate failover, and create hot backends. Use shared session storage (Redis, distributed cache) instead.

Common production configurations

Single ELB / ALB

Simplest setup. AWS ALB or NLB in front of all your backends. Sufficient for most applications under 10K RPS.

Multi-region with DNS

Route 53 routes traffic to regional load balancers based on user location. Each region has its own LB and backend pool.

Service mesh (Istio, Linkerd)

Each service has a sidecar proxy that handles its own load balancing. No central appliance — load balancing is distributed across the mesh. Modern microservice approach.

Edge + service mesh hybrid

Edge load balancer (ALB) handles ingress; service mesh handles inter-service communication. Most common modern pattern.

Common mistakes

  • No health checks at all: traffic continues to dead backends until manual intervention
  • Health check endpoint that hits the database: every health check is a database query
  • Single load balancer in single AZ: load balancer goes down with the AZ
  • No connection draining: deploys/scaling kill in-flight requests
  • Over-aggressive timeouts: legitimate slow requests get killed
  • No rate limiting: a misbehaving client takes down all backends
  • Ignoring backend capacity in algorithm choice: round-robin across backends with a 2x capacity difference leaves half of the larger instances' capacity unused

Observability

Critical metrics to monitor:

  • Request rate per backend
  • Error rate per backend
  • Latency percentiles (p50, p95, p99) per backend
  • Connection count per backend
  • Health check success rate
  • Connection draining duration

Most modern load balancers (ALB, Envoy, NGINX Plus) emit these metrics natively to Prometheus, CloudWatch, etc.

Load balancers are deceptively simple in concept and surprisingly subtle in practice. The right choice of L4 vs L7, algorithm, health check configuration, and observability stack determines whether your service has 99.9% or 99.99% availability. Treat load balancer configuration as a first-class engineering decision, not a "set and forget" infrastructure detail.

Frequently asked questions

Should I use Layer 4 or Layer 7 load balancing?

L4 (transport layer, TCP/UDP) is faster, simpler, and supports any protocol. L7 (application layer, HTTP) understands the protocol and can route based on URL, headers, cookies. For HTTP-based microservices, L7 is almost always the right choice — the routing flexibility and observability outweigh the slight performance cost. For non-HTTP protocols (gaming servers, raw TCP services, databases), L4 is appropriate. AWS ELB family: NLB is L4, ALB is L7.

What's the best load balancing algorithm?

There's no universal best. Round-robin works for most stateless workloads with similar request costs. Least-connections is better for workloads with varying request durations. IP hash is needed when sticky sessions are required without cookies. Weighted variants help with heterogeneous backend capacity. Power-of-two-choices is increasingly popular for its near-optimal performance with minimal coordination. Test under realistic traffic patterns; theoretically optimal often loses to operationally simple.

Where do you put load balancers in a microservices architecture?

Multiple layers. (1) Edge load balancer (e.g., AWS ALB) handles incoming HTTPS traffic. (2) Service mesh (Istio, Linkerd) handles inter-service load balancing with intelligent routing. (3) Each service may have its own load balancer for upstream dependencies (database read replicas, cache clusters). The trend is toward client-side or service-mesh load balancing for inter-service communication, leaving traditional appliance load balancers for the edge.
