# Step 2: Load Balancing HTTP
**Duration:** ~30 minutes

**Goal:** Run multiple backend instances, configure Envoy to load-balance between them, and understand how health checking works.
## What you will learn
- How to define multiple endpoints in a static cluster
- Load balancing policies and when to use each
- Passive vs active health checking
- How Envoy detects and routes around failed backends
## Concepts
### Load balancing policies
When a cluster has multiple endpoints, Envoy selects one using a load balancing policy (`lb_policy`):

| Policy | Best for |
|---|---|
| `ROUND_ROBIN` | Uniform, short-lived requests |
| `LEAST_REQUEST` | Variable-duration requests (gRPC, long polls) |
| `RANDOM` | When you want to avoid thundering herd on restart |
| `RING_HASH` | Session affinity (same client → same backend) |
| `MAGLEV` | Session affinity with better distribution than ring hash |
For HTTP services with roughly equal request durations, `ROUND_ROBIN` is fine. For gRPC (where requests vary widely in duration), prefer `LEAST_REQUEST`.
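To build intuition, here is a toy sketch (not Envoy's implementation) of the two policies used in this step: round-robin, and least-request in its default power-of-two-choices form (sample two endpoints, pick the one with fewer requests in flight):

```python
import random

class RoundRobin:
    """Cycle through endpoints in order."""
    def __init__(self, endpoints):
        self.endpoints = endpoints
        self.next = 0

    def pick(self):
        ep = self.endpoints[self.next % len(self.endpoints)]
        self.next += 1
        return ep

class LeastRequest:
    """Power-of-two-choices: sample two endpoints, take the less loaded one."""
    def __init__(self, endpoints):
        self.endpoints = endpoints
        self.active = {ep: 0 for ep in endpoints}  # requests in flight per endpoint

    def pick(self):
        a, b = random.sample(self.endpoints, 2)
        return a if self.active[a] <= self.active[b] else b

rr = RoundRobin(["backend1", "backend2", "backend3"])
print([rr.pick() for _ in range(6)])
# → ['backend1', 'backend2', 'backend3', 'backend1', 'backend2', 'backend3']
```

A busy endpoint (high in-flight count) loses every comparison it appears in, so least-request steers new work away from it without any central coordination.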
### Passive vs active health checking
Envoy has two complementary mechanisms for detecting unhealthy backends:
**Passive health checking** (outlier detection — covered in Step 4): No probing. Envoy watches real traffic and marks endpoints unhealthy when it sees errors. Zero overhead, but there's always at least one failed request before detection.

**Active health checking:** Envoy sends periodic probe requests to each endpoint. This detects failures before real traffic hits them, but adds probe overhead. You configure thresholds:

- `unhealthy_threshold` — how many consecutive failures to declare an endpoint unhealthy
- `healthy_threshold` — how many consecutive successes to declare it healthy again
- `interval` — how often to probe
- `timeout` — how long to wait for a health check response before counting it as a failure
Use both together: active health checking for proactive detection, outlier detection for automatic ejection under real traffic.
### The endpoint hierarchy
The `load_assignment` section has three levels of nesting that serve distinct purposes:
```yaml
load_assignment:
  cluster_name: backend_cluster
  endpoints:            # list of LocalityLbEndpoints
  - locality:           # optional: geographic zone (region/zone/subzone)
      region: us-east-1
    lb_endpoints:       # list of endpoints in this locality
    - endpoint:
        address:
          socket_address: { address: backend1, port_value: 8080 }
    - endpoint:
        address:
          socket_address: { address: backend2, port_value: 8080 }
```
- `endpoints` (a list of `LocalityLbEndpoints`) — groups endpoints by locality. Locality is used for zone-aware routing and priority-based failover (Step 4).
- `lb_endpoints` — the individual backends within a locality. Envoy load-balances across these.
- `locality` — optional geographic metadata. When omitted (as in this step), all endpoints are treated as one undifferentiated group.

For simple configs, you will see just one `endpoints` entry with no locality, containing all your `lb_endpoints`.
## Setup
### Backend: `backend.py`
A minimal HTTP server that identifies itself by name:
```python
#!/usr/bin/env python3
import http.server, os, socket

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        name = os.environ.get("SERVER_NAME", socket.gethostname())
        body = f"Hello from {name}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence default request logs

http.server.HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```
### Compose: `docker-compose.yaml`
```yaml
services:
  backend1:
    image: python:3.12-slim
    volumes:
      - ./backend.py:/backend.py
    command: python /backend.py
    environment:
      SERVER_NAME: backend-1
  backend2:
    image: python:3.12-slim
    volumes:
      - ./backend.py:/backend.py
    command: python /backend.py
    environment:
      SERVER_NAME: backend-2
  backend3:
    image: python:3.12-slim
    volumes:
      - ./backend.py:/backend.py
    command: python /backend.py
    environment:
      SERVER_NAME: backend-3
  envoy:
    image: envoyproxy/envoy:v1.31-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "10000:10000"
      - "9901:9901"
    command: envoy -c /etc/envoy/envoy.yaml
    depends_on:
      - backend1
      - backend2
      - backend3
```
### Envoy config: `envoy.yaml`
```yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          route_config:
            name: local_route
            virtual_hosts:
            - name: backends
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: backend_cluster
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend_cluster
    connect_timeout: 5s
    type: STRICT_DNS  # (1)
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: backend1, port_value: 8080 }
        - endpoint:
            address:
              socket_address: { address: backend2, port_value: 8080 }
        - endpoint:
            address:
              socket_address: { address: backend3, port_value: 8080 }
    health_checks:  # (2)
    - timeout: 2s
      interval: 5s
      unhealthy_threshold: 2
      healthy_threshold: 1
      http_health_check:
        path: "/"
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
```
1. `STRICT_DNS` resolves each hostname and uses all returned IP addresses as endpoints. In Docker Compose, each service name resolves to exactly one IP, so this behaves like `LOGICAL_DNS` here — but it's the right type for service names that may resolve to multiple IPs (e.g. a Kubernetes headless service).
2. Active health checks probe each backend every 5 seconds using an HTTP GET to `/`. After 2 consecutive non-2xx responses or connection failures the backend is marked unhealthy (`health_flags: failed_active_hc` in the admin API). After 1 success it recovers. Setting `healthy_threshold: 1` means recovery is immediate on the first successful probe.
## Run it
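A typical run, assuming the three files above sit in one directory and Docker Compose v2 (the `docker compose` subcommand; older installs use `docker-compose`):

```shell
# Start the three backends and Envoy in the background
docker compose up -d

# Sanity check: the admin API should list the cluster's endpoints with no failure flags
curl -s http://localhost:9901/clusters | grep health_flags
```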
## Exercises
### 1. See round-robin in action
You should see the three backends cycling in order:
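For example, driving a few requests through the listener port from `envoy.yaml`:

```shell
for i in $(seq 1 6); do curl -s http://localhost:10000/; done
```

Each backend should answer twice, in a repeating 1 → 2 → 3 pattern (which backend answers first may vary).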
### 2. Simulate a backend failure
In a second terminal, stop one backend:
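For example, stopping `backend2` (the one the next steps inspect):

```shell
docker compose stop backend2
```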
Immediately send requests — you may see a brief error as Envoy detects the failure, then it routes around it:
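For example, with a short timeout so failures surface quickly:

```shell
for i in $(seq 1 10); do curl -s --max-time 2 http://localhost:10000/; done
```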
Watch the health status update (check every few seconds):
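For example, polling the admin API (plain repeated `curl` works if `watch` is not installed):

```shell
watch -n 2 'curl -s http://localhost:9901/clusters | grep health_flags'
```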
Look for `::failed_active_hc` on backend2 after 2 failed health check cycles.
### 3. Recover the backend
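Assuming `backend2` was the one stopped in exercise 2:

```shell
docker compose start backend2
```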
Within 5–10 seconds (one or two health check intervals, since `healthy_threshold` is 1) Envoy marks it healthy again and resumes sending traffic to it.
### 4. Switch to `LEAST_REQUEST`
Change `lb_policy: ROUND_ROBIN` to `lb_policy: LEAST_REQUEST` in `envoy.yaml`. Restart only Envoy (not the backends):
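For example (Docker Compose v2 syntax):

```shell
docker compose restart envoy
```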
The distribution will look similar for this simple case. The difference becomes visible when some requests take much longer than others — `LEAST_REQUEST` avoids piling more work onto an already-busy backend.
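To see that effect without timing real traffic, here is a toy simulation (invented rates, not Envoy code): three backends, one of which completes work far more slowly. Round-robin keeps giving the slow backend an equal share, so its backlog grows without bound; power-of-two-choices least-request stops feeding it as soon as its in-flight count rises:

```python
import random

random.seed(42)

def simulate(pick, ticks=200, arrivals=6):
    """Backend 0 completes 1 request/tick; backends 1 and 2 complete 4/tick."""
    inflight = [0, 0, 0]
    state = {"next": 0}
    for _ in range(ticks):
        for _ in range(arrivals):
            inflight[pick(inflight, state)] += 1
        for b, rate in enumerate((1, 4, 4)):
            inflight[b] = max(0, inflight[b] - rate)
    return inflight[0]  # backlog left on the slow backend

def round_robin(inflight, state):
    b = state["next"] % len(inflight)
    state["next"] += 1
    return b

def least_request(inflight, state):
    a, b = random.sample(range(len(inflight)), 2)  # power-of-two-choices
    return a if inflight[a] <= inflight[b] else b

print("slow-backend backlog, round-robin:  ", simulate(round_robin))    # → 200
print("slow-backend backlog, least-request:", simulate(least_request))
```

Under round-robin the slow backend gains a fixed net backlog every tick; under least-request its rising in-flight count makes it lose the two-way comparisons, so almost all new work goes to the fast backends.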
### 5. Read the cluster stats
```shell
# Total upstream requests for the cluster (per-endpoint counts are in /clusters)
curl -s http://localhost:9901/stats | grep "backend_cluster.*rq_total"

# Connection failures
curl -s http://localhost:9901/stats | grep "backend_cluster.*connect_fail"

# Health check results
curl -s http://localhost:9901/stats | grep "health_check"
```
## What you learned
- Defining multiple static endpoints in a cluster
- Load balancing policies: `ROUND_ROBIN` vs `LEAST_REQUEST`
- Active health checking with thresholds and intervals
- How Envoy routes around failed backends
- Reading per-endpoint stats from the admin API
Next step
In Step 3 you will switch to a real gRPC service and learn what makes gRPC proxying different from HTTP/1.1.