
Step 2: Load Balancing HTTP

Duration: ~30 minutes
Goal: Run multiple backend instances, configure Envoy to load-balance between them, and understand how health checking works.


What you will learn

  • How to define multiple endpoints in a static cluster
  • Load balancing policies and when to use each
  • Passive vs active health checking
  • How Envoy detects and routes around failed backends

Concepts

Load balancing policies

When a cluster has multiple endpoints, Envoy selects one using a load balancing policy (lb_policy):

| Policy | Best for |
| --- | --- |
| ROUND_ROBIN | Uniform, short-lived requests |
| LEAST_REQUEST | Variable-duration requests (gRPC, long polls) |
| RANDOM | When you want to avoid thundering herd on restart |
| RING_HASH | Session affinity (same client → same backend) |
| MAGLEV | Session affinity with better distribution than ring hash |

For HTTP services with roughly equal request durations, ROUND_ROBIN is fine. For gRPC (where requests vary widely in duration), prefer LEAST_REQUEST.
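The difference between the two policies is easy to see in a few lines of Python. This is a minimal sketch, not Envoy's implementation: Envoy's default LEAST_REQUEST uses the same power-of-two-choices idea shown here, sampling two endpoints and picking the less loaded one.

```python
import random

def round_robin(n):
    """Yield endpoint indices 0..n-1 in a repeating cycle."""
    i = 0
    while True:
        yield i
        i = (i + 1) % n

def least_request_p2c(outstanding, rng):
    """Power-of-two-choices: sample two distinct endpoints at random and
    pick the one with fewer outstanding requests (ties go to the first)."""
    a, b = rng.sample(range(len(outstanding)), 2)
    return a if outstanding[a] <= outstanding[b] else b

rr = round_robin(3)
print([next(rr) for _ in range(6)])   # [0, 1, 2, 0, 1, 2]

# With only two endpoints, P2C always compares both, so the busy
# endpoint (index 0, with 10 requests in flight) is never chosen:
rng = random.Random(0)
print(least_request_p2c([10, 0], rng))  # 1
```

Note that P2C is probabilistic: with more than two endpoints it does not always pick the globally least-loaded one, but it avoids the herd behavior of everyone chasing the same "best" backend.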

Passive vs active health checking

Envoy has two complementary mechanisms for detecting unhealthy backends:

Passive health checking (outlier detection — covered in Step 4): No probing. Envoy watches real traffic and marks endpoints unhealthy when it sees errors. Zero overhead, but there's always at least one failed request before detection.

Active health checking: Envoy sends periodic probe requests to each endpoint. Detects failures before real traffic hits them, but adds probe overhead. You configure thresholds:

  • unhealthy_threshold — how many consecutive failures to declare unhealthy
  • healthy_threshold — how many consecutive successes to declare healthy again
  • interval — how often to probe
  • timeout — how long to wait for a health check response before counting it as a failure

Use both together: active health checking for proactive detection, outlier detection for automatic ejection under real traffic.
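The consecutive-threshold behavior can be sketched as a tiny state machine. This is an illustration of the counting logic only, not Envoy's actual code, and it assumes the endpoint starts healthy:

```python
class ActiveHealthCheck:
    """Consecutive-threshold health state for one endpoint:
    `unhealthy_threshold` consecutive failed probes mark it unhealthy,
    `healthy_threshold` consecutive successes mark it healthy again."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=1):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True  # this sketch assumes the endpoint starts healthy
        self.streak = 0      # consecutive probes contradicting the current state

    def record_probe(self, success):
        if success == self.healthy:
            self.streak = 0  # probe agrees with current state: reset the streak
            return self.healthy
        self.streak += 1
        needed = (self.healthy_threshold if not self.healthy
                  else self.unhealthy_threshold)
        if self.streak >= needed:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy

hc = ActiveHealthCheck(unhealthy_threshold=2, healthy_threshold=1)
print([hc.record_probe(ok) for ok in [True, False, False, False, True]])
# [True, True, False, False, True]
```

One failed probe is not enough to eject the endpoint (that is what `unhealthy_threshold: 2` buys you: tolerance for a single flaky probe), while a single success brings it back.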

The endpoint hierarchy

The load_assignment section has three levels of nesting that serve distinct purposes:

load_assignment:
  cluster_name: backend_cluster
  endpoints:                    # list of LocalityLbEndpoints
    - locality:                 # optional: geographic zone (region/zone/subzone)
        region: us-east-1
      lb_endpoints:             # list of endpoints in this locality
        - endpoint:
            address:
              socket_address: { address: backend1, port_value: 8080 }
        - endpoint:
            address:
              socket_address: { address: backend2, port_value: 8080 }

  • endpoints (a list of LocalityLbEndpoints) — groups endpoints by locality. Locality is used for zone-aware routing and priority-based failover (Step 4).
  • lb_endpoints — the individual backends within a locality. Envoy load-balances across these.
  • locality — optional geographic metadata. When omitted (as in this step), all endpoints are treated as one undifferentiated group.

For simple configs, you will see just one endpoints entry with no locality, containing all your lb_endpoints.
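One way to internalize the nesting is to walk it in code. The sketch below flattens a dict shaped like the load_assignment above into a plain endpoint list; it mirrors the YAML field names but is ordinary Python, not Envoy's API:

```python
# A dict mirroring the load_assignment YAML above (illustrative only).
load_assignment = {
    "cluster_name": "backend_cluster",
    "endpoints": [  # list of LocalityLbEndpoints
        {
            "locality": {"region": "us-east-1"},
            "lb_endpoints": [  # individual backends in this locality
                {"endpoint": {"address": {"socket_address":
                    {"address": "backend1", "port_value": 8080}}}},
                {"endpoint": {"address": {"socket_address":
                    {"address": "backend2", "port_value": 8080}}}},
            ],
        },
    ],
}

def flatten(assignment):
    """Collect (region, host, port) for every endpoint in every locality."""
    out = []
    for locality_group in assignment["endpoints"]:
        region = locality_group.get("locality", {}).get("region", "<none>")
        for ep in locality_group["lb_endpoints"]:
            sa = ep["endpoint"]["address"]["socket_address"]
            out.append((region, sa["address"], sa["port_value"]))
    return out

print(flatten(load_assignment))
# [('us-east-1', 'backend1', 8080), ('us-east-1', 'backend2', 8080)]
```

The outer loop is the locality level, the inner loop is the per-backend level; a config with no locality simply has one outer group.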


Setup

mkdir -p envoy-tutorial/step2 && cd envoy-tutorial/step2

Backend: backend.py

A minimal HTTP server that identifies itself by name:

#!/usr/bin/env python3
import http.server, os, socket

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        name = os.environ.get("SERVER_NAME", socket.gethostname())
        body = f"Hello from {name}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", len(body))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # silence default request logs

http.server.HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

Compose: docker-compose.yaml

services:
  backend1:
    image: python:3.12-slim
    volumes:
      - ./backend.py:/backend.py
    command: python /backend.py
    environment:
      SERVER_NAME: backend-1

  backend2:
    image: python:3.12-slim
    volumes:
      - ./backend.py:/backend.py
    command: python /backend.py
    environment:
      SERVER_NAME: backend-2

  backend3:
    image: python:3.12-slim
    volumes:
      - ./backend.py:/backend.py
    command: python /backend.py
    environment:
      SERVER_NAME: backend-3

  envoy:
    image: envoyproxy/envoy:v1.31-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
      - "10000:10000"
      - "9901:9901"
    command: envoy -c /etc/envoy/envoy.yaml
    depends_on:
      - backend1
      - backend2
      - backend3

Envoy config: envoy.yaml

static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 10000 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backends
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: backend_cluster
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: backend_cluster
      connect_timeout: 5s
      type: STRICT_DNS  # (1)
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: backend_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: backend1, port_value: 8080 }
              - endpoint:
                  address:
                    socket_address: { address: backend2, port_value: 8080 }
              - endpoint:
                  address:
                    socket_address: { address: backend3, port_value: 8080 }
      health_checks:  # (2)
        - timeout: 2s
          interval: 5s
          unhealthy_threshold: 2
          healthy_threshold: 1
          http_health_check:
            path: "/"

admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

  1. STRICT_DNS resolves each hostname and uses all returned IP addresses as endpoints. In Docker Compose, each service name resolves to exactly one IP, so this behaves like LOGICAL_DNS here, but it is the right type for service names that may resolve to multiple IPs (e.g. a Kubernetes headless service).
  2. Active health checks probe each backend every 5 seconds with an HTTP GET to /. After 2 consecutive failed probes (a non-200 response, a timeout, or a connection failure) the backend is marked unhealthy (health_flags: failed_active_hc in the admin API). Setting healthy_threshold: 1 means recovery is immediate on the first successful probe.

Run it

docker compose up

Exercises

1. See round-robin in action

for i in $(seq 1 9); do curl -s http://localhost:10000/; done

You should see the three backends cycling in order:

Hello from backend-1
Hello from backend-2
Hello from backend-3
Hello from backend-1
...

2. Simulate a backend failure

In a second terminal, stop one backend:

docker compose stop backend2

Immediately send requests — you may see a brief error as Envoy detects the failure, then it routes around it:

for i in $(seq 1 6); do curl -s http://localhost:10000/; done

Watch the health status update (check every few seconds):

curl -s http://localhost:9901/clusters | grep -E "(backend|health_flags|cx_connect)"

Look for ::failed_active_hc on backend2 after 2 failed health check cycles.

3. Recover the backend

docker compose start backend2

Within about 5–10 seconds (one to two health check intervals) Envoy marks it healthy again and resumes sending traffic to it.

4. Switch to LEAST_REQUEST

Change lb_policy: ROUND_ROBIN to lb_policy: LEAST_REQUEST in envoy.yaml. Restart only Envoy (not the backends):

docker compose restart envoy

The distribution will look similar for this simple case. The difference becomes visible when some requests take much longer than others — LEAST_REQUEST avoids piling more work onto an already-busy backend.

5. Read the cluster stats

# Total requests for the cluster (per-endpoint counts appear in /clusters output)
curl -s http://localhost:9901/stats | grep "backend_cluster.*rq_total"

# Connection failures
curl -s http://localhost:9901/stats | grep "backend_cluster.*connect_fail"

# Health check results
curl -s http://localhost:9901/stats | grep "health_check"

What you learned

  • Defining multiple static endpoints in a cluster
  • Load balancing policies: ROUND_ROBIN vs LEAST_REQUEST
  • Active health checking with thresholds and intervals
  • How Envoy routes around failed backends
  • Reading per-endpoint stats from the admin API

Next step

In Step 3 you will switch to a real gRPC service and learn what makes gRPC proxying different from HTTP/1.1.