For production environments, Rancher should be installed in a high-availability (HA) configuration on a Kubernetes cluster. This ensures continuous availability of the Rancher management server and provides resilience against node failures.

Architecture Overview

A high-availability Rancher deployment consists of:
  • Multiple Rancher replicas - Default 3 replicas running across different nodes
  • Kubernetes scheduling - Automatic pod distribution and recovery
  • etcd integration - State stored in the Kubernetes cluster’s etcd database
  • Load balancer - External load balancer distributing traffic to Rancher pods
  • Ingress controller - Routes HTTP/HTTPS traffic to Rancher services
                    ┌─────────────────┐
                    │  Load Balancer  │
                    └────────┬────────┘
                             │
                   ┌─────────▼──────────┐
                   │ Ingress Controller │
                   └─────────┬──────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
   ┌────▼─────┐        ┌─────▼────┐        ┌──────▼───┐
   │ Rancher  │        │ Rancher  │        │ Rancher  │
   │  Pod 1   │        │  Pod 2   │        │  Pod 3   │
   └────┬─────┘        └─────┬────┘        └──────┬───┘
        │                    │                    │
        └────────────────────┼────────────────────┘
                             │
                    ┌────────▼────────┐
                    │  etcd cluster   │
                    └─────────────────┘

Prerequisites

Supported Kubernetes Clusters

For Rancher Support SLA coverage, use one of these Kubernetes distributions:
  • RKE1 - Rancher Kubernetes Engine 1
  • RKE2 - Rancher Kubernetes Engine 2
  • K3s - Lightweight Kubernetes
  • AKS - Azure Kubernetes Service
  • EKS - Amazon Elastic Kubernetes Service
  • GKE - Google Kubernetes Engine

Infrastructure Requirements

Hardware Requirements

Minimum for small deployments (up to 100 clusters):
  • 4 vCPUs
  • 8 GB RAM
  • 50 GB disk space
Recommended for production:
  • 8 vCPUs per node
  • 16 GB RAM per node
  • 100 GB disk space per node
  • 3 or more worker nodes

Network Requirements

  • Load balancer - Layer 4 or Layer 7 load balancer
  • DNS - Fully qualified domain name pointing to the load balancer
  • Ports:
    • 80/TCP (HTTP)
    • 443/TCP (HTTPS)
    • 6443/TCP (Kubernetes API, if using RKE2/K3s)

High Availability Installation

Prepare the Kubernetes Cluster

Ensure you have a Kubernetes cluster with at least 3 worker nodes for optimal availability:
kubectl get nodes

Verify that all nodes are in the Ready state:
NAME       STATUS   ROLES           AGE   VERSION
node-1     Ready    control-plane   10d   v1.28.0
node-2     Ready    worker          10d   v1.28.0
node-3     Ready    worker          10d   v1.28.0
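
The Ready check can also be scripted. The following is a minimal sketch, assuming the default column layout of `kubectl get nodes`; `count_ready_nodes` is a hypothetical helper name, and the sample text is the listing above rather than live cluster output.

```shell
#!/usr/bin/env bash
# count_ready_nodes (hypothetical helper): reads `kubectl get nodes` output on
# stdin and prints how many nodes have a STATUS of exactly "Ready".
# Cordoned nodes report "Ready,SchedulingDisabled" and are not counted.
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample input copied from the listing above.
sample='NAME       STATUS   ROLES           AGE   VERSION
node-1     Ready    control-plane   10d   v1.28.0
node-2     Ready    worker          10d   v1.28.0
node-3     Ready    worker          10d   v1.28.0'

ready=$(printf '%s\n' "$sample" | count_ready_nodes)
echo "Ready nodes: $ready"
if [ "$ready" -lt 3 ]; then
  echo "Fewer than 3 Ready nodes; Rancher replicas may have to share a node" >&2
fi
```

Against a real cluster, the same helper would be fed with `kubectl get nodes | count_ready_nodes`.
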

Configure Load Balancer

Set up a load balancer that forwards traffic to your Kubernetes cluster:

Layer 4 Load Balancer (TCP):
  • Forward port 443 to the Kubernetes ingress controller (typically exposed as a NodePort or LoadBalancer service)
  • Forward port 80 to the Kubernetes ingress controller (optional, for HTTP redirect)

Layer 7 Load Balancer (HTTP/HTTPS):
  • Configure SSL termination at the load balancer (optional)
  • Forward traffic to the Kubernetes ingress controller

Install Rancher with HA Configuration

Install Rancher with 3 replicas (default) using Helm:
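    helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
    helm repo update
    kubectl create namespace cattle-system
    
    # The chart's default TLS option (Rancher-generated certificates) requires
    # cert-manager to be installed in the cluster first
    helm install rancher rancher-latest/rancher \
      --namespace cattle-system \
      --set hostname=rancher.example.com \
      --set replicas=3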
    
Configure Anti-Affinity Rules

Ensure Rancher pods are distributed across different nodes using anti-affinity rules:
    # "helm upgrade --install" also works when the rancher release already exists
    helm upgrade --install rancher rancher-latest/rancher \
      --namespace cattle-system \
      --set hostname=rancher.example.com \
      --set replicas=3 \
      --set antiAffinity=required
    
Anti-affinity options:
  • preferred (default) - Prefers different nodes but allows same-node scheduling if needed
  • required - Enforces different nodes; pods may stay Pending if there are not enough nodes

Verify HA Deployment

Check that Rancher pods are distributed across nodes:
    kubectl -n cattle-system get pods -l app=rancher -o wide
    
Expected output (abbreviated), showing pods on different nodes:
    NAME                       READY   STATUS    NODE
    rancher-7d56c8c9f-4xqpz    1/1     Running   node-1
    rancher-7d56c8c9f-8hjkl    1/1     Running   node-2
    rancher-7d56c8c9f-mwxyz    1/1     Running   node-3
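
The distribution check can be automated as well. This is a sketch under stated assumptions: `max_pods_per_node` is a hypothetical helper, and it expects one "POD NODE" pair per line, which is not the default `-o wide` layout.

```shell
#!/usr/bin/env bash
# max_pods_per_node (hypothetical helper): reads "POD NODE" pairs on stdin and
# prints the largest number of pods found on any single node.
max_pods_per_node() {
  awk '{ count[$2]++ }
       END { max = 0; for (n in count) if (count[n] > max) max = count[n]; print max }'
}

# Sample pairs taken from the expected output above.
sample='rancher-7d56c8c9f-4xqpz node-1
rancher-7d56c8c9f-8hjkl node-2
rancher-7d56c8c9f-mwxyz node-3'

worst=$(printf '%s\n' "$sample" | max_pods_per_node)
if [ "$worst" -le 1 ]; then
  echo "OK: every Rancher pod is on its own node"
else
  echo "WARNING: $worst Rancher pods share a node"
fi
```

On a live cluster, the pairs can be produced with `kubectl -n cattle-system get pods -l app=rancher --no-headers -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName` and piped into the helper.
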
    
Test Failover

Test the HA configuration by simulating a node failure:
    # Cordon a node to prevent new pods from being scheduled on it
    kubectl cordon node-1
    
    # Delete a Rancher pod on that node
    kubectl -n cattle-system delete pod <pod-name>
    
    # Verify a new pod is scheduled on a different node
    kubectl -n cattle-system get pods -l app=rancher -o wide
    
    # Make the node schedulable again once the test is complete
    kubectl uncordon node-1
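
To fill in `<pod-name>` without copying it by hand, the pod running on the cordoned node can be looked up from a pod-to-node listing. The sketch below assumes "POD NODE" pairs on stdin; `first_pod_on_node` is a hypothetical helper, and the sample pairs reuse the pod names from the verification step.

```shell
#!/usr/bin/env bash
# first_pod_on_node NODE (hypothetical helper): reads "POD NODE" pairs on stdin
# and prints the first pod scheduled on NODE, or nothing if there is none.
first_pod_on_node() {
  awk -v node="$1" '$2 == node { print $1; exit }'
}

# Sample pairs reusing the pod names shown in the verification step.
sample='rancher-7d56c8c9f-4xqpz node-1
rancher-7d56c8c9f-8hjkl node-2
rancher-7d56c8c9f-mwxyz node-3'

pod=$(printf '%s\n' "$sample" | first_pod_on_node node-1)
echo "Pod to delete: $pod"
```

With real pairs from `kubectl ... -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers`, the result can be passed to `kubectl -n cattle-system delete pod "$pod"`, and `kubectl -n cattle-system rollout status deployment/rancher` then waits for the replacement. Note that with antiAffinity=required and only three nodes, the replacement pod is expected to stay Pending until the node is uncordoned.
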
    

    Advanced HA Configuration

    Multi-Node etcd Configuration

    Rancher integrates with the Kubernetes cluster’s etcd database. For maximum availability:
    • RKE1/RKE2/K3s: Use at least 3 control-plane nodes with embedded etcd
    • Managed Kubernetes (AKS/EKS/GKE): etcd is managed by the cloud provider
    • External etcd: Configure a separate etcd cluster for enhanced isolation

    Resource Allocation

    Configure resource requests and limits to ensure consistent performance:
    helm install rancher rancher-latest/rancher \
      --namespace cattle-system \
      --set hostname=rancher.example.com \
      --set replicas=3 \
      --set resources.requests.cpu=2000m \
      --set resources.requests.memory=4Gi \
      --set resources.limits.cpu=4000m \
      --set resources.limits.memory=8Gi
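
As the number of `--set` flags grows, a values file is often easier to review and keep in version control. The sketch below writes a file that mirrors the flags above; the file name `rancher-values.yaml` is an arbitrary choice, and the script only prints the Helm command rather than running it.

```shell
#!/usr/bin/env bash
# Write a Helm values file equivalent to the --set flags shown above.
values_file="rancher-values.yaml"   # hypothetical file name

cat > "$values_file" <<'EOF'
hostname: rancher.example.com
replicas: 3
resources:
  requests:
    cpu: 2000m
    memory: 4Gi
  limits:
    cpu: 4000m
    memory: 8Gi
EOF

# Print (rather than run) the install command that would use the file.
echo "Wrote $values_file; install with:"
echo "  helm install rancher rancher-latest/rancher --namespace cattle-system -f $values_file"
```
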
    

    Priority Class

    Rancher pods are assigned a priorityClassName, which makes them less likely to be preempted or evicted under resource pressure:
    --set priorityClassName=rancher-critical
    
    This value is set by default and gives Rancher pods priority over ordinary workload pods.

    Dynamic Replica Scaling

    Configure dynamic replica scaling based on available nodes:
    --set replicas=-3
    
    A negative value scales the replica count dynamically between 0 and the absolute value of the setting, based on the number of available nodes.
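
One way to read this rule is that the replica count is capped by both the absolute value and the node count. The sketch below illustrates that reading only; it is an assumption about the behavior, not the chart's actual template logic, and `effective_replicas` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# effective_replicas REPLICAS NODES (hypothetical helper).
# ASSUMPTION: a negative REPLICAS yields min(|REPLICAS|, NODES); a non-negative
# REPLICAS is used as-is. This mirrors the description above, not chart source.
effective_replicas() {
  local replicas=$1 nodes=$2
  if [ "$replicas" -ge 0 ]; then
    echo "$replicas"
    return
  fi
  local cap=$(( 0 - replicas ))
  if [ "$nodes" -lt "$cap" ]; then
    echo "$nodes"
  else
    echo "$cap"
  fi
}

effective_replicas -3 1   # prints 1
effective_replicas -3 5   # prints 3
effective_replicas 3 1    # prints 3
```
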

    Load Balancing Strategies

    Layer 4 Load Balancing

    Advantages:
    • Simple configuration
    • Low latency
    • SSL termination inside the cluster (at the Rancher ingress) rather than at the load balancer
    Configuration:
    Backend Pool: Kubernetes worker nodes
    Protocol: TCP
    Ports: 443, 80
    Health Check: TCP 443
    

    Layer 7 Load Balancing

    Advantages:
    • SSL termination at load balancer
    • Advanced routing capabilities
    • Web Application Firewall (WAF) integration
    Configuration:
    Backend Pool: Kubernetes ingress controller
    Protocol: HTTPS
    Ports: 443
    Health Check: HTTPS GET /healthz
    SSL Certificate: Managed at load balancer
    
    When using SSL termination at the load balancer:
    helm install rancher rancher-latest/rancher \
      --namespace cattle-system \
      --set hostname=rancher.example.com \
      --set tls=external
    

    Monitoring and Maintenance

    Health Checks

    Rancher includes built-in health probes:
    Liveness Probe:
    livenessProbe:
      timeoutSeconds: 5
      periodSeconds: 30
      failureThreshold: 5
    
    Readiness Probe:
    readinessProbe:
      timeoutSeconds: 5
      periodSeconds: 30
      failureThreshold: 5
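
These settings determine how long an unhealthy pod can go unnoticed. As a rough worked example, assuming probes fire every periodSeconds and action is taken after failureThreshold consecutive failures, the window is about their product (initial delay and the per-probe timeout are ignored here):

```shell
#!/usr/bin/env bash
# Values copied from the probe definitions above.
period_seconds=30
failure_threshold=5

# Rough upper estimate of the time before Kubernetes acts on a failing probe:
# failure_threshold consecutive failures, one every period_seconds.
window=$(( period_seconds * failure_threshold ))
echo "Approximate time to act on a failing probe: ${window}s"
```

With the values shown, that is about 150 seconds, which is worth keeping in mind when choosing load balancer health check intervals.
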
    

    Monitoring Endpoints

    Monitor Rancher availability:
    # Check pod status
    kubectl -n cattle-system get pods -l app=rancher
    
    # Check service endpoints
    kubectl -n cattle-system get endpoints rancher
    
    # Check ingress status
    kubectl -n cattle-system get ingress
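
For alerting, it can help to reduce the endpoint check to a single number. The sketch below counts whitespace-separated addresses on stdin; `count_addresses` is a hypothetical helper and the IP addresses in the sample are made up.

```shell
#!/usr/bin/env bash
# count_addresses (hypothetical helper): counts whitespace-separated fields on
# stdin, for example the IPs printed by a kubectl jsonpath query.
count_addresses() {
  awk '{ n += NF } END { print n + 0 }'
}

# Made-up pod IPs standing in for real query output.
sample='10.42.0.12 10.42.1.7 10.42.2.9'

ready=$(printf '%s\n' "$sample" | count_addresses)
echo "Ready Rancher endpoints: $ready"
```

On a live cluster, the input could come from `kubectl -n cattle-system get endpoints rancher -o jsonpath='{.subsets[*].addresses[*].ip}'`, which lists only the addresses of ready pods.
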
    

    Backup and Disaster Recovery

    Regularly back up the Rancher cluster to prevent data loss.
    Backup strategy:
    • etcd snapshots (automated in RKE1/RKE2/K3s)
    • Kubernetes resource manifests
    • Rancher database state

    Troubleshooting HA Deployments

    Pods Not Distributing Across Nodes

    Check anti-affinity rules:
    kubectl -n cattle-system get deployment rancher -o yaml | grep -A 10 affinity
    

    Load Balancer Health Check Failures

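    Verify that the ingress controller is running (the controller and its namespace vary by distribution; ingress-nginx is shown as an example):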
    kubectl get pods -n ingress-nginx
    
    Check service endpoints:
    kubectl -n cattle-system describe service rancher
    

    Split-Brain Scenarios

    If you suspect a network partition:
    1. Check etcd cluster health
    2. Verify Kubernetes cluster connectivity
    3. Review load balancer logs
    4. Check for DNS resolution issues

    Best Practices

    • Use at least 3 replicas for production deployments
    • Enable anti-affinity rules to distribute pods across nodes
    • Configure resource requests and limits for predictable performance
    • Implement monitoring and alerting for Rancher pod health
    • Regularly back up etcd and Rancher state
    • Use managed Kubernetes (AKS/EKS/GKE) for simplified operations
    • Test failover scenarios regularly
    • Keep Rancher and Kubernetes versions up to date

    Next Steps