Rancher provides built-in support for backing up and restoring etcd data in RKE (Rancher Kubernetes Engine) clusters. Regular etcd backups ensure you can recover from data loss or cluster failures.

Overview

The backup and restore operations:
  • Create snapshots of the etcd database
  • Store snapshots locally or in S3-compatible storage
  • Restore clusters from previous snapshots
  • Support both on-demand and scheduled backups
Backup and restore operations are only available for RKE and RKE2 clusters. Imported clusters and hosted Kubernetes services (EKS, GKE, AKS) must use their native backup solutions.

Cluster Actions

Rancher exposes two etcd backup actions (defined in pkg/apis/management.cattle.io/v3/cluster_types.go:33-34):
  • ClusterActionBackupEtcd - Create an on-demand etcd backup
  • ClusterActionRestoreFromEtcdBackup - Restore from an existing backup
These actions are available through:
  • The Rancher UI cluster management page
  • The Rancher API at /v3/clusters/{clusterId}?action=backupEtcd
  • The Rancher API at /v3/clusters/{clusterId}?action=restoreFromEtcdBackup

Creating Backups

On-Demand Backup

Create an immediate etcd snapshot:
Via Rancher UI

  • Navigate to Cluster Management
  • Select your RKE cluster
  • Click the menu
  • Select Take Snapshot
  • Optionally provide a snapshot name
  • Click Save

    Via Rancher API

    Trigger a backup using the API:
    curl -X POST \
      'https://{RANCHER_URL}/v3/clusters/{CLUSTER_ID}?action=backupEtcd' \
      -H 'Authorization: Bearer {TOKEN}' \
      -H 'Content-Type: application/json'
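
For scripted use, the same call can be sketched in Python with only the standard library. This is a minimal sketch, not Rancher's own client: the function name `build_backup_request`, the host `rancher.example.com`, the cluster ID, and the token are all hypothetical placeholders. It only builds the request so the URL and headers can be inspected before anything is sent.

```python
import urllib.request


def build_backup_request(rancher_url: str, cluster_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the backupEtcd action."""
    url = f"{rancher_url.rstrip('/')}/v3/clusters/{cluster_id}?action=backupEtcd"
    return urllib.request.Request(
        url,
        method="POST",  # mirrors `curl -X POST` with no request body
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
req = build_backup_request("https://rancher.example.com", "c-xxxxx", "token-abc:secret")
print(req.get_method(), req.full_url)
# To actually trigger the backup: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to log or review the target cluster before a backup is triggered.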
    
    Via kubectl

    Create an EtcdBackup resource:
    apiVersion: management.cattle.io/v3
    kind: EtcdBackup
    metadata:
      name: manual-backup-{DATE}
      namespace: {CLUSTER_NAME}
    spec:
      clusterName: {CLUSTER_NAME}
      filename: {BACKUP_NAME}
    
    Apply with:
    kubectl apply -f backup.yaml
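
If the manifest is generated rather than written by hand, the placeholders can be filled in programmatically. The sketch below is illustrative only: the helper `etcd_backup_manifest` is hypothetical, and reusing the resource name as the `filename` value is an assumption, not something the manifest above requires. It emits JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json
from datetime import date


def etcd_backup_manifest(cluster_name: str, day: date) -> dict:
    """Fill in the EtcdBackup manifest fields shown above for a given day."""
    name = f"manual-backup-{day.isoformat()}"
    return {
        "apiVersion": "management.cattle.io/v3",
        "kind": "EtcdBackup",
        "metadata": {"name": name, "namespace": cluster_name},
        # Assumption: the snapshot file is named after the resource.
        "spec": {"clusterName": cluster_name, "filename": name},
    }


manifest = etcd_backup_manifest("c-xxxxx", date(2024, 1, 15))
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file such as backup.json and apply it with kubectl in the same way as the YAML version.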
    

    Scheduled Backups

    Configure automatic backup schedules:
    1. Edit cluster configuration
    2. Navigate to Advanced Options > etcd
    3. Configure backup settings:
      • Backup Interval: Hours between automatic backups
      • Retention: Number of backups to retain
      • S3 Backup Target: Optional remote storage
    4. Save the configuration
    Rancher automatically creates backups according to the schedule.

    Backup Storage

    Local Storage

    By default, backups are stored on etcd nodes at:
    /opt/rke/etcd-snapshots/
    
    Local backups:
    • Are stored on each etcd node
    • Persist across node reboots
    • Are not protected against node failures
    • Should be supplemented with remote backups

    S3-Compatible Storage

    Configure S3 backup storage for redundancy:
    rancherKubernetesEngineConfig:
      services:
        etcd:
          backup:
            enabled: true
            intervalHours: 12
            retention: 6
            s3BackupConfig:
              bucketName: {BUCKET}
              region: {REGION}
              endpoint: {S3_ENDPOINT}
              accessKey: {ACCESS_KEY}
              secretKey: {SECRET_KEY}
    
    S3 backups:
    • Provide off-cluster redundancy
    • Support any S3-compatible storage (AWS S3, MinIO, etc.)
    • Require credentials stored in cluster secrets
    • Enable backup restoration to new clusters

    Backup Resource

    The EtcdBackup type is defined in pkg/apis/management.cattle.io/v3/rke_types.go:13:
    type EtcdBackup struct {
        metav1.TypeMeta
        metav1.ObjectMeta
        Spec   rketypes.EtcdBackupSpec     // Backup configuration
        Status rketypes.EtcdBackupStatus   // Backup state
    }
    
    Backup status includes:
    • Backup filename and location
    • Creation timestamp
    • Completion state
    • Error messages (if failed)

    Listing Backups

    View available backups:
    kubectl get etcdbackups -n {CLUSTER_NAME}
    
    Example output:
    NAME                        CLUSTERNAME   CREATED
    manual-backup-2024-01-15    c-xxxxx       2024-01-15T10:00:00Z
    auto-backup-2024-01-15      c-xxxxx       2024-01-15T14:00:00Z
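
When choosing a backup to restore, it can help to sort the list by creation time. The following sketch assumes the list was fetched with `kubectl get etcdbackups -n {CLUSTER_NAME} -o json` and relies only on the standard `metadata.name` and `metadata.creationTimestamp` fields. The function name `newest_first` and the sample data are hypothetical.

```python
import json


def newest_first(etcdbackups_json: str) -> list[tuple[str, str]]:
    """Return (name, creationTimestamp) pairs, newest first."""
    items = json.loads(etcdbackups_json)["items"]
    pairs = [(i["metadata"]["name"], i["metadata"]["creationTimestamp"]) for i in items]
    # UTC timestamps in the same RFC 3339 format sort correctly as plain strings.
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Sample data shaped like `kubectl ... -o json` output, for illustration only.
sample = json.dumps({"items": [
    {"metadata": {"name": "manual-backup-2024-01-15",
                  "creationTimestamp": "2024-01-15T10:00:00Z"}},
    {"metadata": {"name": "auto-backup-2024-01-15",
                  "creationTimestamp": "2024-01-15T14:00:00Z"}},
]})

for name, created in newest_first(sample):
    print(created, name)
```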
    

    Restoring from Backup

    Restoring from backup is a destructive operation that replaces all etcd data. All cluster state changes since the backup was created will be lost.

    Restore Prerequisites

    • A valid etcd backup
    • Cluster-admin access to Rancher
    • Understanding that current cluster state will be replaced
    • Confirmation from all stakeholders

    Restore Procedure

    Via Rancher UI

    • Navigate to Cluster Management
    • Select the cluster to restore
    • Click the menu
    • Select Restore from Snapshot
    • Choose the backup to restore from
    • Optionally provide RKE configuration to apply during restore
    • Confirm the operation
    • Click Restore

    Via Rancher API

    The restore input type is defined in pkg/apis/management.cattle.io/v3/cluster_types.go:367:
    type RestoreFromEtcdBackupInput struct {
        EtcdBackupName   string  // Reference to EtcdBackup resource
        RestoreRkeConfig string  // Optional RKE config to apply
    }
    
    Trigger a restore:
    curl -X POST \
      'https://{RANCHER_URL}/v3/clusters/{CLUSTER_ID}?action=restoreFromEtcdBackup' \
      -H 'Authorization: Bearer {TOKEN}' \
      -H 'Content-Type: application/json' \
      -d '{
        "etcdBackupName": "{CLUSTER_NAME}:{BACKUP_NAME}",
        "restoreRkeConfig": ""
      }'
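
The request body is easy to get wrong by hand because the backup reference combines the cluster name and the backup name with a colon. A minimal sketch of building it in Python follows. The helper `restore_payload` and the sample values are hypothetical; the two JSON keys are the ones used in the curl example above.

```python
import json


def restore_payload(cluster_name: str, backup_name: str, rke_config: str = "") -> str:
    """Serialize the restoreFromEtcdBackup request body."""
    return json.dumps({
        # Backup reference in the "{CLUSTER_NAME}:{BACKUP_NAME}" form shown above.
        "etcdBackupName": f"{cluster_name}:{backup_name}",
        # Empty string restores without configuration changes.
        "restoreRkeConfig": rke_config,
    })


body = restore_payload("c-xxxxx", "manual-backup-2024-01-15")
print(body)
```

The resulting string can be passed to curl with `-d`, or sent as the body of any HTTP client's POST request.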
    
    Restore Process

    • Validation: Rancher validates the backup exists and is accessible
    • Cluster Pause: Cluster operations are paused during restore
    • Etcd Stop: etcd processes are stopped on all nodes
    • Data Restore: Backup data is restored to etcd nodes
    • Etcd Restart: etcd is restarted with restored data
    • Cluster Sync: Kubernetes API server reconnects to etcd
    • Validation: Rancher verifies cluster health

    Post-Restore Verification

    • Wait for the cluster to return to Active state
    • Verify critical workloads are running:
        kubectl get pods --all-namespaces
    • Check cluster events for errors:
        kubectl get events --all-namespaces --sort-by='.lastTimestamp'
    • Validate application data integrity
    • Recreate any resources created after the backup
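
On a large cluster, scanning the pod list by eye is error-prone. The sketch below filters the output of `kubectl get pods --all-namespaces -o json` down to pods whose phase is neither Running nor Succeeded. The function name `unhealthy_pods` and the sample data are hypothetical, and the check is deliberately coarse: a pod in the Running phase can still have containers that are not ready.

```python
import json


def unhealthy_pods(pods_json: str) -> list[str]:
    """Return "namespace/name (phase)" for pods not in Running or Succeeded."""
    bad = []
    for pod in json.loads(pods_json)["items"]:
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = pod["metadata"]
            bad.append(f'{meta["namespace"]}/{meta["name"]} ({phase})')
    return bad


# Sample data shaped like `kubectl get pods -A -o json` output, for illustration only.
sample_pods = json.dumps({"items": [
    {"metadata": {"namespace": "kube-system", "name": "coredns-1"},
     "status": {"phase": "Running"}},
    {"metadata": {"namespace": "default", "name": "web-1"},
     "status": {"phase": "Pending"}},
]})

for line in unhealthy_pods(sample_pods):
    print(line)
```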

    Restore Configuration

    The restoreRkeConfig parameter allows applying RKE configuration changes during restore:
    • Update Kubernetes version
    • Modify cluster networking
    • Change service configurations
    • Update node pool settings
    Leave the parameter empty to restore without configuration changes.

    Backup Best Practices

    Frequency

    • Production clusters: Every 6-12 hours
    • Development clusters: Every 24 hours
    • Critical clusters: Every 1-6 hours + before major changes

    Retention

    • Keep at least 7 days of backups
    • Retain backups taken before major upgrades indefinitely
    • Balance retention with storage costs
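
Because retention is configured as a number of backups rather than a number of days, the retention count that covers a target window depends on the backup interval. A small sketch of that arithmetic follows. It assumes scheduled backups are evenly spaced and that only the most recent ones are kept; the function name `retention_for_days` is hypothetical.

```python
import math


def retention_for_days(days: int, interval_hours: int) -> int:
    """Number of scheduled backups needed to cover `days` at the given interval."""
    return math.ceil(days * 24 / interval_hours)


for interval in (6, 12, 24):
    print(f"every {interval}h -> keep {retention_for_days(7, interval)} backups for 7 days")
```

Under these assumptions, a 12-hour interval needs a retention of 14 to span 7 days, while a retention of 6 at the same interval spans only 72 hours.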

    Storage

    • Always configure S3 backup storage for production
    • Test backup restoration regularly
    • Verify backups are being created successfully
    • Monitor backup storage capacity

    Testing

    • Test restore procedures in non-production environments
    • Validate backup integrity periodically
    • Document restore procedures for your team
    • Practice restore operations before emergencies

    Backup Scope

    Etcd backups include:
    • All Kubernetes resources (Pods, Services, ConfigMaps, etc.)
    • Cluster configuration and state
    • RBAC policies and bindings
    • Custom Resource Definitions (CRDs)
    • Secrets and sensitive data
    Backups do not include:
    • PersistentVolume data (requires separate backup)
    • Application-level data in databases
    • Container images
    • Logs and metrics data

    Troubleshooting

    Backup Fails to Create

    • Insufficient disk space: Check etcd node disk usage
    • S3 credentials: Verify S3 access key and secret are correct
    • Network issues: Ensure connectivity to S3 endpoint
    • Permissions: Verify etcd has write access to backup directory
    Check backup status:
    kubectl describe etcdbackup {BACKUP_NAME} -n {CLUSTER_NAME}
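
For the disk space cause, a quick check can be run on an etcd node with Python's standard library. This is a minimal sketch: the helper `free_gib` is hypothetical, and the path `/` is used only so the example runs anywhere. On an etcd node, point it at the snapshot directory (by default /opt/rke/etcd-snapshots/).

```python
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


# "/" is a stand-in; use the snapshot directory on a real etcd node.
print(f"{free_gib('/'):.1f} GiB free")
```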
    

    Restore Fails

    • Backup corruption: Verify backup file integrity
    • Version mismatch: Ensure backup is compatible with cluster version
    • Insufficient resources: Check node capacity during restore
    • Network interruption: Restore may fail if connection is lost
    View restore logs:
    kubectl logs -n cattle-system -l app=cattle-cluster-agent
    

    Cluster Not Healthy After Restore

    • Wait 5-10 minutes for all components to stabilize
    • Check etcd pod status: kubectl get pods -n kube-system | grep etcd
    • Verify API server connectivity
    • Review cluster events for errors