Backup and Restore

Rancher provides built-in support for backing up and restoring etcd data in RKE (Rancher Kubernetes Engine) clusters. Regular etcd backups ensure you can recover from data loss or cluster failures.

Overview

The backup and restore operations:

Create snapshots of the etcd database
Store snapshots locally or in S3-compatible storage
Restore clusters from previous snapshots
Support both on-demand and scheduled backups

Backup and restore operations are only available for RKE and RKE2 clusters. Imported clusters and hosted Kubernetes services (EKS, GKE, AKS) must use their native backup solutions.

Cluster Actions

Rancher exposes two etcd backup actions (defined in pkg/apis/management.cattle.io/v3/cluster_types.go:33-34):

ClusterActionBackupEtcd - Create an on-demand etcd backup
ClusterActionRestoreFromEtcdBackup - Restore from an existing backup

These actions are available through:

The Rancher UI cluster management page
The Rancher API at /v3/clusters/{clusterId}?action=backupEtcd
The Rancher API at /v3/clusters/{clusterId}?action=restoreFromEtcdBackup

Creating Backups

On-Demand Backup

Create an immediate etcd snapshot:

Via Rancher UI

Navigate to Cluster Management

Select your RKE cluster

Click the ⋮ menu

Select Take Snapshot

Optionally provide a snapshot name

Click Save

Via Rancher API

Trigger a backup using the API:

curl -X POST \
  'https://{RANCHER_URL}/v3/clusters/{CLUSTER_ID}?action=backupEtcd' \
  -H 'Authorization: Bearer {TOKEN}' \
  -H 'Content-Type: application/json'
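
For scripted use, the request above can be wrapped so that a failed call is not silently ignored. This is a minimal sketch, not an official client: the function names are hypothetical, and the RANCHER_URL, CLUSTER_ID and TOKEN environment variables are assumed stand-ins for the placeholders in the command above.

```shell
# Build the action URL for a cluster (pure string formatting, no network).
# Arguments: Rancher base URL, cluster ID, action name.
rancher_action_url() {
  # ${1%/} drops one trailing slash so the path is not doubled.
  printf '%s/v3/clusters/%s?action=%s\n' "${1%/}" "$2" "$3"
}

# Trigger an on-demand backup; curl --fail returns non-zero on HTTP errors.
# Assumes RANCHER_URL, CLUSTER_ID and TOKEN are set by the caller.
trigger_backup() {
  curl --silent --show-error --fail -X POST \
    "$(rancher_action_url "$RANCHER_URL" "$CLUSTER_ID" backupEtcd)" \
    -H "Authorization: Bearer $TOKEN" \
    -H 'Content-Type: application/json'
}
```

Calling trigger_backup in a cron job or pipeline step then surfaces an authentication or routing problem as a non-zero exit status.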

Via kubectl

Create an EtcdBackup resource:

apiVersion: management.cattle.io/v3
kind: EtcdBackup
metadata:
  name: manual-backup-{DATE}
  namespace: {CLUSTER_NAME}
spec:
  clusterName: {CLUSTER_NAME}
  filename: {BACKUP_NAME}

Apply with:

kubectl apply -f backup.yaml

Scheduled Backups

Configure automatic backup schedules:

Edit cluster configuration
Navigate to Advanced Options → etcd
Configure backup settings:
- Backup Interval: Hours between automatic backups
- Retention: Number of backups to retain
- S3 Backup Target: Optional remote storage
Save the configuration

Rancher automatically creates backups according to the schedule.

Backup Storage

Local Storage

By default, backups are stored on etcd nodes at:

/opt/rke/etcd-snapshots/

Local backups:

Are stored on each etcd node
Persist across node reboots
Are not protected against node failures
Should be supplemented with remote backups
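
To confirm on an etcd node that local snapshots are still being written, a freshness check along these lines may help. It is a sketch under assumptions: the function names are illustrative, and the -maxdepth and -mmin options are extensions found in GNU and BSD find rather than strict POSIX.

```shell
# Print regular files directly inside DIR ($1) that were modified
# less than MAX_AGE_MIN ($2) minutes ago.
recent_snapshots() {
  find "$1" -maxdepth 1 -type f -mmin "-$2"
}

# Succeed only if at least one such file exists.
has_recent_snapshot() {
  [ -n "$(recent_snapshots "$1" "$2")" ]
}
```

For a 12-hour schedule, has_recent_snapshot /opt/rke/etcd-snapshots 780 allows an hour of slack before reporting a missing snapshot.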

S3-Compatible Storage

Configure S3 backup storage for redundancy:

rancherKubernetesEngineConfig:
  services:
    etcd:
      backup:
        enabled: true
        intervalHours: 12
        retention: 6
        s3BackupConfig:
          bucketName: {BUCKET}
          region: {REGION}
          endpoint: {S3_ENDPOINT}
          accessKey: {ACCESS_KEY}
          secretKey: {SECRET_KEY}

S3 backups:

Provide off-cluster redundancy
Support any S3-compatible storage (AWS S3, MinIO, etc.)
Require credentials stored in cluster secrets
Enable backup restoration to new clusters

Backup Resource

The EtcdBackup type is defined in pkg/apis/management.cattle.io/v3/rke_types.go:13:

type EtcdBackup struct {
    metav1.TypeMeta
    metav1.ObjectMeta
    Spec   rketypes.EtcdBackupSpec     // Backup configuration
    Status rketypes.EtcdBackupStatus   // Backup state
}

Backup status includes:

Backup filename and location
Creation timestamp
Completion state
Error messages (if failed)

Listing Backups

View available backups:

kubectl get etcdbackups -n {CLUSTER_NAME}

Example output:

NAME                        CLUSTERNAME   CREATED
manual-backup-2024-01-15    c-xxxxx       2024-01-15T10:00:00Z
auto-backup-2024-01-15      c-xxxxx       2024-01-15T14:00:00Z
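
When a script needs the most recent backup name, the table output can be filtered as sketched below. The helper name is hypothetical, and it assumes the three-column layout shown in the example output, with ISO 8601 UTC timestamps that sort correctly as plain strings.

```shell
# Read the table on stdin, skip the header row, and print the NAME of the
# row with the greatest CREATED value. Appending "" forces string comparison.
latest_backup() {
  awk 'NR > 1 && ($3 "") > (newest "") { newest = $3; name = $1 }
       END { if (name != "") print name }'
}
```

Usage: kubectl get etcdbackups -n {CLUSTER_NAME} | latest_backup. As an alternative, kubectl can order the rows itself with --sort-by=.metadata.creationTimestamp.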

Restoring from Backup

Restoring from backup is a destructive operation that replaces all etcd data. All cluster state changes since the backup was created will be lost.

Restore Prerequisites

A valid etcd backup
Cluster-admin access to Rancher
Understanding that current cluster state will be replaced
Confirmation from all stakeholders

Restore Procedure

Via Rancher UI

Navigate to Cluster Management

Select the cluster to restore

Click the ⋮ menu

Select Restore from Snapshot

Choose the backup to restore from

Optionally provide RKE configuration to apply during restore

Confirm the operation

Click Restore

Via Rancher API

The restore input type is defined in pkg/apis/management.cattle.io/v3/cluster_types.go:367:

type RestoreFromEtcdBackupInput struct {
    EtcdBackupName   string  // Reference to EtcdBackup resource
    RestoreRkeConfig string  // Optional RKE config to apply
}

Trigger a restore:

curl -X POST \
  'https://{RANCHER_URL}/v3/clusters/{CLUSTER_ID}?action=restoreFromEtcdBackup' \
  -H 'Authorization: Bearer {TOKEN}' \
  -H 'Content-Type: application/json' \
  -d '{
    "etcdBackupName": "{CLUSTER_NAME}:{BACKUP_NAME}",
    "restoreRkeConfig": ""
  }'
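
Because a restore is destructive, it can be useful to build the request body separately and inspect it before sending. The following is a sketch only: the function names are hypothetical, the environment variables stand in for the placeholders above, and no JSON escaping is done, so it assumes names without quotes or backslashes.

```shell
# Print the JSON body for a restore request.
# $1 = cluster name, $2 = backup name; restoreRkeConfig is left empty.
restore_payload() {
  printf '{"etcdBackupName": "%s:%s", "restoreRkeConfig": ""}\n' "$1" "$2"
}

# Send the restore request; curl --fail returns non-zero on HTTP errors.
# Assumes RANCHER_URL, CLUSTER_ID and TOKEN are set by the caller.
restore_cluster() {
  curl --silent --show-error --fail -X POST \
    "${RANCHER_URL%/}/v3/clusters/${CLUSTER_ID}?action=restoreFromEtcdBackup" \
    -H "Authorization: Bearer $TOKEN" \
    -H 'Content-Type: application/json' \
    -d "$(restore_payload "$1" "$2")"
}
```

Running restore_payload on its own prints the body for review without touching the cluster.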

Restore Process

Validation: Rancher validates the backup exists and is accessible

Cluster Pause: Cluster operations are paused during restore

Etcd Stop: etcd processes are stopped on all nodes

Data Restore: Backup data is restored to etcd nodes

Etcd Restart: etcd is restarted with restored data

Cluster Sync: Kubernetes API server reconnects to etcd

Validation: Rancher verifies cluster health

Post-Restore Verification

Wait for the cluster to return to Active state

Verify critical workloads are running:

kubectl get pods --all-namespaces

Check cluster events for errors:

kubectl get events --all-namespaces --sort-by='.lastTimestamp'

Validate application data integrity

Restore any resources created after the backup
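
On a busy cluster the full pod listing from the workload check above is long. A small filter can reduce it to the pods that need attention. This sketch uses a hypothetical helper name and assumes the default column order of kubectl get pods --all-namespaces, where STATUS is the fourth column.

```shell
# Read the pod table on stdin, skip the header row, and print only rows
# whose STATUS is neither Running nor Completed.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed"'
}
```

Usage: kubectl get pods --all-namespaces | unhealthy_pods. Empty output means every pod is Running or Completed; it does not by itself prove that all containers are ready.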

Restore Configuration

The restoreRkeConfig parameter allows applying RKE configuration changes during restore:

Update Kubernetes version
Modify cluster networking
Change service configurations
Update node pool settings

Leave empty to restore without configuration changes.

Backup Best Practices

Frequency

Production clusters: Every 6-12 hours
Development clusters: Every 24 hours
Critical clusters: Every 1-6 hours + before major changes

Retention

Keep at least 7 days of backups
Retain backups before major upgrades indefinitely
Balance retention with storage costs
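
The retention count and the backup interval together determine how far back scheduled snapshots reach. The arithmetic can be sketched as below, assuming only scheduled snapshots count toward the retention limit; the function names are illustrative.

```shell
# Hours of history covered: interval in hours ($1) times retention count ($2).
coverage_hours() {
  echo $(( $1 * $2 ))
}

# Smallest retention count that covers at least DAYS ($1) days when a
# snapshot is taken every INTERVAL ($2) hours (ceiling division).
min_retention() {
  echo $(( ($1 * 24 + $2 - 1) / $2 ))
}
```

With the values from the S3 configuration example (intervalHours: 12, retention: 6), coverage_hours 12 6 gives 72 hours, or three days. Covering seven days at the same interval needs min_retention 7 12, which is 14.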

Storage

Always configure S3 backup storage for production
Test backup restoration regularly
Verify backups are being created successfully
Monitor backup storage capacity

Testing

Test restore procedures in non-production environments
Validate backup integrity periodically
Document restore procedures for your team
Practice restore operations before emergencies

Backup Scope

Etcd backups include:

All Kubernetes resources (Pods, Services, ConfigMaps, etc.)
Cluster configuration and state
RBAC policies and bindings
Custom Resource Definitions (CRDs)
Secrets and sensitive data

Backups do not include:

PersistentVolume data (requires separate backup)
Application-level data in databases
Container images
Logs and metrics data

Troubleshooting

Backup Fails to Create

Insufficient disk space: Check etcd node disk usage
S3 credentials: Verify S3 access key and secret are correct
Network issues: Ensure connectivity to S3 endpoint
Permissions: Verify etcd has write access to backup directory

Check backup status:

kubectl describe etcdbackup {BACKUP_NAME} -n {CLUSTER_NAME}
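
The disk space cause listed above can be checked directly on an etcd node. The sketch below parses the POSIX output format of df. The helper names and the 80 percent threshold in the usage line are assumptions for illustration, not Rancher defaults.

```shell
# Print the use percentage (digits only) of the filesystem holding PATH ($1).
# df -P prints one data row with the capacity in the fifth column.
disk_use_percent() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Succeed if usage of the filesystem holding PATH ($1) is below LIMIT ($2) percent.
disk_below() {
  [ "$(disk_use_percent "$1")" -lt "$2" ]
}
```

Usage: disk_below /opt/rke/etcd-snapshots 80 || echo "snapshot volume is filling up".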

Restore Fails

Backup corruption: Verify backup file integrity
Version mismatch: Ensure backup is compatible with cluster version
Insufficient resources: Check node capacity during restore
Network interruption: Restore may fail if connection is lost

View restore logs:

kubectl logs -n cattle-system -l app=cattle-cluster-agent

Cluster Not Healthy After Restore

Wait 5-10 minutes for all components to stabilize
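Check etcd status: on RKE2, kubectl get pods -n kube-system | grep etcd; on RKE, etcd runs as a Docker container on each etcd node, so use docker ps --filter name=etcd there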
Verify API server connectivity
Review cluster events for errors
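
Rather than waiting a fixed time, the API server check in the list above can be polled until it succeeds. This is a generic retry sketch: the helper name is hypothetical, and it sets the shell variables _attempts, _delay and _n in the calling shell.

```shell
# Run a command until it succeeds, at most ATTEMPTS ($1) times,
# sleeping DELAY ($2) seconds between tries. Returns 1 if every try fails.
retry() {
  _attempts=$1
  _delay=$2
  shift 2
  _n=1
  until "$@"; do
    if [ "$_n" -ge "$_attempts" ]; then
      return 1
    fi
    _n=$((_n + 1))
    sleep "$_delay"
  done
}
```

Usage: retry 60 10 kubectl get --raw /readyz polls the API server readiness endpoint for roughly ten minutes before giving up.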
