Overview
The backup and restore operations:
- Create snapshots of the etcd database
- Store snapshots locally or in S3-compatible storage
- Restore clusters from previous snapshots
- Support both on-demand and scheduled backups
Cluster Actions
Rancher exposes two etcd backup actions (defined in /home/daytona/workspace/source/pkg/apis/management.cattle.io/v3/cluster_types.go:33-34):
- ClusterActionBackupEtcd - Create an on-demand etcd backup
- ClusterActionRestoreFromEtcdBackup - Restore from an existing backup
These actions are available through:
- The Rancher UI cluster management page
- The Rancher API at /v3/clusters/{clusterId}?action=backupEtcd
- The Rancher API at /v3/clusters/{clusterId}?action=restoreFromEtcdBackup
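Both API endpoints share the same URL shape and differ only in the action name. The sketch below shows one way to assemble that URL in a script. The helper name rancher_action_url, the host rancher.example.com, and the cluster ID c-abc12 are placeholders invented for illustration; they are not part of Rancher.

```shell
#!/bin/sh
# Hypothetical helper: build the URL for a Rancher cluster action.
# $1 = Rancher base URL, $2 = cluster ID, $3 = action name
rancher_action_url() {
  base=${1%/}   # drop a single trailing slash so the path is not doubled
  printf '%s/v3/clusters/%s?action=%s\n' "$base" "$2" "$3"
}

# Example values are placeholders, not real endpoints.
rancher_action_url "https://rancher.example.com/" "c-abc12" "backupEtcd"
rancher_action_url "https://rancher.example.com" "c-abc12" "restoreFromEtcdBackup"
```

Keeping the URL construction in one place makes it harder to mistype the action name when switching between backup and restore calls.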
Creating Backups
On-Demand Backup
Create an immediate etcd snapshot:
curl -X POST \
  'https://{RANCHER_URL}/v3/clusters/{CLUSTER_ID}?action=backupEtcd' \
  -H 'Authorization: Bearer {TOKEN}' \
  -H 'Content-Type: application/json'
Alternatively, create an EtcdBackup resource directly:
apiVersion: management.cattle.io/v3
kind: EtcdBackup
metadata:
  name: manual-backup-{DATE}
  namespace: {CLUSTER_NAME}
spec:
  clusterName: {CLUSTER_NAME}
  filename: {BACKUP_NAME}
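The {DATE}, {CLUSTER_NAME}, and {BACKUP_NAME} placeholders in the manifest have to be filled in before it can be applied. A minimal sketch of doing that from a script follows. The function name render_etcd_backup and the sample values are assumptions made for illustration, and the timestamp format is just one choice that yields a valid lowercase resource name.

```shell
#!/bin/sh
# Hypothetical helper: print an EtcdBackup manifest with placeholders filled in.
# $1 = cluster name, $2 = backup filename, $3 = date stamp (optional)
render_etcd_backup() {
  # Default to a UTC timestamp when no stamp is given.
  stamp=${3:-$(date -u +%Y%m%d-%H%M%S)}
  cat <<EOF
apiVersion: management.cattle.io/v3
kind: EtcdBackup
metadata:
  name: manual-backup-$stamp
  namespace: $1
spec:
  clusterName: $1
  filename: $2
EOF
}

# Sample values are placeholders.
render_etcd_backup "c-abc12" "pre-upgrade" "20240101"
```

The rendered manifest could then be piped to `kubectl apply -f -` against the cluster where Rancher itself runs, assuming your Rancher version accepts EtcdBackup resources created this way.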
Scheduled Backups
Configure automatic backup schedules:
- Edit cluster configuration
- Navigate to Advanced Options → etcd
- Configure backup settings:
  - Backup Interval: Hours between automatic backups
  - Retention: Number of backups to retain
  - S3 Backup Target: Optional remote storage
- Save the configuration
Backup Storage
Local Storage
By default, backups are stored locally on the etcd nodes.
Local backups:
- Are stored on each etcd node
- Persist across node reboots
- Are not protected against node failures
- Should be supplemented with remote backups
S3-Compatible Storage
Configure S3 backup storage for redundancy.
S3 backups:
- Provide off-cluster redundancy
- Support any S3-compatible storage (AWS S3, MinIO, etc.)
- Require credentials stored in cluster secrets
- Enable backup restoration to new clusters
Backup Resource
The EtcdBackup type is defined in /home/daytona/workspace/source/pkg/apis/management.cattle.io/v3/rke_types.go:13. Each backup resource tracks:
- Backup filename and location
- Creation timestamp
- Completion state
- Error messages (if failed)
Listing Backups
View available backups:
Restoring from Backup
Restore Prerequisites
- A valid etcd backup
- Cluster-admin access to Rancher
- Understanding that current cluster state will be replaced
- Confirmation from all stakeholders
Restore Procedure
The restore input type is defined in /home/daytona/workspace/source/pkg/apis/management.cattle.io/v3/cluster_types.go:367:
type RestoreFromEtcdBackupInput struct {
	EtcdBackupName   string // Reference to EtcdBackup resource
	RestoreRkeConfig string // Optional RKE config to apply
}
To trigger a restore through the API:
curl -X POST \
  'https://{RANCHER_URL}/v3/clusters/{CLUSTER_ID}?action=restoreFromEtcdBackup' \
  -H 'Authorization: Bearer {TOKEN}' \
  -H 'Content-Type: application/json' \
  -d '{
    "etcdBackupName": "{CLUSTER_NAME}:{BACKUP_NAME}",
    "restoreRkeConfig": ""
  }'
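The request body pairs the cluster name and backup name in a single "etcdBackupName" value separated by a colon. The sketch below builds that body in a script. The helper name restore_payload and the sample names are hypothetical, and the function does no JSON escaping, so it assumes the names contain no quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for the restoreFromEtcdBackup action.
# $1 = cluster name, $2 = backup name, $3 = restoreRkeConfig value (optional)
# Assumes the arguments contain no characters that need JSON escaping.
restore_payload() {
  printf '{"etcdBackupName": "%s:%s", "restoreRkeConfig": "%s"}\n' \
    "$1" "$2" "${3:-}"
}

# Sample names are placeholders.
restore_payload "c-abc12" "manual-backup-20240101"
```

The output can be passed to curl with `-d "$(restore_payload ...)"` in place of the inline JSON shown above.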
Restore Configuration
The restoreRkeConfig parameter allows applying RKE configuration changes during restore:
- Update Kubernetes version
- Modify cluster networking
- Change service configurations
- Update node pool settings
Backup Best Practices
Frequency
- Production clusters: Every 6-12 hours
- Development clusters: Every 24 hours
- Critical clusters: Every 1-6 hours + before major changes
Retention
- Keep at least 7 days of backups
- Retain backups before major upgrades indefinitely
- Balance retention with storage costs
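Because retention is configured as a number of backups rather than a number of days, the count needed to cover a given window depends on the backup interval. The arithmetic can be sketched as follows; backups_needed is a hypothetical helper, not a Rancher command.

```shell
#!/bin/sh
# Hypothetical helper: how many backups must be retained to cover a window.
# $1 = days of history to keep, $2 = hours between backups
backups_needed() {
  days=$1
  interval=$2
  # Ceiling division, so a partial interval still counts as one backup.
  echo $(( (days * 24 + interval - 1) / interval ))
}

backups_needed 7 12   # prints 14
backups_needed 7 6    # prints 28
```

For example, covering 7 days at a 12-hour interval needs a retention count of at least 14, while a 6-hour interval needs 28.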
Storage
- Always configure S3 backup storage for production
- Test backup restoration regularly
- Verify backups are being created successfully
- Monitor backup storage capacity
Testing
- Test restore procedures in non-production environments
- Validate backup integrity periodically
- Document restore procedures for your team
- Practice restore operations before emergencies
Backup Scope
Etcd backups include:
- All Kubernetes resources (Pods, Services, ConfigMaps, etc.)
- Cluster configuration and state
- RBAC policies and bindings
- Custom Resource Definitions (CRDs)
- Secrets and sensitive data
Etcd backups do not include:
- PersistentVolume data (requires separate backup)
- Application-level data in databases
- Container images
- Logs and metrics data
Troubleshooting
Backup Fails to Create
- Insufficient disk space: Check etcd node disk usage
- S3 credentials: Verify S3 access key and secret are correct
- Network issues: Ensure connectivity to S3 endpoint
- Permissions: Verify etcd has write access to backup directory
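The disk space check can be scripted on an etcd node. This is a minimal sketch using POSIX df; the helper name has_free_space is hypothetical, and the directory to check is passed as an argument because the backup location depends on your setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a directory's filesystem has enough free space.
# $1 = directory to check, $2 = required free space in KiB
has_free_space() {
  # df -P prints one line per filesystem; column 4 is the available space.
  # With -k the value is reported in KiB.
  avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail" -ge "$2" ]
}

# Example: check the current directory for at least 1 KiB of free space.
if has_free_space . 1; then
  echo "enough free space"
else
  echo "not enough free space"
fi
```

In practice the threshold should be comfortably larger than the size of a recent snapshot, since a backup that runs out of space partway through is unusable.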
Restore Fails
- Backup corruption: Verify backup file integrity
- Version mismatch: Ensure backup is compatible with cluster version
- Insufficient resources: Check node capacity during restore
- Network interruption: Restore may fail if connection is lost
Cluster Not Healthy After Restore
- Wait 5-10 minutes for all components to stabilize
- Check etcd pod status:
  kubectl get pods -n kube-system | grep etcd
- Verify API server connectivity
- Review cluster events for errors
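Rather than waiting a fixed time, the stabilization wait can be expressed as a polling loop with a timeout. The sketch below is a generic retry helper; wait_until is a hypothetical name, and the readiness command shown in the comment assumes kubectl is configured for the restored cluster and that the API server exposes /readyz.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or a timeout passes.
# $1 = timeout in seconds, $2 = poll interval in seconds, rest = command to run
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # gave up: the command never succeeded in time
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Against a real cluster this might look like (assumption, not run here):
#   wait_until 600 15 kubectl get --raw='/readyz'
# Demonstration with a command that succeeds immediately:
wait_until 5 1 true && echo "ready"
```

A 600-second timeout matches the upper end of the 5-10 minute window above; if the loop gives up, move on to checking pod status and cluster events.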