How to Backup and Restore Kubernetes etcd Storage Simplified
Introduction
In Kubernetes, etcd serves as the fundamental storage unit for preserving the state of the cluster. It's a distributed, reliable, secure, and fast key-value store that holds all cluster data. This includes the configuration details of resources such as pods, deployments, services, and replicasets, as well as the current state of the system, such as the nodes that are part of the cluster and the pods running on them. Given its role in storing the entire state of the cluster, etcd is a crucial component of a Kubernetes setup. If etcd data is lost, the cluster loses its state and might require recovery or recreation. With etcd backups, the cluster can be restored to a known state, which minimizes downtime and ensures business continuity.
Prerequisites:
- Basic knowledge of container technology and Kubernetes
- Proficiency with command-line tools, i.e., a Bash terminal
- Access to a Kubernetes cluster if you want to follow along with the example in this tutorial
Getting started
In this tutorial, I will guide you through the process of backing up and restoring Kubernetes etcd storage, demonstrating each step in a very simple way. Let's get started by creating a deployment using any image of your choice; I will be using the nginx image.
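A quick note on syntax: k is a common alias for kubectl, and I will use it throughout this tutorial. If the alias isn't configured in your shell, you can set it up first:
alias k=kubectl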
$ k create deployment deploy1 --image=nginx
deployment.apps/deploy1 created
Let's view the deployment:
$ k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
deploy1 1/1 1 0 6s
Back up the current state of the cluster
First, if you're not already on the control-plane node, SSH into it before performing the etcd backup. Then execute k get nodes to ensure all nodes are in the Ready state. Next, identify the version of etcd running in the cluster and locate the endpoint URL for the etcd service. Also, determine the locations of the private key, the client or server certificate, and the CA certificate, as these are required for authentication; without them, the backup operation will fail. Finally, before creating an etcd backup, make sure the etcdctl client utility is installed. If it's not, you can install it with apt update && apt install etcd-client.
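Here is a minimal pre-flight sketch that puts these checks together, assuming a kubeadm-provisioned cluster where the etcd pod is named etcd-controlplane (adjust the pod name to match your control-plane node):
# confirm all nodes are Ready
k get nodes
# find the etcd pod and check its version from the image tag
k get pod -n kube-system | grep etcd
k describe pod etcd-controlplane -n kube-system | grep Image:
# install the etcd client utility if it is missing
which etcdctl || (apt update && apt install -y etcd-client)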
The following command syntax can be used to create a snapshot of the etcd storage:
ETCDCTL_API=3 etcdctl --endpoints=https://<IP Address>:<Port> --cacert=<trusted-ca-file> --cert=<cert-file> --key=<key-file> snapshot save <backup-file-location>
Here's an explanation of the command:
- ETCDCTL_API=3: This sets the version of the etcdctl client to use. Version 3 is the latest and is required for the snapshot save command.
- etcdctl: This is the command-line client for etcd, installed earlier if it was not already available in the cluster.
- --endpoints=https://<IP Address>:<Port>: This specifies the address of the etcd server. To find it, run k get pod -n kube-system to list the pods in the kube-system namespace, then describe the etcd pod and look for the --advertise-client-urls argument of the etcd container. The value of this argument is the address of the etcd server. See the example below:
kubectl describe pod etcd-controlplane -n kube-system | grep advertise-client-urls
Annotations:          kubeadm.kubernetes.io/etcd.advertise-client-urls: https://<IP Address>:<Port>
      --advertise-client-urls=https://<IP Address>:<Port>
- --key=<key-file>: This is the path to the key file for SSL/TLS connections.
- --cert=<cert-file>: This is the path to the certificate file for SSL/TLS connections.
- --cacert=<trusted-ca-file>: This is the path to the CA certificate file for SSL/TLS connections.
- snapshot save <backup-file-location>: This creates a snapshot and saves it to the specified location.
Similarly, you can locate the certificate and key paths in the output from describing the etcd-controlplane pod by searching for the corresponding key, cert, and CA file arguments. For the CA certificate, for example:
kubectl describe pod etcd-controlplane -n kube-system | grep ca
Priority Class Name:  system-node-critical
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
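Alternatively, since kubeadm runs etcd as a static pod, all of these values can be read straight from the etcd manifest on the control-plane node. A quick sketch, assuming the default kubeadm manifest location:
grep -E 'advertise-client-urls|cert-file|key-file|trusted-ca-file' /etc/kubernetes/manifests/etcd.yaml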
Let’s create a backup using the actual values as shown below:
ETCDCTL_API=3 etcdctl --endpoints=https://<IP Address>:<Port> --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt snapshot save etcd-backup
{"level":"info","ts":1720984993.2370112,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"etcd-backup.part"}
Snapshot saved at etcd-backup
Note: you must replace the <IP Address> and <Port> placeholders with the actual values for the operation to succeed.
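For reference, on a typical kubeadm cluster the etcd client URL is https://127.0.0.1:2379 when the command is run directly on the control-plane node, so the filled-in command would look something like this (verify the endpoint on your own cluster first, as described above):
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt snapshot save etcd-backup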
View/Delete the deployment deploy1
$ k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
deploy1 1/1 1 1 3s
$ k delete deploy deploy1
deployment.apps "deploy1" deleted
$ k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
No resources found in default namespace.
Perfect, the deployment has been deleted.
Restore etcd from the backup file
Caution: there are precautions you must consider before restoring etcd, such as: "If any API servers are running in your cluster, you should not attempt to restore instances of etcd." Please see the Kubernetes official docs on etcd backup and restore for details.
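On a kubeadm cluster, one common way to stop the API server temporarily is to move its static pod manifest out of the manifests directory and move it back once the restore is done. A sketch, assuming the default kubeadm paths:
# stop the API server by removing its static pod manifest
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
# ... perform the etcd restore here ...
# bring the API server back
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/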
Before running the etcd restore command, it is advisable to verify that the etcd snapshot is available in the directory from which you're running the command; if it's located elsewhere, you need to provide the full path to the file when executing the restore command. You can check by listing the contents of the current directory, like this:
ls
etcd-backup
Also, etcdctl snapshot status etcd-backup can be used to verify the status of the snapshot.
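A sketch of that check, using table output for readability; it reports the hash, revision, total keys, and total size recorded in the snapshot:
ETCDCTL_API=3 etcdctl snapshot status etcd-backup --write-out=table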
To restore etcd from the backup file, use the following command syntax:
etcdctl --data-dir <data-dir-location> snapshot restore <backup file>
Here's an explanation of the command:
- etcdctl: This is the command-line client for etcd.
- --data-dir /var/lib/etcd-restore: This specifies the directory where the restored data will be stored.
- snapshot restore etcd-backup: This restores the snapshot from the etcd-backup file.
Let's run the etcdctl command with the actual values as shown below:
etcdctl --data-dir /var/lib/etcd-restore snapshot restore etcd-backup
Output similar to the one shown below will be displayed:
2024-07-14T19:36:45Z    info    snapshot/v3_snapshot.go:251    restoring snapshot    {"path": "etcd-backup", "wal-dir": "/var/lib/etcd-restore/member/wal", "data-dir": "/var/lib/etcd-restore", "snap-dir": "/var/lib/etcd-restore/member/snap"}
Lastly, in the /etc/kubernetes/manifests/etcd.yaml file, update the hostPath of the volume named etcd-data under spec.volumes from the value /var/lib/etcd to /var/lib/etcd-restore.
See the section of the file to update below:
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-restore # updated to the same directory used in the etcd restore command
      type: DirectoryOrCreate
    name: etcd-data
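If you prefer to make this change from the command line, here is a sketch using sed, assuming the default kubeadm manifest where the etcd-data hostPath is the only line ending in path: /var/lib/etcd:
sed -i 's|path: /var/lib/etcd$|path: /var/lib/etcd-restore|' /etc/kubernetes/manifests/etcd.yaml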
Wait a few minutes for the kubelet to detect the change in the etcd manifest file and restart etcd with the restored data directory.
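While etcd restarts, the API server may be briefly unreachable and kubectl commands can time out. If you want to watch the etcd container come back directly on the node, here is a sketch using crictl (assuming a containerd-based runtime):
watch "crictl ps -a | grep etcd"
After about 3 minutes, view the pods to check whether the previously deleted deployment has been restored: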
k get po
NAME READY STATUS RESTARTS AGE
deploy1-5d54d7b7d4-5sjn4 1/1 Running 0 7m5s
Awesome! The deployment was restored because etcd was backed up before the deployment was deleted; an etcd restore brings back the entire cluster state as it was at the time of the backup. That concludes the backup and restore of etcd storage in a Kubernetes cluster, which is a required skill for the Certified Kubernetes Administrator (CKA) exam.
I hope this helps! Go to the contact page and let me know if you have any further questions.
Happy learning!
Additional resources:
Kubernetes official docs on Operating etcd clusters for Kubernetes