How to Backup and Restore Kubernetes etcd Storage Simplified
Introduction
In Kubernetes, etcd serves as the fundamental storage unit for preserving the state of the cluster. It's a distributed, reliable, secure, and fast key-value store that holds all cluster data. This includes the configuration details of resources such as pods, deployments, services, and replicasets, as well as the current state of the system, such as the nodes that are part of the cluster and the pods running on them. Given its role in storing the entire state of the cluster, etcd is a crucial component of a Kubernetes setup. If etcd data is lost, the cluster loses its state and might require recovery or recreation. With etcd backups, the cluster can be restored to a known state, which minimizes downtime and ensures business continuity.
Prerequisites:
- Basic knowledge of container technology and Kubernetes
- Proficiency with command-line tools, i.e., a Bash terminal
- Access to a Kubernetes cluster if you want to follow along with the example in this tutorial
Getting started
In this tutorial, I will guide you through the process of backing up and restoring Kubernetes etcd storage, demonstrating each step in a very simple way. Let's get started by creating a deployment using any image of your choice; I will be using the nginx image.
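A quick note on syntax: k is a common alias for kubectl, and I will use it throughout this tutorial. If the alias isn't configured in your shell, you can set it up first:
alias k=kubectl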
$ k create deployment deploy1 --image=nginx
deployment.apps/deploy1 created
Let's view the deployment:
$ k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
deploy1 1/1 1 0 6s
Back up the current state of the cluster
First, if you're not already on the control-plane node, SSH into it before performing the etcd backup. Then execute k get nodes to ensure all nodes are in the Ready state. Next, identify the version of etcd running in the cluster and locate the endpoint URL for the etcd service. Also, determine the locations of the private key, the client or server certificate, and the CA certificate, as these are required for authentication; without them, the backup operation will fail. Finally, before creating an etcd backup, make sure the etcdctl client utility is installed. If it's not, you can install it with apt update && apt install etcd-client.
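Here is a minimal pre-flight sketch that puts these checks together, assuming a kubeadm-provisioned cluster where the etcd pod is named etcd-controlplane (adjust the pod name to match your control-plane node):
# confirm all nodes are Ready
k get nodes
# find the etcd pod and check its version from the image tag
k get pod -n kube-system | grep etcd
k describe pod etcd-controlplane -n kube-system | grep Image:
# install the etcd client utility if it is missing
which etcdctl || (apt update && apt install -y etcd-client)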
The following command syntax can be used to create a snapshot of the etcd storage:
ETCDCTL_API=3 etcdctl --endpoints=https://<IP Address>:<Port> --cacert=<trusted-ca-file> --cert=<cert-file> --key=<key-file> snapshot save <backup-file-location>
Here's an explanation of the command:
- ETCDCTL_API=3: This sets the version of the etcdctl client to use. Version 3 is the latest and is required for the snapshot save command.
- etcdctl: This is the command-line client for etcd, installed earlier if it was not already available in the cluster.
- --endpoints=https://<IP Address>:<Port>: This specifies the address of the etcd server. To find it, run k get pod -n kube-system to list the pods in the kube-system namespace, then describe the etcd pod and look for the --advertise-client-urls argument of the etcd container. The value of this argument is the address of the etcd server. See the example below:
kubectl describe pod etcd-controlplane -n kube-system | grep advertise-client-urls
Annotations:          kubeadm.kubernetes.io/etcd.advertise-client-urls: https://<IP Address>:<Port>
      --advertise-client-urls=https://<IP Address>:<Port>
- --key=<key-file>: This is the path to the key file for SSL/TLS connections.
- --cert=<cert-file>: This is the path to the certificate file for SSL/TLS connections.
- --cacert=<trusted-ca-file>: This is the path to the CA certificate file for SSL/TLS connections.
- snapshot save <backup-file-location>: This creates a snapshot and saves it to the specified location.
Similarly, you can locate the certificate and key paths in the output from describing the etcd-controlplane pod by searching for the corresponding key, cert, and CA file arguments. For the CA certificate, for example:
kubectl describe pod etcd-controlplane -n kube-system | grep ca
Priority Class Name:  system-node-critical
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
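Alternatively, since kubeadm runs etcd as a static pod, all of these values can be read straight from the etcd manifest on the control-plane node. A quick sketch, assuming the default kubeadm manifest location:
grep -E 'advertise-client-urls|cert-file|key-file|trusted-ca-file' /etc/kubernetes/manifests/etcd.yaml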
Let’s create a backup using the actual values as shown below:
ETCDCTL_API=3 etcdctl --endpoints=https://<IP Address>:<Port> --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt snapshot save etcd-backup
{"level":"info","ts":1720984993.2370112,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"etcd-backup.part"}
Snapshot saved at etcd-backup
Note: you must replace the <IP Address> and <Port> placeholders with the actual values for the operation to succeed.
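For reference, on a typical kubeadm cluster the etcd client URL is https://127.0.0.1:2379 when the command is run directly on the control-plane node, so the filled-in command would look something like this (verify the endpoint on your own cluster first, as described above):
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt snapshot save etcd-backup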
View/Delete the deployment deploy1
$ k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
deploy1 1/1 1 1 3s
$ k delete deploy deploy1
deployment.apps "deploy1" deleted
$ k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
No resources found in default namespace.
Perfect, the deployment has been deleted.
Restore etcd from the backup file
Caution: there are precautions you must consider before restoring etcd, such as: "If any API servers are running in your cluster, you should not attempt to restore instances of etcd." Please see the Kubernetes official docs on etcd backup and restore for details.
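On a kubeadm cluster, one common way to stop the API server temporarily is to move its static pod manifest out of the manifests directory and move it back once the restore is done. A sketch, assuming the default kubeadm paths:
# stop the API server by removing its static pod manifest
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
# ... perform the etcd restore here ...
# bring the API server back
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/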
Before running the etcd restore command, it is advisable to verify that the etcd snapshot is available in the directory from which you're running the command; if it's located elsewhere, you need to provide the full path to the file when executing the restore command. You can check by listing the contents of the current directory, like this:
ls
etcd-backup
Also, etcdctl snapshot status etcd-backup can be used to verify the status of the snapshot.
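A sketch of that check, using table output for readability; it reports the hash, revision, total keys, and total size recorded in the snapshot:
ETCDCTL_API=3 etcdctl snapshot status etcd-backup --write-out=table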
To restore etcd from the backup file, use the following command syntax:
etcdctl --data-dir <data-dir-location> snapshot restore <backup file>
Here's an explanation of the command:
- etcdctl: This is the command-line client for etcd.
- --data-dir /var/lib/etcd-restore: This specifies the directory where the restored data will be stored.
- snapshot restore etcd-backup: This restores the snapshot from the etcd-backup file.
Let's run the etcdctl command with the actual values as shown below:
etcdctl --data-dir /var/lib/etcd-restore snapshot restore etcd-backup
Output similar to the one shown below will be displayed:
2024-07-14T19:36:45Z    info    snapshot/v3_snapshot.go:251    restoring snapshot    {"path": "etcd-backup", "wal-dir": "/var/lib/etcd-restore/member/wal", "data-dir": "/var/lib/etcd-restore", "snap-dir": "/var/lib/etcd-restore/member/snap"}
Lastly, in the /etc/kubernetes/manifests/etcd.yaml file, update the hostPath of the volume named etcd-data under spec.volumes from the value /var/lib/etcd to /var/lib/etcd-restore.
See the section of the file to update below:
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-restore # updated to the same directory used in the etcd restore command
      type: DirectoryOrCreate
    name: etcd-data
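If you prefer to make this change from the command line, here is a sketch using sed, assuming the default kubeadm manifest where the etcd-data hostPath is the only line ending in path: /var/lib/etcd:
sed -i 's|path: /var/lib/etcd$|path: /var/lib/etcd-restore|' /etc/kubernetes/manifests/etcd.yaml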
Wait a few minutes for the kubelet to detect the change in the etcd manifest file and restart etcd with the restored data directory.
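While etcd restarts, the API server may be briefly unreachable and kubectl commands can time out. If you want to watch the etcd container come back directly on the node, here is a sketch using crictl (assuming a containerd-based runtime):
watch "crictl ps -a | grep etcd"
After about 3 minutes, view the pods to check whether the previously deleted deployment has been restored: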
k get po
NAME READY STATUS RESTARTS AGE
deploy1-5d54d7b7d4-5sjn4 1/1 Running 0 7m5s
Awesome! The deployment was restored because etcd was backed up before the deployment was deleted; an etcd restore brings back the entire cluster state as it was at the time of the backup. That concludes the backup and restore of etcd storage in a Kubernetes cluster, which is a required skill for the Certified Kubernetes Administrator (CKA) exam.
I hope this helps! Go to the contact page and let me know if you have any further questions.
Happy learning!
Additional resources:
Kubernetes official docs on Operating etcd clusters for Kubernetes