How to create a backup cron job for a MongoDB database on Kubernetes (GKE)

Erwan Riou
Apr 3, 2021

In this article I am going to show you, step by step, how to create a cron job that backs up multiple MongoDB databases to GCS (Google Cloud Storage).

Let’s suppose that you currently run a cluster with several database deployments. This tutorial applies to MongoDB databases, but if you understand how we do it, you could very well apply it to other types of databases.

Also, we are going to use GCS, but this could very well work with S3 instead, with a few changes.
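In fact, only two commands in the whole pipeline depend on those choices: the dump itself and the final upload. As a rough sketch of the swaps (the pg_dump line, the database name mydb and the S3 bucket are hypothetical, and the AWS CLI is assumed to be installed and authenticated):

# MongoDB dump vs. a PostgreSQL equivalent
mongodump --archive > mongo.dump
pg_dump -U postgres mydb > pg.dump

# GCS upload vs. a rough S3 equivalent
gsutil cp -r /backups gs://your-project-backups/${BACKUP_DIR}
aws s3 cp /backups s3://your-project-backups/${BACKUP_DIR} --recursive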

ClusterRole and ClusterRoleBinding

The first thing we need to define in our manifest is a ClusterRole that will later be used to run exec and cp operations on the pods that host our databases. Start by creating a manifest file and let’s write the ClusterRole (we name it pod-backup rather than cluster-admin, so it does not collide with Kubernetes’ built-in cluster-admin role):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-backup
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
  # kubectl cp is implemented over exec, so pods/exec is all we need
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]

A ClusterRole is associated with a ServiceAccount through a ClusterRoleBinding, so in order to use this new role, we need to add two more kinds to our manifest:

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-backup
subjects:
  - kind: ServiceAccount
    name: cluster-reader
    namespace: default
roleRef:
  kind: ClusterRole
  name: pod-backup
  apiGroup: rbac.authorization.k8s.io

and

---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: cluster-reader
  namespace: default
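Before moving on, you can apply these manifests and verify that the ServiceAccount is actually allowed to exec into pods (assuming you saved everything in a file called backup-rbac.yaml):

kubectl apply -f backup-rbac.yaml
kubectl auth can-i create pods/exec --as=system:serviceaccount:default:cluster-reader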

Now you have all the manifests needed to run kubectl commands from inside a container. Let’s now see how to back up a database.

Configure the backup through a ConfigMap

A ConfigMap is the ideal solution here, as it lets you store the small piece of code that will be used to back up the database. We will write this backup script in shell.

Let’s start with the backup itself. Ideally you want to iterate over all your existing databases, access them and do a mongodump for each one of them. Like this:

mkdir -p /backups
BACKUP_DIR=$(date +'%m.%d.%Y')
array=($(kubectl get pods | grep mongo | awk '{ print $1 }'))
for KEY in "${!array[@]}"; do
  kubectl exec -i ${array[$KEY]} -- bash -c "cd /tmp && mongodump --archive > mongo.dump"
  kubectl cp ${array[$KEY]}:/tmp/mongo.dump /backups/${array[$KEY]}.dump
done

Here we create a /backups folder inside our temporary container (the one that will be set up by the cron job), and as you can see we access each database pod with kubectl exec -i (we don’t need -it because no interactive terminal is required). Once inside the container, mongodump --archive > mongo.dump creates an archive of the database (add --gzip if you also want it compressed).
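For the record, restoring such a dump later is the reverse operation (a sketch, where <pod-name> is one of your mongo pods):

kubectl cp /backups/<pod-name>.dump <pod-name>:/tmp/mongo.dump
kubectl exec -i <pod-name> -- bash -c "mongorestore --archive=/tmp/mongo.dump"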

As for pushing this backup to GCS, we will use the gsutil command. The only issue is that we need to be authenticated, and since we will be inside a temporary container, we are not.
First, if you don’t have one already, you need to create a service account with a role that can create objects in Google Cloud Storage, and a key for it. This key is a JSON file that contains several fields.
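If you need to create them, here is a sketch using the gcloud CLI (the account name storage-update, the project your-project-id and the roles/storage.objectCreator role are placeholders to adapt):

gcloud iam service-accounts create storage-update
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:storage-update@your-project-id.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
gcloud iam service-accounts keys create key.json \
  --iam-account=storage-update@your-project-id.iam.gserviceaccount.com

Once you have the key, we can start creating the ConfigMap: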

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: backup-script
data:
  backup.sh: |
    #!/bin/bash
    JSON_FILE=$(cat <<-END
    {
      "type": "service_account",
      "project_id": "your-project-id",
      "private_key_id": "${GCS_PRIVATE_KEY_ID}",
      "private_key": "${GCS_PRIVATE_KEY}",
      "client_email": "storage-update@your-project.iam.gserviceaccount.com",
      "client_id": "${GCS_CLIENT_ID}",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/storage-update%40your-project.iam.gserviceaccount.com"
    }
    END
    )
    echo "$JSON_FILE" > key.json
    gcloud auth activate-service-account --key-file=key.json
    mkdir -p /backups
    BACKUP_DIR=$(date +'%m.%d.%Y')
    array=($(kubectl get pods | grep mongo | awk '{ print $1 }'))
    for KEY in "${!array[@]}"; do
      kubectl exec -i ${array[$KEY]} -- bash -c "cd /tmp && mongodump --archive > mongo.dump"
      kubectl cp ${array[$KEY]}:/tmp/mongo.dump /backups/${array[$KEY]}.dump
    done
    gsutil cp -r /backups gs://your-project-backups/${BACKUP_DIR}

As you can see, I am creating a JSON file with all the fields to match this exact service account key (some of them are secrets that I pass to the container as the GCS_PRIVATE_KEY_ID, GCS_PRIVATE_KEY and GCS_CLIENT_ID environment variables). Note that GCS_PRIVATE_KEY must keep the \n escape sequences from the original key file, otherwise the generated JSON will be invalid.
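Those secrets can be created straight from the fields of the key file, for example (values elided, names matching what the CronJob below expects):

kubectl create secret generic gcs-private-key-id --from-literal=GCS_PRIVATE_KEY_ID=...
kubectl create secret generic gcs-private-key --from-literal=GCS_PRIVATE_KEY=...
kubectl create secret generic gcs-client-id --from-literal=GCS_CLIENT_ID=...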
The script stored in this ConfigMap is called backup.sh, and it is the one our cron job will run daily to back up our databases.

Create a backup Cron Job manifest

This is the easiest part. We just need to create a CronJob manifest, making sure to specify:

  • our serviceAccountName, which grants the authorisation to exec into the containers
  • a container with a volume that mounts our ConfigMap
  • the credentials to authenticate to GCS, as environment variables

There is only one small issue with this container: we need to define a securityContext in order to be able to create the backup folder and write into it, as specified in the ConfigMap.

Also, we are going to use an image that already has kubectl and gsutil installed, to avoid tedious installations:
gcr.io/google.com/cloudsdktool/cloud-sdk:latest
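If you want to check locally that this image ships both tools (assuming Docker is installed):

docker run --rm gcr.io/google.com/cloudsdktool/cloud-sdk:latest bash -c "kubectl version --client && gsutil version"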

The CronJob would look like this (note: batch/v1beta1 was current at the time of writing; on Kubernetes 1.21+ use apiVersion: batch/v1):

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-backup
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cluster-reader
          restartPolicy: OnFailure
          volumes:
            - name: backup-script
              configMap:
                name: backup-script
                defaultMode: 0777
          containers:
            - name: runner
              image: gcr.io/google.com/cloudsdktool/cloud-sdk:latest
              command: ["/bin/bash", "-c", "/scripts/backup.sh"]
              securityContext:
                runAsUser: 0
              volumeMounts:
                - name: backup-script
                  mountPath: /scripts/backup.sh
                  subPath: backup.sh
              env:
                - name: GOOGLE_STORAGE_BUCKET
                  value: myproject-backup
                - name: GCS_PRIVATE_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: gcs-private-key-id
                      key: GCS_PRIVATE_KEY_ID
                - name: GCS_CLIENT_ID
                  valueFrom:
                    secretKeyRef:
                      name: gcs-client-id
                      key: GCS_CLIENT_ID
                - name: GCS_PRIVATE_KEY
                  valueFrom:
                    secretKeyRef:
                      name: gcs-private-key
                      key: GCS_PRIVATE_KEY
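To try it out without waiting for midnight, you can apply the manifest and trigger a one-off run from the CronJob (assuming everything is saved in backup-cron.yaml):

kubectl apply -f backup-cron.yaml
kubectl create job --from=cronjob/cron-backup manual-backup
kubectl logs job/manual-backup -f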

Et voilà! You can find all the code here if needed:
https://gist.github.com/erwanriou/cf2d5c67d85b5b8d9a162f5129ff08b8
