Velero supports backing up and restoring Kubernetes volumes attached to pods from the file system of the volumes, called File System Backup (FSB for short) or Pod Volume Backup. The data movement is performed by modules from the free, open-source backup tools restic and Kopia. This support is considered beta quality. Please see the list of limitations to understand if it fits your use case.
Velero allows you to take snapshots of persistent volumes as part of your backups if you’re using one of the supported cloud providers’ block storage offerings (Amazon EBS Volumes, Azure Managed Disks, Google Persistent Disks). It also provides a plugin model that enables anyone to implement additional object and block storage backends, outside the main Velero repository.
If your storage supports CSI (Container Storage Interface) snapshots, Velero also allows you to take snapshots through CSI and then optionally move the snapshot data to a different storage location.
Velero’s File System Backup is an addition to the aforementioned snapshot approaches. Its pros and cons are listed below:
Pros:
Cons:
NOTE: hostPath volumes are not supported, but the
local volume type is supported.
NOTE: restic is in the deprecation process, following the
Velero Deprecation Policy; for more details, see the Restic Deprecation section.
Velero Node Agent is a Kubernetes DaemonSet that hosts the FSB modules, i.e., restic and the Kopia uploader & repository.
To install Node Agent, use the --use-node-agent
flag in the velero install
command. See the
install overview for more
details on other flags for the install command.
velero install --use-node-agent
When using FSB on a storage that doesn’t have Velero support for snapshots, the --use-volume-snapshots=false
flag prevents an
unused VolumeSnapshotLocation
from being created on installation.
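For example, to enable FSB on a storage platform without Velero snapshot support, the two flags can be combined (a minimal sketch):
velero install --use-node-agent --use-volume-snapshots=false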
At present, Velero FSB supports object storage as the backup storage only. Velero gets the parameters from the
BackupStorageLocation config
to compose the URL to the backup storage. Velero's known object storage providers are included here:
supported providers, for which Velero pre-defines the endpoints; if you
want to use a different backup storage, make sure it is S3 compatible and that you provide the correct bucket name and endpoint in
BackupStorageLocation. Alternatively, for restic, you could set the resticRepoPrefix
value in BackupStorageLocation. For example,
on AWS, resticRepoPrefix
is something like s3:s3-us-west-2.amazonaws.com/bucket
(note that resticRepoPrefix
doesn't work for Kopia).
Velero handles the creation of the backup repo prefix in the backup storage, so make sure it is specified in BackupStorageLocation correctly.
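As an illustration of how these parameters fit together, a BackupStorageLocation for an S3-compatible store might look like the following sketch; the bucket, region, and URL values are placeholders, and resticRepoPrefix is only needed if you want to override the URL Velero composes for restic:
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: bucket
  config:
    region: us-west-2
    s3Url: https://s3-us-west-2.amazonaws.com
    # optional, restic only: overrides the repo URL Velero composes
    resticRepoPrefix: s3:s3-us-west-2.amazonaws.com/bucket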
Velero creates one backup repo per namespace. For example, if backing up 2 namespaces, ns1 and ns2, using the kopia
repository on AWS S3, the full backup repo path for ns1 would be https://s3-us-west-2.amazonaws.com/bucket/kopia/ns1
and
for ns2 would be https://s3-us-west-2.amazonaws.com/bucket/kopia/ns2
.
There may be additional installation steps depending on the cloud provider plugin you are using. You should refer to the plugin specific documentation for the most up to date information.
Note: Currently, Velero creates a secret named velero-repo-credentials
in the velero install namespace, containing a default backup repository password.
You can update the secret with your own password encoded as base64 prior to the first backup (i.e., FS Backup, data mover) targeting the backup repository. The value of the key to update is
data:
repository-password: <custom-password>
The backup repository is created during the first backup targeting it after Velero is installed with the node agent. If you update the secret's password after the first backup has created the backup repository, Velero will not be able to connect to the older backups.
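For example, one way to set a custom password before the first backup is to patch the secret with kubectl (a sketch; MY_CUSTOM_PASSWORD is a placeholder):
PASS_B64=$(echo -n 'MY_CUSTOM_PASSWORD' | base64)
kubectl -n velero patch secret velero-repo-credentials --type merge -p "{\"data\":{\"repository-password\":\"$PASS_B64\"}}"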
After installation, some PaaS/CaaS platforms based on Kubernetes also require modifications to the node-agent DaemonSet spec. The steps in this section are only needed if you are installing on RancherOS, Nutanix, OpenShift, VMware Tanzu Kubernetes Grid Integrated Edition (formerly VMware Enterprise PKS), or Microsoft Azure.
RancherOS
Update the host path for volumes in the node-agent DaemonSet in the Velero namespace from /var/lib/kubelet/pods
to
/opt/rke/var/lib/kubelet/pods
.
hostPath:
path: /var/lib/kubelet/pods
to
hostPath:
path: /opt/rke/var/lib/kubelet/pods
Nutanix
Update the host path for volumes in the node-agent DaemonSet in the Velero namespace from /var/lib/kubelet/pods
to
/var/nutanix/var/lib/kubelet
.
hostPath:
path: /var/lib/kubelet/pods
to
hostPath:
path: /var/nutanix/var/lib/kubelet
OpenShift
To mount the correct hostpath to pods volumes, run the node-agent pod in privileged
mode.
Add the velero
ServiceAccount to the privileged
SCC:
oc adm policy add-scc-to-user privileged -z velero -n velero
Install Velero with the --privileged-node-agent option to request a privileged mode:
velero install --use-node-agent --privileged-node-agent
If node-agent is not running in privileged mode, it will not be able to access pod volumes within the mounted
hostpath directory because of the default enforced SELinux mode configured at the host system level. You can
create a custom SCC to relax the
security in your cluster so that node-agent pods are allowed to use the hostPath volume plugin without granting
them access to the privileged
SCC.
By default, a userland OpenShift namespace will not schedule pods on all nodes in the cluster.
To schedule pods on all nodes, the namespace needs an annotation:
oc annotate namespace <velero namespace> openshift.io/node-selector=""
This should be done before the Velero installation.
Otherwise, the node-agent DaemonSet needs to be deleted and recreated:
oc get ds node-agent -o yaml -n <velero namespace> > ds.yaml
oc annotate namespace <velero namespace> openshift.io/node-selector=""
oc create -n <velero namespace> -f ds.yaml
VMware Tanzu Kubernetes Grid Integrated Edition (formerly VMware Enterprise PKS)
You need to enable the Allow Privileged
option in your plan configuration so that Velero is able to mount the hostpath.
The hostPath should be changed from /var/lib/kubelet/pods
to /var/vcap/data/kubelet/pods
hostPath:
path: /var/vcap/data/kubelet/pods
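For any of the platforms above, one way to apply the host path change is to patch the node-agent DaemonSet. The sketch below uses the Tanzu path and assumes the volume is named host-pods, which may differ in your deployment (verify with kubectl -n velero get ds node-agent -o yaml first):
kubectl -n velero patch daemonset node-agent --type strategic \
  -p '{"spec":{"template":{"spec":{"volumes":[{"name":"host-pods","hostPath":{"path":"/var/vcap/data/kubelet/pods"}}]}}}}'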
Velero supports two approaches of discovering pod volumes that need to be backed up using FSB: an opt-out approach, where all pod volumes are backed up by default and specific volumes can be excluded, and an opt-in approach, where only pod volumes that are explicitly annotated are backed up.
The following sections provide more details on the two approaches.
In this approach, Velero will back up all pod volumes using FSB with the exception of:
It is possible to exclude volumes from being backed up using the backup.velero.io/backup-volumes-excludes
annotation on the pod.
Instructions to back up using this approach are as follows:
Run the following command on each pod that contains volumes that should not be backed up using FSB:
kubectl -n YOUR_POD_NAMESPACE annotate pod/YOUR_POD_NAME backup.velero.io/backup-volumes-excludes=YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...
where the volume names are the names of the volumes in the pod spec.
For example, in the following pod:
apiVersion: v1
kind: Pod
metadata:
name: app1
namespace: sample
spec:
containers:
- image: k8s.gcr.io/test-webserver
name: test-webserver
volumeMounts:
- name: pvc1-vm
mountPath: /volume-1
- name: pvc2-vm
mountPath: /volume-2
  volumes:
  - name: pvc1-vm
    persistentVolumeClaim:
      claimName: pvc1
  - name: pvc2-vm
    persistentVolumeClaim:
      claimName: pvc2
to exclude FSB of volume pvc1-vm
, you would run:
kubectl -n sample annotate pod/app1 backup.velero.io/backup-volumes-excludes=pvc1-vm
Take a Velero backup:
velero backup create BACKUP_NAME --default-volumes-to-fs-backup OTHER_OPTIONS
The above steps use the opt-out approach on a per-backup basis.
Alternatively, this behavior may be enabled for all Velero backups by running the velero install
command with
the --default-volumes-to-fs-backup
flag. Refer to the
install overview for details.
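For example (a minimal sketch):
velero install --use-node-agent --default-volumes-to-fs-backup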
When the backup completes, view information about the backups:
velero backup describe YOUR_BACKUP_NAME
kubectl -n velero get podvolumebackups -l velero.io/backup-name=YOUR_BACKUP_NAME -o yaml
Velero, by default, uses this approach to discover pod volumes that need to be backed up using FSB. Every pod
containing a volume to be backed up using FSB must be annotated with the volume’s name using the
backup.velero.io/backup-volumes
annotation.
Instructions to back up using this approach are as follows:
Run the following for each pod that contains a volume to back up:
kubectl -n YOUR_POD_NAMESPACE annotate pod/YOUR_POD_NAME backup.velero.io/backup-volumes=YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...
where the volume names are the names of the volumes in the pod spec.
For example, for the following pod:
apiVersion: v1
kind: Pod
metadata:
name: sample
namespace: foo
spec:
containers:
- image: k8s.gcr.io/test-webserver
name: test-webserver
volumeMounts:
- name: pvc-volume
mountPath: /volume-1
- name: emptydir-volume
mountPath: /volume-2
volumes:
- name: pvc-volume
persistentVolumeClaim:
claimName: test-volume-claim
- name: emptydir-volume
emptyDir: {}
You’d run:
kubectl -n foo annotate pod/sample backup.velero.io/backup-volumes=pvc-volume,emptydir-volume
This annotation can also be provided in a pod template spec if you use a controller to manage your pods.
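For example, for a Deployment the annotation goes on the pod template's metadata rather than on the Deployment itself; a minimal sketch reusing the volumes from the pod above:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
  namespace: foo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
      annotations:
        backup.velero.io/backup-volumes: pvc-volume,emptydir-volume
    spec:
      containers:
      - name: test-webserver
        image: k8s.gcr.io/test-webserver
        volumeMounts:
        - name: pvc-volume
          mountPath: /volume-1
        - name: emptydir-volume
          mountPath: /volume-2
      volumes:
      - name: pvc-volume
        persistentVolumeClaim:
          claimName: test-volume-claim
      - name: emptydir-volume
        emptyDir: {}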
Take a Velero backup:
velero backup create NAME OPTIONS...
When the backup completes, view information about the backups:
velero backup describe YOUR_BACKUP_NAME
kubectl -n velero get podvolumebackups -l velero.io/backup-name=YOUR_BACKUP_NAME -o yaml
Regardless of how volumes are discovered for backup using FSB, the process of restoring remains the same.
Restore from your Velero backup:
velero restore create --from-backup BACKUP_NAME OPTIONS...
When the restore completes, view information about your pod volume restores:
velero restore describe YOUR_RESTORE_NAME
kubectl -n velero get podvolumerestores -l velero.io/restore-name=YOUR_RESTORE_NAME -o yaml
hostPath
volumes are not supported. Local persistent volumes are supported.
For emptyDir
volumes, when a pod is deleted/recreated (for example, by a ReplicaSet/Deployment),
the next backup of those volumes will be full rather than incremental, because the pod volume's lifecycle is assumed
to be defined by its pod.
Velero File System Backup expects pod volumes to be mounted under <hostPath>/<pod UID>
(hostPath
is configurable as mentioned in
Configure Node Agent DaemonSet spec). Some Kubernetes systems (i.e.,
vCluster) don't mount volumes under the <pod UID>
sub-dir, so Velero File System Backup does not work with them.
Velero uses a helper init container when performing an FSB restore. By default, the image for this container is
velero/velero-restore-helper:<VERSION>
, where VERSION
matches the version/tag of the main Velero image.
You can customize the image that is used for this helper by creating a ConfigMap in the Velero namespace with the alternate image.
In addition, you can customize the resource requirements for the init container, should you need.
The ConfigMap must look like the following:
apiVersion: v1
kind: ConfigMap
metadata:
# any name can be used; Velero uses the labels (below)
# to identify it rather than the name
name: fs-restore-action-config
# must be in the velero namespace
namespace: velero
# the below labels should be used verbatim in your
# ConfigMap.
labels:
# this value-less label identifies the ConfigMap as
# config for a plugin (i.e. the built-in restore
# item action plugin)
velero.io/plugin-config: ""
# this label identifies the name and kind of plugin
# that this ConfigMap is for.
velero.io/pod-volume-restore: RestoreItemAction
data:
# The value for "image" can either include a tag or not;
# if the tag is *not* included, the tag from the main Velero
# image will automatically be used.
image: myregistry.io/my-custom-helper-image[:OPTIONAL_TAG]
# "cpuRequest" sets the request.cpu value on the restore init containers during restore.
# If not set, it will default to "100m". A value of "0" is treated as unbounded.
cpuRequest: 200m
# "memRequest" sets the request.memory value on the restore init containers during restore.
# If not set, it will default to "128Mi". A value of "0" is treated as unbounded.
memRequest: 128Mi
# "cpuLimit" sets the request.cpu value on the restore init containers during restore.
# If not set, it will default to "100m". A value of "0" is treated as unbounded.
cpuLimit: 200m
# "memLimit" sets the request.memory value on the restore init containers during restore.
# If not set, it will default to "128Mi". A value of "0" is treated as unbounded.
memLimit: 128Mi
# "secCtxRunAsUser" sets the securityContext.runAsUser value on the restore init containers during restore.
secCtxRunAsUser: 1001
# "secCtxRunAsGroup" sets the securityContext.runAsGroup value on the restore init containers during restore.
secCtxRunAsGroup: 999
# "secCtxAllowPrivilegeEscalation" sets the securityContext.allowPrivilegeEscalation value on the restore init containers during restore.
secCtxAllowPrivilegeEscalation: false
# "secCtx" sets the securityContext object value on the restore init containers during restore.
# This key overrides `secCtxRunAsUser`, `secCtxRunAsGroup`, and `secCtxAllowPrivilegeEscalation` if `secCtx.runAsUser`, `secCtx.runAsGroup`, or `secCtx.allowPrivilegeEscalation` are set.
secCtx: |
capabilities:
drop:
- ALL
add: []
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsUser: 1001
runAsGroup: 999
Run the following checks:
Are your Velero server and daemonset pods running?
kubectl get pods -n velero
Does your backup repository exist, and is it ready?
velero repo get
velero repo get REPO_NAME -o yaml
Are there any errors in your Velero backup/restore?
velero backup describe BACKUP_NAME
velero backup logs BACKUP_NAME
velero restore describe RESTORE_NAME
velero restore logs RESTORE_NAME
What is the status of your pod volume backups/restores?
kubectl -n velero get podvolumebackups -l velero.io/backup-name=BACKUP_NAME -o yaml
kubectl -n velero get podvolumerestores -l velero.io/restore-name=RESTORE_NAME -o yaml
Is there any useful information in the Velero server or daemon pod logs?
kubectl -n velero logs deploy/velero
kubectl -n velero logs DAEMON_POD_NAME
NOTE: You can increase the verbosity of the pod logs by adding --log-level=debug
as an argument
to the container command in the deployment/daemonset pod template spec.
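For example, one way to add the flag to the Velero server Deployment is a JSON patch (a sketch; it assumes the velero container is the first container in the pod spec and already has an args array):
kubectl -n velero patch deployment velero --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--log-level=debug"}]'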
Velero integrates the restic binary directly, so the operations are done by calling restic commands:
restic init to initialize the restic repository
restic prune periodically to prune the restic repository
restic backup to back up pod volume data
restic restore to restore pod volume data
Velero integrates Kopia modules into Velero's code, primarily two modules: the Kopia uploader and the Kopia repository.
For more details, refer to kopia architecture and Velero’s Unified Repository & Kopia Integration Design
Velero has three custom resource definitions and associated controllers:
BackupRepository
- represents/manages the lifecycle of Velero’s backup repositories. Velero creates
a backup repository per namespace when the first FSB backup/restore for a namespace is requested. The backup
repository is backed by restic or kopia; the BackupRepository
controller invokes restic or kopia internally;
refer to
restic integration and
kopia integration
for details.
You can see information about your Velero’s backup repositories by running velero repo get
.
PodVolumeBackup
- represents a FSB backup of a volume in a pod. The main Velero backup process creates
one or more of these when it finds an annotated pod. Each node in the cluster runs a controller for this
resource (in a daemonset) that handles the PodVolumeBackups
for pods on that node. PodVolumeBackup
is backed by
restic or kopia; the controller invokes restic or kopia internally; refer to
restic integration
and
kopia integration for details.
PodVolumeRestore
- represents a FSB restore of a pod volume. The main Velero restore process creates one
or more of these when it encounters a pod that has associated FSB backups. Each node in the cluster runs a
controller for this resource (in the same daemonset as above) that handles the PodVolumeRestores
for pods
on that node. PodVolumeRestore
is backed by restic or kopia; the controller invokes restic or kopia internally;
refer to
restic integration and
kopia integration for details.
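You can also inspect these custom resources directly with kubectl, for example:
kubectl -n velero get backuprepositories
kubectl -n velero get podvolumebackups
kubectl -n velero get podvolumerestores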
Velero's FSB supports two data movement paths, the restic path and the kopia path. Velero allows users to select between the two paths:
For backup, the path is specified at installation time through the uploader-type flag; the valid value is either restic or kopia, and it defaults to kopia if the value is not specified. The selection is not allowed to be changed after the installation.
For restore, the path is decided by the path used to back up the data and is selected automatically. For example, if a backup was created through the restic path and you then reinstall Velero with uploader-type=kopia, when you create a restore from that backup, the restore still goes through the restic path.
How backup works
Based on the configuration, the main Velero backup process checks each pod that it's backing up for volumes to be backed up using FSB, using the opt-in or opt-out approach described above.
When such a pod is found, Velero first ensures a backup repository exists for the pod's namespace, by:
checking if a BackupRepository custom resource already exists
if not, creating a new one and waiting for the BackupRepository controller to init/connect it
Velero then creates a PodVolumeBackup custom resource per volume listed in the pod annotation.
The main Velero process now waits for the PodVolumeBackup resources to complete or fail.
Meanwhile, each PodVolumeBackup is handled by the controller on the appropriate node, which:
has a hostPath volume mount of /var/lib/kubelet/pods to access the pod volume data
invokes restic or kopia, depending on the selected path, to back up the volume data
updates the status of the custom resource to Completed or Failed
As each PodVolumeBackup finishes, the main Velero process adds it to the Velero backup in a file named <backup-name>-podvolumebackups.json.gz. This file gets uploaded to object storage alongside the backup tarball. It will be used for restores, as seen in the next section.
How restore works
The main Velero restore process checks each existing PodVolumeBackup custom resource in the cluster to backup from.
For each PodVolumeBackup found, Velero first ensures a backup repository exists for the pod's namespace, by:
checking if a BackupRepository custom resource already exists
if not, creating a new one and waiting for the BackupRepository controller to connect it (note that in this case, the actual repository should already exist in backup storage, so the Velero controller will simply check it for integrity and make a location connection)
Velero adds the restore helper init container to the pod and creates a PodVolumeRestore custom resource for each volume to be restored in the pod.
The main Velero process now waits for each PodVolumeRestore resource to complete or fail.
Meanwhile, each PodVolumeRestore is handled by the controller on the appropriate node, which:
has a hostPath volume mount of /var/lib/kubelet/pods to access the pod volume data
invokes restic or kopia, depending on the selected path, to restore the volume data
on success, writes a file into the pod volume, in a .velero subdirectory, whose name is the UID of the Velero restore that this pod volume restore is for
updates the status of the custom resource to Completed or Failed
The restore helper init container waits until it finds a file within each restored volume, under .velero, whose name is the UID of the Velero restore being run; once all such files are found, the init container exits successfully and the pod's other containers can start.
, whose name is the UID of the Velero restore being runVelero won’t restore a resource if a that resource is scaled to 0 and already exists in the cluster. If Velero restored the requested pods in this scenario, the Kubernetes reconciliation loops that manage resources would delete the running pods because its scaled to be 0. Velero will be able to restore once the resources is scaled up, and the pods are created and remain running.
When a backup is created, a snapshot is saved into the repository for the volume data, under both paths. The snapshot is a reference to the volume data saved in the repository.
When deleting a backup, Velero calls the repository to delete the repository snapshot, so the repository snapshot disappears immediately after the backup is deleted. The volume data backed up in the repository then becomes orphaned, but it is not deleted at this time. The repository relies on the maintenance functionality to delete the orphaned data.
As a result, after you delete a backup, you won't see the backup storage size reduce until some full maintenance jobs complete successfully. For the same reason, you should check and make sure that the periodic repository maintenance job runs and completes successfully.
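One way to check this is to inspect the repository status; in recent Velero versions the BackupRepository status records the time of the last maintenance run (the exact field layout may vary by version):
velero repo get REPO_NAME -o yaml
kubectl -n velero get backuprepositories -o yaml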
Even after deleting all the backups and their backup data (by repository maintenance), the backup storage is still not empty; some repository metadata remains to keep the instance of the backup repository.
Furthermore, Velero never deletes this repository metadata; if you are sure you'll never use the backup repository again, you can empty the backup storage manually.
For the Kopia path, the Kopia uploader may keep some internal snapshots which are not managed by Velero. In normal cases, the internal snapshots are deleted as new backups run.
However, if you run a backup which aborts halfway (some internal snapshots are thereby generated) and never run new backups again, some internal snapshots may be left behind. In this case, since you have stopped using the backup repository, you can delete the entire repository metadata from the backup storage manually.
Velero does not provide a mechanism to detect persistent volume claims that are missing the File System Backup annotation.
To solve this, a controller was written by Thomann Bits&Beats: velero-pvc-watcher
When the Velero server/node-agent pod's SecurityContext sets the ReadOnlyRootFileSystem
parameter to true, the Velero server/node-agent pod's filesystem runs in read-only mode.
If the user creates a backup with Kopia as the uploader, the backup will fail, because Kopia needs to write some cache and configuration data into the pod filesystem.
Errors: Velero: name: /mongodb-0 message: /Error backing up item error: /failed to wait BackupRepository: backup repository is not ready: error to connect to backup repo: error to connect repo with storage: error to connect to repository: unable to write config file: unable to create config directory: mkdir /home/cnb/udmrepo: read-only file system name: /mongodb-1 message: /Error backing up item error: /failed to wait BackupRepository: backup repository is not ready: error to connect to backup repo: error to connect repo with storage: error to connect to repository: unable to write config file: unable to create config directory: mkdir /home/cnb/udmrepo: read-only file system name: /mongodb-2 message: /Error backing up item error: /failed to wait BackupRepository: backup repository is not ready: error to connect to backup repo: error to connect repo with storage: error to connect to repository: unable to write config file: unable to create config directory: mkdir /home/cnb/udmrepo: read-only file system Cluster: <none>
The workaround is to make those directories ephemeral k8s volumes, so that they are not counted as part of the pod's root filesystem.
The user-name
is the Velero pod’s running user name. The default value is cnb
.
apiVersion: apps/v1
kind: Deployment
metadata:
name: velero
namespace: velero
spec:
template:
spec:
containers:
- name: velero
......
volumeMounts:
......
- mountPath: /home/<user-name>/udmrepo
name: udmrepo
- mountPath: /home/<user-name>/.cache
name: cache
......
volumes:
......
- emptyDir: {}
name: udmrepo
- emptyDir: {}
name: cache
......
Both the uploader and repository consume considerable CPU/memory during backup/restore, especially for massive numbers of small files or large backup sizes.
Velero node-agent uses
BestEffort as the QoS for node-agent pods (so no CPU/memory request/limit is set), so that backups/restores won't fail due to resource throttling in any case.
If you want to constrain the CPU/memory usage, you need to
customize the resource limits. The CPU/memory consumption is always related to the scale of data to be backed up/restored; refer to
Performance Guidance for more details. It is highly recommended that you perform your own testing to find the best resource limits for your data.
For the Kopia path, some memory is preserved by the node-agent to avoid frequent memory allocations; therefore, after you run a file-system backup/restore, you won't see node-agent release all the memory until it restarts. There is a limit for the memory preservation, so the memory won't increase indefinitely. The limit varies with the number of CPU cores in the cluster nodes, as calculated below:
preservedMemoryInOneNode = 128M + 24M * numOfCPUCores
The memory preservation only happens on the nodes where backups/restores have occurred. Assuming file-system backups/restores occur on every worker node and you have an equal number of CPU cores in each node, the maximum memory that may be preserved in your cluster is:
totalPreservedMemory = (128M + 24M * numOfCPUCores) * numOfWorkerNodes
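For example, with 8 CPU cores per worker node and 3 worker nodes, the upper bound works out as follows:
preservedMemoryInOneNode = 128M + 24M * 8 = 320M
totalPreservedMemory = 320M * 3 = 960M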
However, whether and when this limit is reached is related to the data you are backing up/restoring.
During the restore, the repository may also cache data/metadata so as to reduce the network footprint and speed up the restore. The repository uses its own policy to store and clean up the cache.
For Kopia repository, the cache is stored in the node-agent pod’s root file system. Velero allows you to configure a limit of the cache size so that the node-agent pod won’t be evicted due to running out of the ephemeral storage. For more details, check
Backup Repository Configuration.
According to the Velero Deprecation Policy, the restic path is being deprecated starting from v1.15. Specifically:
For 1.15 and 1.16, you will see the below warnings if --uploader-type=restic
is used in the Velero installation:
In the output of installation:
⚠️ Uploader 'restic' is deprecated, don't use it for new backups, otherwise the backups won't be available for restore when this functionality is removed in a future version of Velero
In Velero server log:
level=warning msg="Uploader 'restic' is deprecated, don't use it for new backups, otherwise the backups won't be available for restore when this functionality is removed in a future version of Velero
In the output of velero backup describe
command for a backup with fs-backup:
Namespaces:
<namespace>: resource: /pods name: <pod name> message: /Uploader 'restic' is deprecated, don't use it for new backups, otherwise the backups won't be available for restore when this functionality is removed in a future version of Velero
And you will see the below warnings if you upgrade from v1.9 or lower to 1.15 or 1.16:
In Velero server log:
level=warning msg="Uploader 'restic' is deprecated, don't use it for new backups, otherwise the backups won't be available for restore when this functionality is removed in a future version of Velero
In the output of velero backup describe
command for a backup with fs-backup:
Namespaces:
<namespace>: resource: /pods name: <pod name> message: /Uploader 'restic' is deprecated, don't use it for new backups, otherwise the backups won't be available for restore when this functionality is removed in a future version of Velero
To help you get started, see the documentation.