Agent-less backup strategies for oVirt/RHV

For oVirt and RHV, as for almost any other virtualization platform, it is possible to provide an agent-less backup solution. Actually, there is more than one way to do this. In this short article I’ll explain three different ways to back up VMs in these environments using the snapshot mechanism.

Let me tell you a short story…

In the past it was common to back up virtual environments with file agents installed inside each VM. The main issue was administrative overhead, and as more and more VMs were created, this overhead grew as well. Then somebody noticed that instead of installing an agent in each VM, we could use VM snapshots and export the VM as a whole directly from the hypervisor. Now, instead of managing a fleet of agents, you can initiate backups at the hypervisor level, and snapshots don’t need anything installed inside the VM to work.

Cool, so now we have a single piece of software that talks to the hypervisor, and we can set up periodic backups of hundreds of VMs with just a few checkboxes. Configured, so mission accomplished! Well, not really… After a while, you notice that you are periodically transferring terabytes of data, even if just a few files have changed. This is where CBT (Changed-Block Tracking) mechanisms came in. Actually, not only CBT but incremental APIs in general, so that backup solutions can either get information about which blocks have changed or just download the data changed since a specific snapshot, which results in incremental backups.
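To make the idea of "only the changed blocks" concrete, here is a minimal Python sketch. It does not use any hypervisor API: it simply compares per-block hashes of two in-memory disk images. Real CBT works differently (the platform records which blocks were written, so nothing has to be re-read and hashed), and the tiny block size and all function names below are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny on purpose; real block sizes are far larger


def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one SHA-256 digest per fixed-size block of the image."""
    return [
        hashlib.sha256(image[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(image), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of blocks in `current` that differ from `previous`.

    Blocks that exist only in `current` (the disk grew) count as changed.
    """
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [
        index
        for index, digest in enumerate(new)
        if index >= len(old) or old[index] != digest
    ]


def incremental_payload(current: bytes, changed: List[int],
                        block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Collect only the changed blocks, keyed by block index."""
    return {
        index: current[index * block_size:(index + 1) * block_size]
        for index in changed
    }


full_backup = b"AAAABBBBCCCCDDDD"
todays_disk = b"AAAAXXXXCCCCDDDD"

changed = changed_blocks(full_backup, todays_disk)
payload = incremental_payload(todays_disk, changed)
print(changed)  # [1]
print(payload)  # {1: b'XXXX'}
```

Only one of the four blocks has to travel to the backup destination here, which is the whole point of an incremental backup.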

Then you wanted to restore some VMs and noticed that some of them didn’t work as expected after the restore. There were consistency issues, similar to what you see after a VM crash.

Let’s be honest: a snapshot backup is crash-consistent by default. The VM after restore “feels” like there was a power outage, as backup solutions typically don’t capture RAM during backup. This means that applications such as databases may in rare cases not recover successfully, as they assume that what has been written to the disk is actually there. This is why quiesced snapshots, VSS and filesystem freeze mechanisms came into the hypervisor world. They let the operating system either flush buffers or freeze writes just before the snapshot is taken. And yes, this requires an agent in the guest OS, so you have to install some software in every VM again, but it is good practice to install such tools anyway.

So, assuming that we want to use native oVirt/RHV mechanisms, what are the options?

Backup strategy 1 – export storage domain

Let’s say that snapshot-based backup was available as of oVirt/RHV 3.5.1. Technically you could do it earlier, but it was 3.5.1 with CentOS/RHEL 7 that allowed you to remove a snapshot while the VM was running. Once the snapshot was created, you had to clone the VM and then export the clone.

Why do we need the additional clone? Because oVirt/RHV doesn’t allow you to export a snapshot directly. The advantage was that the backup process itself was done by the hypervisor, so even with average scripting skills you could have snapshot-based backup.

There were, however, several drawbacks. First, the additional clone requires extra storage space and extra backup time. Second, only one export storage domain could be active in each datacenter, which was sometimes not flexible enough. Finally, you cloned and exported the whole VM, even if you didn’t want specific drives to be exported.
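The snapshot + clone + export sequence can be sketched in a few lines of Python. This is only an illustration of the order of operations and of the cleanup you should not forget: RecordingManager is a hypothetical stand-in that just records calls, its method names are made up and do not mirror the oVirt/RHV API, and in a real environment each step is typically a long-running operation you would have to wait for.

```python
from typing import List


class RecordingManager:
    """Hypothetical stand-in for the manager: records calls instead of doing work."""

    def __init__(self, fail_on: str = "") -> None:
        self.calls: List[str] = []
        self.fail_on = fail_on  # name of a step that should raise, for demonstration

    def _step(self, name: str, target: str) -> None:
        if name == self.fail_on:
            raise RuntimeError(f"{name} failed for {target}")
        self.calls.append(f"{name}:{target}")

    def create_snapshot(self, vm: str) -> str:
        self._step("snapshot", vm)
        return f"{vm}-snap"

    def clone_vm(self, snapshot: str, clone_name: str) -> str:
        self._step("clone", clone_name)
        return clone_name

    def export_vm(self, vm: str, export_domain: str) -> None:
        self._step("export", f"{vm}->{export_domain}")

    def remove_vm(self, vm: str) -> None:
        self._step("remove_vm", vm)

    def remove_snapshot(self, snapshot: str) -> None:
        self._step("remove_snapshot", snapshot)


def backup_via_export_domain(manager: RecordingManager, vm: str,
                             export_domain: str) -> None:
    """Snapshot, clone, export, then always clean up the temporary objects."""
    snapshot = manager.create_snapshot(vm)
    try:
        clone = manager.clone_vm(snapshot, f"{vm}-backup-clone")
        try:
            manager.export_vm(clone, export_domain)
        finally:
            manager.remove_vm(clone)  # the clone only exists for the export
    finally:
        manager.remove_snapshot(snapshot)  # do not let snapshots pile up


manager = RecordingManager()
backup_via_export_domain(manager, "db01", "export1")
print(manager.calls)
```

The nested try/finally blocks are the design choice worth copying: if the export fails, the temporary clone and the snapshot are still removed, so a failed backup does not keep eating storage space.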

Backup strategy 2 – disk attachment with Proxy VM

In this strategy you have a VM, let’s call it the “Proxy VM”, that asks your manager to snapshot a specific VM and attach its drives to the Proxy VM. Now your Proxy VM is able to see and dump all of the data from the VM that you want to back up.

What is cool is that you don’t need an export storage domain, and you can easily exclude drives which you don’t need. From the setup point of view, you need to install one Proxy VM per cluster, so that your backup solution is able to attach drives. It is also a nice starting point for future CBT support (no, in oVirt it is not there yet).

Drawbacks… Well, for starters, unlike the export storage domain approach, which uses simple commands/API calls such as snapshot + clone + export, you probably won’t script this at home. This time the backup/restore process actually requires you to think about many aspects, such as metadata handling, the disk attachment process itself and overall API handling (which in the oVirt/RHV case may be tricky). This is why you would need a proper backup solution, where somebody has done this part for you.
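Even a toy version of the disk attachment flow shows what has to be tracked. The sketch below is plain Python with in-memory "disks": ProxyVm and backup_via_proxy are hypothetical names, nothing here talks to oVirt/RHV, and the metadata format is an assumption chosen only to show that metadata has to be saved alongside the data.

```python
import json
from typing import Dict, Iterable, List


class ProxyVm:
    """Hypothetical Proxy VM: tracks which disk snapshots are currently attached."""

    def __init__(self) -> None:
        self.attached: Dict[str, bytes] = {}

    def attach(self, disk_name: str, content: bytes) -> None:
        self.attached[disk_name] = content

    def detach(self, disk_name: str) -> None:
        self.attached.pop(disk_name, None)


def backup_via_proxy(proxy: ProxyVm, vm_name: str,
                     snapshot_disks: Dict[str, bytes],
                     exclude: Iterable[str] = ()) -> Dict[str, bytes]:
    """Attach the wanted disks to the proxy, copy them, and always detach."""
    excluded = set(exclude)
    wanted = [name for name in snapshot_disks if name not in excluded]
    backup: Dict[str, bytes] = {}
    attached_now: List[str] = []
    try:
        for name in wanted:
            proxy.attach(name, snapshot_disks[name])
            attached_now.append(name)
        for name in attached_now:
            # "Dump" the disk exactly as the proxy sees it.
            backup[name] = bytes(proxy.attached[name])
        # A restore needs to know which disks belong to which VM.
        metadata = {"vm": vm_name, "disks": wanted, "excluded": sorted(excluded)}
        backup["metadata.json"] = json.dumps(metadata, sort_keys=True).encode()
    finally:
        for name in attached_now:
            proxy.detach(name)  # never leave foreign disks attached to the proxy
    return backup


proxy = ProxyVm()
disks = {"os": b"root filesystem", "data": b"database files", "scratch": b"temp"}
result = backup_via_proxy(proxy, "db01", disks, exclude=["scratch"])
print(sorted(result))  # ['data', 'metadata.json', 'os']
print(proxy.attached)  # {}
```

Excluding a drive is a one-line filter here, which is exactly the flexibility the export storage domain approach lacks. Everything else (attaching, detaching on failure, metadata) is the part a real backup solution has to get right against the actual API.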

Backup strategy 3 – disk image transfer API

Now this one is new: it appeared in oVirt/RHV 4.2. The basic idea is to have an easy way to export individual snapshots from the RHV manager. So now, instead of having to install multiple Proxy VMs, you can have a single external backup solution installation, which just invokes APIs via the RHV manager.

Advantages: the setup is easier. Assuming you have oVirt/RHV 4.2 or newer, it is sort of plug-to-your-manager-and-play. From the network point of view, it just requires two additional ports to be opened, 54322 and 54323, and your data can be pulled from the hypervisor. Finally, you have the option to export just the changed data.
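Before blaming the backup software, it is worth checking that those two ports are actually reachable from the machine that runs it. Here is a minimal sketch using only the Python standard library. The function names are my own, and the demo deliberately connects to a temporary local listener instead of a real manager, so replace the host and ports with your own when you try it.

```python
import socket
from typing import Dict, Iterable

IMAGE_TRANSFER_PORTS = (54322, 54323)  # the two ports mentioned above


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str,
                ports: Iterable[int] = IMAGE_TRANSFER_PORTS) -> Dict[int, bool]:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}


# Self-contained demo: listen on a free local port instead of contacting a real host.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    demo_result = check_ports("127.0.0.1", ports=(demo_port,))

print(demo_result)
```

Against a real environment you would call check_ports with the address of your own manager or hypervisor; a False value usually points to a firewall rule rather than to the backup tool.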

Unfortunately, there are a few problems with the current architecture of this solution. The biggest issue is that all traffic passes via the oVirt/RHV manager, which impacts the latency and transfer rates that you can achieve during the backup process. To put that into perspective: with disk attachment you can basically read data as if it were a local drive, so it could potentially be deduplicated even before being transferred to the backup destination. Routing everything through the manager also impacts scalability, as the bottleneck, sooner or later, will be your manager.

Wrap up

So, what to choose…? Option 1 is going to disappear from oVirt/RHV altogether in the future. Option 3 lets you do incremental backups, but it may eventually not scale well and the overall transfer rate may be poor, while also impacting your manager. Currently, I believe that disk attachment, even without incremental backups, makes much more sense. If you don’t want to transfer a significant amount of data to your backup destination every day, you can split your VMs into multiple groups and back them up, for instance, 2-3 times per week instead of daily. And sooner or later CBT is going to appear in oVirt/RHV anyway, so incremental backups will be possible even when using the disk attachment method.
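Splitting VMs into groups is easy to automate. The sketch below is one possible way to do it in Python, with made-up VM names and an arbitrary choice of days: two groups, each backed up three times per week on alternating days.

```python
from typing import Dict, List


def split_into_groups(vms: List[str], group_count: int) -> List[List[str]]:
    """Distribute VMs round-robin so that group sizes differ by at most one."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    groups: List[List[str]] = [[] for _ in range(group_count)]
    for index, vm in enumerate(sorted(vms)):
        groups[index % group_count].append(vm)
    return groups


def weekly_schedule(vms: List[str],
                    days_per_group: List[List[str]]) -> Dict[str, List[str]]:
    """Map each weekday to the VMs that should be backed up on that day."""
    groups = split_into_groups(vms, len(days_per_group))
    schedule: Dict[str, List[str]] = {}
    for group, days in zip(groups, days_per_group):
        for day in days:
            schedule.setdefault(day, []).extend(group)
    return schedule


vms = ["web01", "web02", "db01", "db02", "mail01"]
schedule = weekly_schedule(vms, [["Mon", "Wed", "Fri"], ["Tue", "Thu", "Sat"]])
print(schedule["Mon"])  # ['db01', 'mail01', 'web02']
print(schedule["Tue"])  # ['db02', 'web01']
```

With this layout each day moves roughly half of the data a daily full backup would, at the cost of a restore point that can be up to two or three days old.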



Marcin Kubacki

Chief Software Architect, Member of the Board at Storware, Ph.D. Marcin, sometimes called Mr. V., as an inventor of Storware vProtect code, joined the company in 2015. In 2016 Marcin earned a Ph.D. at Warsaw University of Technology.