This issue occurred in a production environment that uses iSCSI storage for the VM datastores and a small NAS as the backup destination.
The daily backup of the VMs (circa 1 TB of raw data in total) took several days to complete, so backup jobs overlapped: the current day's backups were running on some VMs while other VMs' backups from the previous day were still in progress.
Here is a simple diagram of my environment: as you can see, there are separate networks for iSCSI traffic and for the "general purpose" network traffic consumed by the VMs.
VMware Data Recovery performs backups by creating snapshots of the VMs. These snapshots reside on the iSCSI storage datastores and are mounted as hard drives on the VMware Data Recovery appliance, which deduplicates the data before storing it on the backup destination (the NAS). Once the backups are completed, Data Recovery unmounts the virtual hard drives and consolidates the VM snapshots.
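To make the deduplication step more concrete, here is a minimal sketch of fixed-size block deduplication in shell. It is an illustrative assumption, not VMware Data Recovery's actual algorithm (which this post does not describe): the file is split into equal-size blocks, each block is identified by a checksum, and only distinct blocks would need to be stored. The function name `unique_blocks` is hypothetical.

```sh
# Hypothetical helper: fixed-size block deduplication (illustrative only,
# NOT VMware Data Recovery's real algorithm).
# Usage: unique_blocks FILE BLOCK_SIZE_BYTES
# Prints how many distinct blocks FILE contains, i.e. how many blocks a
# deduplicating store would actually have to keep.
unique_blocks() {
  file=$1
  block_size=$2
  workdir=$(mktemp -d) || return 1
  # Cut the file into fixed-size chunks named blk.aa, blk.ab, ...
  split -b "$block_size" "$file" "$workdir/blk."
  # Identify each chunk by CRC + length (cksum is POSIX; a real store would
  # use a cryptographic hash so that collisions are negligible), then count
  # the distinct identifiers.
  cksum "$workdir"/blk.* | awk '{ print $1, $2 }' | sort -u | wc -l | tr -d ' '
  rm -rf "$workdir"
}
```

For example, a 12-byte file containing `AAAABBBBAAAA` split into 4-byte blocks yields three blocks but only two distinct ones, so a deduplicating store would keep two.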
If you look at the backups running in VMware Data Recovery, you will see that by default it performs 8 concurrent backup tasks. I suppose this number is not arbitrary but is somehow derived from the max cost algorithm that governs all VMware hosts.
This algorithm assigns a cost to every operation performed by hosts and storage in a VMware environment.
The total cost of the operations running at the same time cannot exceed the maximum cost each host can withstand.
This limit caps the number of concurrent backup tasks that can run on the ESXi host where the VMware Data Recovery appliance is running.
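The cost mechanism can be illustrated with a small admission-control sketch. The numbers are hypothetical, since the real costs VMware uses are not given here: with an assumed host maximum of 16 and an assumed cost of 2 per backup task, exactly 8 tasks fit, which is one way a default of 8 could arise. The function `admitted_tasks` is an illustrative name, not a VMware tool.

```sh
# Hypothetical cost-based admission control (illustrative values only;
# these are not VMware's real cost numbers).
# Usage: admitted_tasks MAX_COST COST_1 COST_2 ...
# Tasks are taken in queue order; a task starts only if the cost of the
# tasks already running plus its own cost stays within MAX_COST.
# Prints how many tasks run concurrently before the queue blocks.
admitted_tasks() {
  max_cost=$1
  shift
  used=0
  admitted=0
  for cost in "$@"; do
    if [ $((used + cost)) -gt "$max_cost" ]; then
      break  # this task, and everything queued behind it, has to wait
    fi
    used=$((used + cost))
    admitted=$((admitted + 1))
  done
  echo "$admitted"
}
```

With ten queued tasks of cost 2 and a maximum of 16, `admitted_tasks 16 2 2 2 2 2 2 2 2 2 2` prints `8`: the ninth task would push the total to 18, so it waits until a running task finishes.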
Returning to my problem: as you might expect, this issue was caused by the NAS performing poorly when handling concurrent writes from 8 simultaneous backup streams.
Adjusting the number of concurrent backup jobs is really simple.
Log in to your VMware Data Recovery appliance.
Default username: root
Default password: vmw@re
Stop the datarecovery service:
service datarecovery stop
Go to /var/vmware/datarecovery/ and create a file called
Open this file and paste this content:
In this way we override the VMware Data Recovery default for the maximum number of concurrent backup/restore jobs. By default this value is 8; the minimum value is (obviously) 1.
Finally, restart the datarecovery service:
service datarecovery start
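The stop, edit and restart sequence can be wrapped in a small shell function so the options file is never written while the service is running. This is a sketch under assumptions: `apply_vdr_options`, `VDR_DIR` and `SERVICE_CMD` are hypothetical names introduced here, the file name must be passed as an argument (use the one from the steps above), and the file content is read from standard input.

```sh
# Hypothetical wrapper for the stop / write / start sequence.
# Usage: apply_vdr_options FILE_NAME < options-content
#   FILE_NAME : the options file name from the steps above (required;
#               this sketch deliberately does not hard-code it)
#   stdin     : the content to write into that file
# VDR_DIR and SERVICE_CMD are hypothetical overrides (handy for testing);
# they default to the directory from this post and the `service` command.
apply_vdr_options() {
  name=$1
  dir=${VDR_DIR:-/var/vmware/datarecovery}
  svc=${SERVICE_CMD:-service}
  if [ -z "$name" ]; then
    echo "usage: apply_vdr_options FILE_NAME < content" >&2
    return 2
  fi
  # $svc is intentionally unquoted so SERVICE_CMD may contain a command word.
  $svc datarecovery stop || return 1
  if ! cat > "$dir/$name"; then
    # Writing failed: bring the service back up with the old settings.
    $svc datarecovery start
    return 1
  fi
  $svc datarecovery start
}
```

On the appliance, something like `apply_vdr_options FILE_NAME < my-options.txt` (both names being placeholders) stops the service, writes the file and starts the service again; if the stop command fails, nothing is written.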
Performing only one backup at a time eliminated concurrent access to the NAS and sped up the backup process enormously: backups now run sequentially, and when one VM backup job finishes, the next one starts.
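The effect of lowering the limit can be simulated with `xargs -P`, which keeps at most N commands running and starts the next one as soon as one finishes. This is only a simulation under assumptions: `run_backups` is a hypothetical function, the VM names are made up, each "backup" just prints two lines, and `-P` is a GNU/BSD extension rather than POSIX.

```sh
# Hypothetical simulation of a backup queue with a concurrency limit.
# Usage: run_backups LIMIT VM_NAME...
# At most LIMIT simulated jobs run at once; xargs starts the next job as
# soon as a running one exits. VM names must not contain blanks or quotes,
# because xargs splits its input on whitespace.
run_backups() {
  limit=$1
  shift
  [ "$#" -gt 0 ] || return 0  # nothing to back up
  printf '%s\n' "$@" |
    xargs -n 1 -P "$limit" sh -c 'echo "start $1"; echo "done $1"' backup_job
}
```

With a limit of 1 the output is strictly `start vm1`, `done vm1`, `start vm2`, `done vm2`, mirroring the sequential setup; with a limit of 8 the jobs overlap and their output interleaves, which is the pattern that overloaded the NAS.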
For more information on VMware Data Recovery options, have a look at this VMware KB.