How cleanup works and why it might not delete snapshots older than the defined retention in CPM 3.0 and above

Background

This KB article explains how the cleanup process works and why cleanup sometimes does not delete old snapshots.

It includes the following sections:
  1. About the cleanup process
  2. Why Cleanup might not delete old snapshots
  3. Examples

About the cleanup process

Cleanup runs periodically, at the interval defined under General settings. It also runs when you click 'run ASAP' for a policy or when you click 'cleanup now' in the cleanup tab.

  

When it runs, it counts the successful backup records and compares that number with the "Generations to save" value configured in the relevant policy.
  

N2WS removes the snapshots related to backup records in accordance with the defined retention policies, so only snapshots older than the generations to save are removed.

However, only successful backup records are counted towards the retention check (when deciding whether cleanup is needed), and N2WS will clean up records older than the oldest successful record that is still in retention.

This is to make sure that you have enough successful backups as defined in your policies.
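
The rule above can be sketched in a few lines of Python. This is only an illustrative model, not N2WS code: the `BackupRecord` class and the `records_to_clean` function are hypothetical names, and the sketch assumes that cleanup removes nothing until there are more successful records than generations to save.

```python
from dataclasses import dataclass


@dataclass
class BackupRecord:
    record_id: int    # hypothetical identifier; records are listed oldest first
    successful: bool  # True only if the whole backup counted as successful


def records_to_clean(records, generations_to_save):
    """Return the records older than the oldest successful record still in retention.

    Assumes generations_to_save >= 1 and that records are ordered oldest first.
    """
    successful_positions = [i for i, r in enumerate(records) if r.successful]
    if len(successful_positions) <= generations_to_save:
        return []  # assumption: nothing is removed until retention is exceeded
    # Position of the oldest successful record that is still inside retention.
    cutoff = successful_positions[-generations_to_save]
    return records[:cutoff]


history = [
    BackupRecord(1, True),
    BackupRecord(2, False),
    BackupRecord(3, True),
    BackupRecord(4, True),
]
print([r.record_id for r in records_to_clean(history, 2)])  # [1, 2]
print(records_to_clean(history, 3))                         # []
```

With 2 generations to save, records 3 and 4 are the successful records in retention, so record 1 and the failed record 2 are both older than the cutoff. With 3 generations to save there are not yet more successful records than the retention, so nothing is cleaned.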
 
It is crucial to understand what a "successful backup" is.
A "successful backup" is a backup in which both the backup and the replication/DR of all its snapshots completed without issues.
If a single snapshot fails in a policy that backs up 100 volumes, that backup does not count.
If this continues, you may end up with a larger than expected number of snapshots in your account and, as a result, a heavier bill from AWS.
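
As a rough illustration of this definition, the check can be modeled as "every snapshot and every DR copy must have succeeded". The function name `backup_is_successful` and its boolean inputs are assumptions made for this sketch and are not part of N2WS.

```python
def backup_is_successful(snapshot_results, dr_results=()):
    """Return True only if every snapshot and every DR copy completed without issues.

    Both arguments are sequences of booleans (hypothetical representation).
    """
    return all(snapshot_results) and all(dr_results)


# A policy that backs up 100 volumes, where a single snapshot fails:
one_failure = [True] * 99 + [False]
print(backup_is_successful(one_failure))                  # False

# All snapshots succeed, but DR of one of them fails:
print(backup_is_successful([True] * 100, [True, False]))  # False

# Everything succeeds:
print(backup_is_successful([True] * 100, [True] * 100))   # True
```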

Note about initial AMIs
If you have a backup target configured as 'snapshots with initial AMI' (the default for Windows), the initial AMI is rotated based on the configuration in General settings -> Cleanup tab -> Rotate Single AMIs.
It is important to understand that when an AMI is rotated, the old AMI is not deleted automatically. It is deleted when the dependent snapshots are cleaned up according to retention. So if you have 4 backup records between the old AMI and the new one, the old AMI will be deleted when those 4 backup records are cleaned up.


Why Cleanup might not delete old snapshots

By default, only backups with the status "Backup Successful" are counted towards retention.
  

You can change this behavior with the "backup is successful when" setting in the "more options" screen of a policy.
Change it to "Snapshot successful with possible VSS or script Failure.".
  

Then click the save button.
  

If you select the second option, backups that are "Partially Successful" due to VSS will now count as well.


Keep in mind that if you choose to ignore the output of scripts/VSS, you may end up with snapshots that are only crash-consistent instead of application-consistent.
It is recommended to resolve the problems with scripts/VSS instead of ignoring them.
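
The effect of this setting on the retention count can be pictured with the small sketch below. It is a simplified model only: the function `counts_toward_retention`, the `partial_cause` labels and the `allow_vss_or_script_failure` flag are hypothetical names standing in for the policy option, and the sketch assumes that a partial success for any other reason still does not count.

```python
def counts_toward_retention(status, partial_cause=None,
                            allow_vss_or_script_failure=False):
    """Decide whether a backup record counts as successful for retention.

    status: "Backup Successful", "Partially Successful", or anything else (failed).
    partial_cause: hypothetical label for why a backup was only partially successful.
    allow_vss_or_script_failure: models choosing the second option of the
        "backup is successful when" setting.
    """
    if status == "Backup Successful":
        return True
    if status == "Partially Successful":
        return allow_vss_or_script_failure and partial_cause in ("vss", "script")
    return False


# Default setting: a VSS-related partial success does not count.
print(counts_toward_retention("Partially Successful", "vss"))        # False
# Second option selected: the same record now counts.
print(counts_toward_retention("Partially Successful", "vss", True))  # True
```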

Sometimes Partially Successful backups are caused by N2WS trying to back up instances that no longer exist.
To avoid this, go to Policy details, select the drop-down next to "Auto Target Removal", change the setting to Yes, and click the Save button.
  

Another reason a backup is not counted toward retention is a failed DR.

You need to either resolve the issue that is causing the DR to fail or disable DR.

Examples


Example 1:
Here's an actual example that explains Generations to save better:

CPM will clean up out-of-retention snapshots (successful or failed) only when it has as many successful backups as you configured in your policy's retention (Generations to save).
Let's assume you have a daily policy that runs on 2 instances and is supposed to save 3 generations per instance, so you expect to have 6 snapshots on EC2 from day 3 onward, right?

On day 1 it runs and succeeds on both instances = 2 snapshots. Successful backup.
On day 2 it runs but only the first instance is successfully backed up, so now you have 3 snapshots. This is not considered a successful backup.
On day 3 it runs again but only the first instance is successfully backed up, so now you have 4 snapshots. This is not considered a successful backup.
On day 4 it runs and succeeds = 6 snapshots (none are deleted because you only have 2 successful backups, not 3). Successful backup.
On day 5 it runs and succeeds = 8 snapshots (none are deleted because you have exactly 3 successful backups, so none of them is out of retention). On this day you pay for 2 extra snapshots. Successful backup.
The cleanup process still does not clean the failed backups, since they are newer than the oldest successful backup in retention (day 1).
On day 6 it runs and succeeds = 6 snapshots (the day 1, 2 and 3 snapshots are deleted). Here, cleanup cleans the oldest successful backup as well as the failed ones.
The good backups don't have to be consecutive, but failed backups that sit between successful backups still in retention will not get deleted.
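
The day-by-day snapshot counts in Example 1 can be reproduced with a short simulation. This is a sketch of the rule as described in this article, not N2WS code; the `simulate` function and its input format are made up for the illustration, and it assumes cleanup runs right after each daily backup.

```python
def simulate(days, generations_to_save):
    """Return the number of snapshots left after each day's backup and cleanup.

    days: list of (snapshots_created, successful) tuples, oldest first.
    """
    kept = []    # backup records still present, oldest first
    totals = []
    for snapshots, successful in days:
        kept.append((snapshots, successful))
        successful_positions = [i for i, (_, ok) in enumerate(kept) if ok]
        # Clean only when there are more successful backups than generations to save.
        if len(successful_positions) > generations_to_save:
            cutoff = successful_positions[-generations_to_save]
            kept = kept[cutoff:]  # drop everything older than the retention cutoff
        totals.append(sum(count for count, _ in kept))
    return totals


# Days 1 to 6 of Example 1: 2 instances, 3 generations to save.
example_1 = [(2, True), (1, False), (1, False), (2, True), (2, True), (2, True)]
print(simulate(example_1, 3))  # [2, 3, 4, 6, 8, 6]
```

The peak of 8 snapshots on day 5 is the extra cost described above: the two failed backups stay until day 1 falls out of retention on day 6.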

Example 2:
Let's assume the retention is set to save 4 backups and we have the following backup history in the backup monitor (from oldest to newest, all for the SAME policy):

Backup record 1: Backup of target instances failed
Backup record 2: Backup of target instances succeeded
Backup record 3: Backup of target instances succeeded
Backup record 4: Backup of target instances succeeded but DR of one of the instances failed
Backup record 5: Backup and DR succeeded
Backup record 6: Backup and DR succeeded

Now let's assume that the next time the backup runs for that policy, the backup and DR are again successful:
Backup record 7: Backup and DR succeeded
The next time the cleanup process runs, it will clean some backup records, because we now have 5 successful backup records.
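It will clean records 1 & 2, as both are out of retention, but it will not clean record 4 (which failed DR): record 4 is newer than record 3, the oldest successful record still in retention, so following the rule described in "About the cleanup process" it is kept until record 3 itself is out of retention and cleaned.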
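
The outcome of this cleanup run can be checked with a small sketch that applies the rule "clean everything older than the oldest successful record still in retention". The `split_by_retention` helper and the tuple layout are assumptions made for this illustration only.

```python
def split_by_retention(records, generations_to_save):
    """Split records into (cleaned_ids, kept_ids).

    records: list of (record_id, successful) tuples, oldest first.
    """
    successful_positions = [i for i, (_, ok) in enumerate(records) if ok]
    if len(successful_positions) <= generations_to_save:
        cutoff = 0  # not more successful records than the retention: keep all
    else:
        # Position of the oldest successful record still in retention.
        cutoff = successful_positions[-generations_to_save]
    cleaned = [record_id for record_id, _ in records[:cutoff]]
    kept = [record_id for record_id, _ in records[cutoff:]]
    return cleaned, kept


# Records 1 to 7 of Example 2; record 4 failed DR, so it is not successful.
example_2 = [(1, False), (2, True), (3, True), (4, False),
             (5, True), (6, True), (7, True)]
cleaned, kept = split_by_retention(example_2, 4)
print(cleaned)  # [1, 2]
print(kept)     # [3, 4, 5, 6, 7]
```

The four successful records in retention are 3, 5, 6 and 7, and record 4 survives this run because it is newer than record 3.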


Thanks for reading this KB,
N2WS Support Team.