Backup Advanced Techniques

Disaster Recovery

The Disaster

To classify a problem as a Disaster, one or more of the following must happen:

  • Hardware failure affecting one or more vital filesystems (such as / or /opt/zextras/)

  • Contents of a vital filesystem made unusable by internal or external factors (like a careless rm * or an external intrusion)

  • Hardware failure of the physical machine hosting the Carbonio service or of the related virtualization infrastructure

  • A critical failure during a software or OS update/upgrade

Minimizing the Chances

Some suggestions to minimize the chances of a disaster:

  • Always keep vital filesystems on different drives (namely /, /opt/zextras/, and your Carbonio Backup Path)

  • Use a monitoring/alerting tool for your server to become aware of problems as soon as they appear

  • Carefully plan your updates and migrations

The Recovery

How to Recover Your System

The recovery of a system is divided into 2 steps:

  • Base system recovery (OS installation and configuration, Carbonio installation and base configuration)

  • Data recovery (reimporting the last available data to the Carbonio server, including domain and user configurations, COS data and mailbox contents)

How can Carbonio Backup Help with Recovery?

The Import Backup feature of Carbonio Backup provides an easy and safe way to perform step 2 of a recovery.

Using the old server’s backup path as the import path allows you to restore a basic installation of Carbonio to the last valid state of your old server.

The Recovery Process

  • Install Carbonio on a new server and configure the Server and Global settings.

  • Mount the backup folder of the old server onto the new one. If this is not available, use the last external backup available or the latest copy of either.

  • Begin an External Restore on the new server using the following CLI command:

    zxsuite backup doExternalRestore /path/to/the/old/store
    
  • The External Restore operation will immediately create the domains, accounts and distribution lists, so as soon as the first part of the Restore is completed (check your Carbonio Notifications), the system will be ready for your users. Emails and other mailbox items will be restored afterwards.
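The mount-and-restore steps above can be sketched in bash as follows. This is a minimal sketch, assuming the old backup folder is exported over NFS; the function names, the NFS source and the mount point are hypothetical placeholders. Mounting needs root, while zxsuite is run as the zextras user, so the two steps are kept in separate functions. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Minimal sketch; function names, NFS source and mount point are hypothetical.

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run as root: mount the old server's backup folder on the new server.
mount_old_backup() {
  local src="$1"         # e.g. old-server:/opt/zextras/backup/ng (hypothetical)
  local mountpoint="$2"  # e.g. /mnt/old-backup (hypothetical)
  run mkdir -p "$mountpoint" || return 1
  run mount -t nfs "$src" "$mountpoint"
}

# Run as the zextras user: start the External Restore from the mounted folder.
start_external_restore() {
  local import_path="$1"
  run zxsuite backup doExternalRestore "$import_path"
}
```

For example, `DRY_RUN=1 mount_old_backup old-server:/opt/zextras/backup/ng /mnt/old-backup` only prints the mkdir and mount commands, which is a cheap way to review the plan before touching the new server.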

Settings and Configs

Server and Global settings are backed up but are not restored automatically. Carbonio Backup’s high-level integration with Carbonio allows you to restore your data to a server with a different OS/Carbonio Release/Networking/Storage setup without any constraints other than the minimum Carbonio version required.

Whether you wish to create a perfect copy of the old server or just take a cue from the old server’s settings to adapt those to a new environment, Carbonio Backup comes with a very handy CLI command:

# zxsuite backup getServerConfig
command getServerConfig requires more parameters

Syntax:
   zxsuite backup getServerConfig {standard|customizations} [attr1 value1 [attr2 value2...

Usage example

zxsuite backup getServerConfig standard date last

Display the latest backup data for Server and Global configuration.

zxsuite backup getServerConfig standard file /path/to/backup/file

Display the contents of a backup file instead of the current server backup.

zxsuite backup getServerConfig standard date last query zimlets/com_zimbra_ymemoticons colors true verbose true

Display all settings for the com_zimbra_ymemoticons zimlet, using colored output and high verbosity.

zxsuite backup getServerConfig standard backup_path /your/backup/path/ date last query / | less

Display the latest backed up configurations, paging the output through less.

Advanced usage

Change the query argument to display specific settings:

zxsuite backup getServerConfig standard date last backup_path /opt/zextras/backup/ng/ query serverConfig/zimbraMailMode/test.example.com

config date_______________________________________________________________________________________________28/02/2014 04:01:14 CET
test.example.com____________________________________________________________________________________________________________both

Use the verbose true parameter to show more details, for example to confirm that the /opt/zextras/conf/ and /opt/zextras/postfix/conf/ directories are backed up as well:

# zxsuite backup getServerConfig customizations date last verbose true
ATTENTION: These files contain the directories /opt/zextras//conf/ and /opt/zextras/postfix/conf/ compressed into a single archive.
Restore can only be performed manually. Do it only if you know what you're doing.

archives
   filename                                                    customizations_28_02_14#04_01_14.tar.gz
   path                                                        /opt/zextras/backup/ng/server/
   modify date                                                 28/02/2014 04:01:14 CET
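To compare the old server’s settings with the new one’s, the two commands shown above can be wrapped so that their output lands in timestamped files. A minimal bash sketch, meant to be run as the zextras user; the function name and the output directory are hypothetical.

```shell
# Minimal sketch; function name and output directory are hypothetical.
# Saves the latest backed-up Server/Global configuration and the
# customizations listing, then prints the path of the configuration file.
save_backup_config() {
  local outdir="$1"
  local stamp
  stamp=$(date +%Y%m%d_%H%M%S)
  mkdir -p "$outdir" || return 1

  # Full dump of the latest backed-up Server and Global configuration.
  zxsuite backup getServerConfig standard date last query / \
    > "$outdir/server_config_$stamp.txt" || return 1

  # Details of the archived /opt/zextras/conf/ and postfix customizations.
  zxsuite backup getServerConfig customizations date last verbose true \
    > "$outdir/customizations_$stamp.txt" || return 1

  echo "$outdir/server_config_$stamp.txt"
}
```

Running it on the old and on the new server and comparing the two configuration files with diff gives a quick overview of which settings still need to be adapted by hand.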

VMs and Snapshots

Thanks to the advent of highly evolved virtualization solutions in the past years, virtual machines are now the most common way to deploy server solutions such as Carbonio.

Most hypervisors feature customizable snapshot capabilities and snapshot-based VM backup systems. In case of a disaster, it’s always possible to roll back to the latest snapshot and import the missing data using the External Restore feature of Carbonio Backup, with the server’s backup path as the import path.

Disaster Recovery from a Previous VM State

Snapshot-based backup systems allow you to keep a frozen copy of a VM in a valid state and roll back to it at will. To fully ensure data consistency, it’s better to take snapshots of switched-off VMs, although this is not mandatory.

Warning

When using these kinds of systems, it’s vital to make sure that the Backup Path is neither part of the snapshot (e.g. by setting the vdisk to Independent Persistent in VMware ESX/ESXi) nor altered in any way when rolling back, so that the missing data remains available for import.

To perform a disaster recovery from a previous machine state with Carbonio Backup, you need to:

  • Restore the last valid backup into a separate (clone) VM in an isolated network, making sure that users can’t access it and that no incoming or outgoing email is delivered.

  • Switch on the clone and wait for Carbonio to start.

  • Disable Carbonio Backup’s RealTime Scanner.

  • Connect the Virtual Disk containing the untampered Backup Path to the clone and mount it (on a different path).

  • Start an External Restore using the Backup Path as the Import Path.

Doing so will parse all items in the Backup Path and import the missing ones, speeding up the disaster recovery. These steps can be repeated as many times as needed, as long as user access and mail traffic remain blocked.

After the restore is completed, make sure that everything is functional and restore user access and mail traffic.
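Steps 3 to 5 on the isolated clone can be sketched in bash as below, to be run as the zextras user after the untampered disk has been mounted (as root) on a separate path. This is a minimal sketch under a loud assumption: that the RealTime Scanner is controlled by a ZxBackup_RealTimeScanner property settable through zxsuite backup setProperty; check this with zxsuite help backup before relying on it. The function names are hypothetical.

```shell
# Minimal sketch; function names are hypothetical.
# ASSUMPTION: the RealTime Scanner is toggled via the ZxBackup_RealTimeScanner
# property; verify with "zxsuite help backup" first.

set_realtime_scanner() {
  case "$1" in
    true|false) zxsuite backup setProperty ZxBackup_RealTimeScanner "$1" ;;
    *) echo "usage: set_realtime_scanner true|false" >&2; return 2 ;;
  esac
}

# Disable the scanner, then import the missing items from the mounted disk.
import_missing_items() {
  local import_path="$1"
  if [ ! -d "$import_path" ]; then
    echo "import path $import_path is not a directory" >&2
    return 1
  fi
  set_realtime_scanner false || return 1
  zxsuite backup doExternalRestore "$import_path"
}
```

The directory check is a cheap guard against starting an External Restore on a mount point that was never actually mounted.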

Hint

At the end of the operation, you can check that the configuration of the new server is the same as the old one by running the command zxsuite config dump (see the full zxsuite config CLI reference).

The Aftermath

To keep any content from before the disaster restorable, just initialize a new Backup Path and store the old one.

Unrestorable Items

How can I check if all of my items have been restored?

It’s very easy. Check the appropriate Operation Completed notification you received as soon as the restore operation finished. It can be viewed in the Notifications section of the Administration Console, and it’s also emailed to the address you specified in the Core section of the Administration Console as the Notification E-Mail recipient address.

The unrestored items section contains a per-account list of unrestored items, as shown in the following excerpt:

[...]
- stats -
Restored Items: 15233
Skipped Items:  125
Unrestored Items: 10

- unrestored items -
account: account1@example.com
unrestored items: 1255,1369

account: account2@example.com
unrestored items: 49965

account: account14@example.com
unrestored items: 856,13339,45200,45655
[...]

In the above excerpt, we denote:

Skipped items

An item that has already been restored, either during the current restore or in a previous one.

Unrestored items

An item that has not been restored due to an issue in the restore process.
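When a restore leaves many unrestored items, turning the notification into one account/item pair per line makes them easier to process. A minimal bash/awk sketch, assuming the notification text has been saved to a plain-text file in the layout of the excerpt above (the format is inferred from that excerpt only); the function name is hypothetical.

```shell
# Minimal sketch; function name is hypothetical, input layout inferred
# from the excerpt above. Reads a file (or stdin) and prints
# "account itemID" for every unrestored item.
list_unrestored_items() {
  awk '
    /^- unrestored items -/   { in_section = 1; next }
    /^- / && in_section       { in_section = 0 }   # another "- ... -" section starts
    in_section && /^account:/ { account = $2 }
    in_section && /^unrestored items:/ {
      ids = $0
      sub(/^unrestored items:[ \t]*/, "", ids)
      gsub(/[ \t]/, "", ids)                        # tolerate "45200, 45655"
      n = split(ids, parts, ",")
      for (i = 1; i <= n; i++) if (parts[i] != "") print account, parts[i]
    }
  ' "${1:--}"
}
```

For example, `list_unrestored_items notification.txt` prints lines such as `account1@example.com 1255`, which can then be fed to the lookup commands described below.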

Why have some of my items not been restored?

There are different possible causes, the most common of which are:

Read Error

Either the raw item or the metadata file is not readable due to an I/O exception or a permission issue.

Broken item

Both the raw item and the metadata file are readable by Carbonio Backup, but their content is broken or corrupted.

Invalid item

Both the raw item and the metadata file are readable and their content is correct, but Carbonio refuses to inject the item.

How Can I Identify Unrestored Items?

There are two ways to do so: via the CLI and via the Administration Console. The first way can be used to search for the item within the backup/import path, and the second can be used to view the items in the source server.

Using the Administration Console

The comma separated list of unrestored items displayed in the Operation Completed notification can be used as a search argument in the Administration Console to perform an item search.

To do so:

  • Log into the Administration Console in the source server

  • Use the View Mail feature to access the account containing the unrestored items

  • In the search box, enter item: followed by the comma separated list of itemIDs, for example: item: 856,13339,45200,45655

Warning

Remember that any search is executed only within the current tab: if you run the search from the Email tab and get no results, run the same search in the other tabs, e.g., Address Book, Calendar, Tasks.
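The search string can be prepared from the notification by normalizing the ID list, since notification lines may contain stray spaces. A minimal bash sketch; the function name is hypothetical and it only builds the text to paste into the search box.

```shell
# Minimal sketch; function name is hypothetical.
# Builds the "item:" search string from a comma-separated list of item IDs.
item_search_query() {
  local ids
  # Join all arguments, then drop whitespace such as the space in "45200, 45655".
  ids=$(printf '%s' "$*" | tr -d '[:space:]')
  ids=${ids#,}
  ids=${ids%,}
  # Item IDs are numeric: reject anything else.
  case "$ids" in
    ''|*[!0-9,]*) echo "invalid item ID list: $*" >&2; return 1 ;;
  esac
  printf 'item: %s\n' "$ids"
}
```

For example, `item_search_query '856,13339,45200, 45655'` prints `item: 856,13339,45200,45655`.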

Using the CLI

The getItem CLI command can display an item and its metadata, extracting all information from a backup path or external backup.

The syntax of the command is:

zxsuite backup getItem {account} {item} [attr1 value1 [attr2 value2...
Usage example

zxsuite backup getItem account2@example.com 49965 dump blob true

Extract the raw data and metadata information of the item whose itemID is 49965, belonging to account2@example.com, including the full dump of the item’s BLOB.
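When an account has several unrestored items, getItem can be run once per item and each result saved for later inspection. A minimal bash sketch, run as the zextras user; the function name and the file naming are hypothetical, and placing backup_path before the other attribute/value pairs is an assumption based on the getItem examples in this guide.

```shell
# Minimal sketch; function name and output layout are hypothetical.
# Usage: dump_unrestored_items ACCOUNT "ID1,ID2,..." OUTDIR [IMPORT_PATH]
dump_unrestored_items() {
  local account="$1" ids="$2" outdir="$3" import_path="${4:-}" id
  mkdir -p "$outdir" || return 1
  # Split the comma-separated list, tolerating spaces after commas.
  for id in $(printf '%s' "$ids" | tr ',' ' '); do
    if [ -n "$import_path" ]; then
      # Look the item up in an import path instead of the current Backup Path.
      zxsuite backup getItem "$account" "$id" backup_path "$import_path" dump blob true
    else
      zxsuite backup getItem "$account" "$id" dump blob true
    fi > "$outdir/${account}_${id}.txt" 2>&1 \
      || echo "getItem failed for $account item $id" >&2
  done
}
```

Failures are reported on stderr but do not stop the loop, so one unreadable item does not prevent the others from being dumped.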

How Can I Restore Unrestored Items?

An item not being restored is a clear sign of an issue, either with the item itself or with your current Carbonio setup. In some cases, there are good chances of being able to restore an item even if it was not restored on the first try.

In the following paragraphs, you will find a collection of tips and tricks that can be helpful when dealing with different kinds of unrestorable items.

Items Not Restored because of a Read Error

It is important to distinguish between two kinds of read errors that can prevent items from being restored:

Hard errors

Hardware failures and all other destructive errors that cause an unrecoverable data loss.

Soft errors

Non-destructive errors, including, for example, wrong permissions, filesystem errors, RAID issues (e.g. broken RAID1 mirroring), and so on.

While there is not much to be done about hard errors, you can prevent or mitigate soft errors by following these guidelines:

  • Run a filesystem check.

  • If using a RAID disk setup, check the array for possible issues (depending on RAID level).

  • Make sure that the zextras user has read/write access to the backup/import path, all its subfolders, and all the files they contain.

  • Carefully check the link quality of network-shared filesystems. If link quality is poor, consider transferring the data with rsync.

  • If using SSHfs to remotely mount the backup/import path, make sure to run the mount command as root using the -o allow_other option.
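The permission check in the list above can be automated. A minimal bash sketch relying on GNU find’s -readable, -writable and -executable tests; run it as the zextras user (for example through su - zextras) so that the tests reflect that user’s rights. The function name and the SSHfs host and paths in the comment are hypothetical.

```shell
# Minimal sketch; function name is hypothetical. Requires GNU find.
# Prints every entry under the given path that the current user cannot
# read or write, and every directory it cannot traverse. Returns 1 if any.
find_inaccessible() {
  local out
  out=$(find "$1" \( ! -readable -o ! -writable -o \( -type d ! -executable \) \) \
          -print 2>/dev/null)
  [ -z "$out" ] && return 0
  printf '%s\n' "$out"
  return 1
}

# SSHfs mount run as root with allow_other, so that zextras can access it
# (host and paths are hypothetical):
#   sshfs -o allow_other backup@backup-host:/srv/carbonio-backup /mnt/import
```

Note that root passes the read/write tests on almost everything, so running the check as root would hide exactly the problems it is meant to find.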

Items Not Restored because Identified as Invalid Items

An item is identified as Invalid when, despite being formally correct, it is discarded by the LMTP Validator upon injection.

If many items are left unrestored during an import, it might be a good idea to temporarily disable the LMTP validator and repeat the import:

  • To disable the LMTP Validator, run the following command as the zextras user:

    zmlocalconfig -e zimbra_lmtp_validate_messages=false
    
  • Once the import is completed, re-enable the LMTP validator by running:

    zmlocalconfig -e zimbra_lmtp_validate_messages=true
    

Warning

This is a dirty workaround, as items deemed invalid by the LMTP validator might cause display or mobile synchronisation errors. Use at your own risk.
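Because Carbonio Backup operations are enqueued (see Operation Queue and Queue Management), the import may still be running when the command that started it returns, so re-enabling the validator is best done as a separate, deliberate step. A minimal bash sketch, run as the zextras user, that sets the value and reads it back; the function name is hypothetical, and it assumes zmlocalconfig supports the -m nokey output mode inherited from its Zimbra origins.

```shell
# Minimal sketch; function name is hypothetical.
# ASSUMPTION: "zmlocalconfig -m nokey KEY" prints only the value of KEY.
set_lmtp_validation() {
  case "$1" in
    true|false) ;;
    *) echo "usage: set_lmtp_validation true|false" >&2; return 2 ;;
  esac
  zmlocalconfig -e "zimbra_lmtp_validate_messages=$1" || return 1
  # Read the value back to confirm the change was written.
  [ "$(zmlocalconfig -m nokey zimbra_lmtp_validate_messages)" = "$1" ]
}
```

Typical use: `set_lmtp_validation false`, start the import, wait for its Operation Completed notification, then `set_lmtp_validation true`.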

Items Not Restored because Identified as Broken Items

Unfortunately, this is the worst category of unrestored items in terms of salvageability.

Depending on the degree of corruption of the item, it might be possible to recover either a previous state or the raw object (this only applies to emails). To identify the degree of corruption, use the getItem CLI command:

zxsuite backup getItem {account} {item} [attr1 value1 [attr2 value2...
Example of how to restore an item

To inspect a broken item, set the backup_path parameter to the import path and the date parameter to all: this displays all valid states of the item.

# zxsuite backup getItem admin@example.com 24700 backup_path /mnt/import/ date all
     itemStates
             start date                                                  12/07/2013 16:35:44
             type                                                        message
             deleted                                                     true
             blob path /mnt/import/items/c0/c0,gUlvzQfE21z6YRXJnNkKL85PrRHw0KMQUqo,pMmQ=
             start date                                                  12/07/2013 17:04:33
             type                                                        message
             deleted                                                     true
             blob path /mnt/import/items/c0/c0,gUlvzQfE21z6YRXJnNkKL85PrRHw0KMQUqo,pMmQ=
             start date                                                  15/07/2013 10:03:26
             type                                                        message
             deleted                                                     true
             blob path /mnt/import/items/c0/c0,gUlvzQfE21z6YRXJnNkKL85PrRHw0KMQUqo,pMmQ=

If the item is an email, you will be able to recover a standard .eml file through the following steps:

  1. Identify the latest valid state

    From the above snippet, consider the latest state:

                 start date                                                  15/07/2013 10:03:26
                 type                                                        message
                 deleted                                                     true
                 blob path /mnt/import/items/c0/c0,gUlvzQfE21z6YRXJnNkKL85PrRHw0KMQUqo,pMmQ=
    
  2. Identify the blob path

    Take the blob path from the previous step:

    blob path /mnt/import/items/c0/c0,gUlvzQfE21z6YRXJnNkKL85PrRHw0KMQUqo,pMmQ=
    
  3. Use gunzip to decompress the BLOB file into an .eml file

    # gunzip -c /mnt/import/items/c0/c0,gUlvzQfE21z6YRXJnNkKL85PrRHw0KMQUqo,pMmQ= > /tmp/restored.eml
    # cat /tmp/restored.eml
    Return-Path: carbonio@test.example.com
    Received: from test.example.com (LHLO test.example.com) (192.168.1.123)
        by test.example.com with LMTP; Fri, 12 Jul 2013 16:35:43 +0200 (CEST)
    Received: by test.example.com (Postfix, from userid 1001) id 4F34A120CC4;
        Fri, 12 Jul 2013 16:35:43 +0200 (CEST)
    To: admin@example.com
    From: admin@example.com
    Subject: Service mailboxd started on test.example.com
    Message-Id: <20130712143543.4F34A120CC4@test.example.com>
    Date: Fri, 12 Jul 2013 16:35:43 +0200 (CEST)

    Jul 12 16:35:42 test zmconfigd[14198]: Service status change: test.example.com mailboxd changed from stopped to running
    
  4. Done! You can now import the .eml file into the appropriate mailbox using your favorite client.
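Steps 1 to 3 can be combined in a small bash helper that reads getItem output, picks the blob path of the last listed state and decompresses it. A minimal sketch, assuming that the last state listed by date all is the one you want to recover; the function names are hypothetical.

```shell
# Minimal sketch; function names are hypothetical.
# ASSUMPTION: the last "blob path" listed by "getItem ... date all" belongs
# to the latest valid state.

# Reads getItem output on stdin; prints the path from the last "blob path" line.
latest_blob_path() {
  awk '$1 == "blob" && $2 == "path" { p = $3 } END { if (p != "") print p }'
}

# Decompresses a BLOB into an .eml file, removing the partial file on failure.
blob_to_eml() {
  local blob="$1" eml="$2"
  gunzip -c "$blob" > "$eml" || { rm -f "$eml"; return 1; }
}
```

For example: `blob_to_eml "$(zxsuite backup getItem admin@example.com 24700 backup_path /mnt/import/ date all | latest_blob_path)" /tmp/restored.eml`.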

Taking Additional and Offsite Backups of Carbonio Backup’s Datastore

Having backup systems is a great safety measure against data loss, but each backup system must be part of a broader backup strategy to ensure the highest possible level of reliability. The lack of a proper backup strategy gives a false sense of security, while actually turning even the best backup system in the world into yet another point of failure.

Devising a backup strategy is no easy matter, and at some point you will most likely be confronted with the following question: “What if I lose the data I backed up?” The chances of this happening ultimately depend only on how you make and manage your backups. For example, you are more likely to lose all of your backed up data if you store both your data and your backups on the same single SATA II disk than if you store your backed up data on a dedicated SAN using a RAID 1+0 setup.

Here are some suggestions and best practices to improve your backup strategy by making a backup of Carbonio Backup’s datastore and storing it offsite.

Making an Additional Backup of Carbonio Backup’s Datastore

To minimize possible data loss, a backup can take advantage of the well-known database properties known as ACID, which guarantee data validity and integrity.

ACID properties

A set of database operations that satisfies the following four properties is called a transaction and represents a single logical unit of work. A transaction guarantees the logical consistency of the data stored and, in the context of Carbonio Backup, it allows for easy data back-up and roll-back to a previous state in case of serious database problems.

Atomicity

Any transaction is committed and written to the disk only when completed.

Consistency

Any committed transaction is valid, and no invalid transaction will be committed and written to the disk.

Isolation

All transactions are executed sequentially so that no more than 1 transaction can affect the same item at once.

Durability

Once a transaction is committed, it will stay so even in case of a crash (e.g. power loss or hardware failure).

Because these properties hold, it’s very easy to make a backup of the Datastore and be sure of the content’s integrity and validity. The best (and easiest) way to do so is with rsync, which is designed around an algorithm that transfers only deltas (i.e., what actually changed) instead of the whole data set, and works incrementally. The specific options and parameters depend on many factors, such as the amount of data to be synced and the storage in use. Connecting to an rsync daemon instead of using a remote shell as the transport is usually much faster.

You won’t need to stop Carbonio or the Real Time Scanner to make an additional backup of Carbonio Backup’s datastore using rsync and, thanks to the ACID properties, you will always be able to stop the sync at any time and resume it later.
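A basic copy of the Backup Path with rsync might look like the bash sketch below. It is a minimal sketch, not a tuned configuration: the function name, host names, module and destination paths are hypothetical, while /opt/zextras/backup/ng/ is the Backup Path shown in the examples above. A destination of the form host::module/ talks to an rsync daemon; host:/path/ goes through a remote shell (SSH).

```shell
# Minimal sketch; function name, hosts, module and destination are hypothetical.
# Usage: sync_backup_path SRC DEST [delete]
sync_backup_path() {
  local src="$1" dest="$2" mode="${3:-}"
  # -a: recursive, preserving permissions, times and symlinks (owners only as root)
  # --partial: keep partially transferred files so an interrupted run can resume
  local opts=(-a --partial)
  if [ "$mode" = "delete" ]; then
    opts+=(--delete)   # remove files that no longer exist in the source
  fi
  # The trailing slash copies the contents of src, not src itself.
  rsync "${opts[@]}" "${src%/}/" "$dest"
}
```

For example, `sync_backup_path /opt/zextras/backup/ng backup-host::carbonio-backup/` uses a daemon module, while `sync_backup_path /opt/zextras/backup/ng backup-host:/srv/carbonio-backup/` uses SSH.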

Storing Your Carbonio Backup’s Datastore Backup Offsite

As seen in the previous section, making a backup of Carbonio Backup’s Datastore is very easy, and the use of rsync makes it just as easy to store your backup in a remote location.

To optimize your backup strategy when dealing with this kind of setup, the following best practices are recommended:

  • If you schedule your rsync backups, leave enough time between one rsync run and the next for the transfer to complete.

  • Use the --delete option so that files deleted on the source server are also deleted on the destination server, avoiding inconsistencies.

    • If you notice that using the --delete option takes too much time, schedule two different rsync instances: one with --delete to be run after the weekly purge and one without this option.

  • Make sure you transfer the whole folder tree recursively, starting from Carbonio Backup’s Backup Path. This includes server config backups and mapfiles.

  • Make sure the destination filesystem is case sensitive (just as Carbonio Backup’s Backup Path must be).

  • If you plan to restore directly from the remote location, make sure that the zextras user on your server has read and write permissions on the transferred data.

  • Expect to experience slowness if your transfer speed is much higher than your storage throughput (or vice versa).
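Two of the practices above, avoiding overlapping runs and using --delete only after the weekly purge, can be sketched as a scheduled bash job. This is a minimal sketch: the lock directory, the weekday and the paths are hypothetical, and mkdir is used as a simple lock because creating a directory is atomic. If a run is killed abruptly, the stale lock directory has to be removed by hand.

```shell
# Minimal sketch; lock directory, weekday and paths are hypothetical.
LOCK_DIR=${LOCK_DIR:-/tmp/carbonio-backup-rsync.lock}
DELETE_WEEKDAY=${DELETE_WEEKDAY:-7}   # day number as printed by "date +%u" (7 = Sunday)

# The body runs in a subshell so that the EXIT trap only releases this run's lock.
scheduled_backup_sync() (
  src="$1"
  dest="$2"
  # mkdir fails if the directory exists, i.e. a previous run is still active.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "previous rsync run still in progress, skipping" >&2
    exit 0
  fi
  trap 'rmdir "$LOCK_DIR"' EXIT
  opts=(-a --partial)
  # Add --delete only on the chosen day, e.g. after the weekly purge.
  if [ "$(date +%u)" = "$DELETE_WEEKDAY" ]; then
    opts+=(--delete)
  fi
  rsync "${opts[@]}" "${src%/}/" "$dest"
)
```

The function can be called from a cron job of the zextras user, for instance nightly, so that a slow transfer simply causes the next run to be skipped rather than overlapped.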

Additional/Offsite Backup F.A.Q.

Why shouldn’t I use the Export Backup feature of Carbonio Backup instead of rsync?

For many reasons:

  • The Export Backup feature is designed to perform migrations. It exports a snapshot that is an end in itself and was not designed to be managed incrementally. Each time an Export Backup is run, it’ll probably take just as much time as the previous one, while using rsync is much more time-efficient.

  • Since Export Backup is a Carbonio Backup operation, any other operation started while it is running will be queued until it is completed.

  • An Export Backup operation has a higher impact on system resources than an rsync.

  • Should you need to stop an Export Backup operation, you won’t be able to resume it, and you’ll need to start from scratch.

Can I use this for Disaster Recovery?

Yes. Obviously, if your Backup Path is still available, it’s better to use that, as it will restore all items and settings to the last valid state. However, should your Backup Path be lost, you’ll be able to use your additional/offsite backup.

Can I use this to restore data on the server the backup copy belongs to?

Yes, but not through the External Restore operation, since item and folder IDs are the same.

The most appropriate steps to restore data from a copy of the backup path to the very same server are as follows:

  • Stop the RealTime Scanner

  • Change the Backup Path to the copy you wish to restore your data from

  • Run either a Restore on New Account or a Restore Deleted Account operation.

  • Once the restore is over, change the backup path to the original one.

  • Start the RealTime Scanner. A SmartScan will be triggered to update the backup data.
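The configuration changes around the restore can be sketched as two bash functions, run as the zextras user. The restore itself is started manually in between, and the second function should be called only after its Operation Completed notification, since queued operations keep running after the command that started them returns. This is a minimal sketch under loud assumptions: that the RealTime Scanner and the Backup Path are controlled by the ZxBackup_RealTimeScanner and ZxBackup_DestPath properties through zxsuite backup setProperty; verify both with zxsuite help backup. The function names are hypothetical.

```shell
# Minimal sketch; function names are hypothetical.
# ASSUMPTION: ZxBackup_RealTimeScanner and ZxBackup_DestPath are the property
# names for the scanner and the Backup Path; verify with "zxsuite help backup".

# Steps 1-2: stop the scanner and point the Backup Path to the copy.
begin_restore_from_copy() {
  local copy="$1"
  zxsuite backup setProperty ZxBackup_RealTimeScanner false || return 1
  zxsuite backup setProperty ZxBackup_DestPath "$copy"
}

# Steps 4-5: restore the original Backup Path and restart the scanner,
# which triggers a SmartScan that updates the backup data.
end_restore_from_copy() {
  local original="$1"
  zxsuite backup setProperty ZxBackup_DestPath "$original" || return 1
  zxsuite backup setProperty ZxBackup_RealTimeScanner true
}
```

Keeping the two halves separate, rather than in one script, avoids switching the Backup Path back while the restore is still queued or running.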

Can I use this to create an Active/Standby infrastructure?

No, because the External Restore operation does not perform any deletions. By running several External Restores, you’ll end up filling up your mailboxes with unwanted content, since items deleted from the original mailbox will not be deleted on the standby server.

The External Restore operation has been designed so that accounts will be available for use as soon as the operation is started, so your users will be able to send and receive emails even if the restore is running.

Are there any other ways to do an Additional/Offsite backup of my system?

Certainly, and some of them might even be better than the one described here. These are just guidelines that apply to the majority of cases.

Operation Queue and Queue Management

Carbonio Backup’s Operation Queue

Every time a Carbonio Backup operation is started, either manually or through scheduling, it is enqueued in a dedicated, unprioritized FIFO queue. Each operation is executed as soon as the preceding operation is dequeued (either because it has completed or because it was terminated).

The queue system affects the following operations:

  • External backup

  • All restore operations

  • SmartScan

Changes to Carbonio Backup's configuration are not enqueued and are applied immediately.

Operation Queue Management

Via the Administration Console
  • Viewing the Queue

    To view the operation queue, access the Notifications tab in the Administration Console and click the Operation Queue button.

    Warning

    The Administration Console displays operations queued both by Carbonio Backup and Zextras Powerstore in a single view. This is just a design choice, as the two queues are completely separate, meaning that one Carbonio Backup operation and one Zextras Powerstore operation can be running at the same time.

  • Emptying the Queue

    To stop the current operation and empty Carbonio Backup’s operation queue, enter the Backup tab in the Administration Console and click the Stop all Operations button.

Through the CLI
  • Viewing the Queue

    To view Carbonio Backup’s operation queue, use the getAllOperations command:

    zxsuite help backup getAllOperations
    
    Usage example

    zxsuite backup getAllOperations

    Shows all running and queued operations

  • Emptying the Queue

    To stop the current operation and empty Carbonio Backup’s operation queue, use the doStopAllOperations command:

    # zxsuite help backup doStopAllOperations
    
    Usage example

    zxsuite backup doStopAllOperations

    Stops all running operations

  • Removing a Single Operation from the Queue

    To stop the current operation or to remove a specific operation from the queue, use the doStopOperation command:

    # zxsuite help backup doStopOperation
    
    Usage example

    zxsuite backup doStopOperation 30ed9eb9-eb28-4ca6-b65e-9940654b8601

    Stops operation with id = 30ed9eb9-eb28-4ca6-b65e-9940654b8601
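Since doStopOperation takes an operation ID like the one in the example above, a small guard can prevent passing a mistyped or truncated value. A minimal bash sketch; the function name is hypothetical, and it assumes operation IDs always have the UUID form shown in the example.

```shell
# Minimal sketch; function name is hypothetical.
# ASSUMPTION: operation IDs are UUIDs, as in the example above.
stop_backup_operation() {
  local id="$1"
  if [[ ! "$id" =~ ^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$ ]]; then
    echo "not an operation ID: $id" >&2
    return 2
  fi
  zxsuite backup doStopOperation "$id"
}
```

The ID to pass can be copied from the getAllOperations output.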

COS-level Backup Management

COS-level Backup Management allows the administrator to disable ALL Carbonio Backup functions for a whole Class of Service to lower storage usage.

What Happens When You Disable the Carbonio Backup Component for a COS

  • The Real Time Scanner will ignore all accounts in the COS.

  • The Export Backup function WILL NOT EXPORT accounts in the COS.

  • Accounts in the COS will be treated as Deleted by the backup system. This means that after the data retention period expires, all data for such accounts will be purged from the backup store. Re-enabling the backup for a Class of Service will reset this.

How the Backup Enabled/Disabled Status Is Saved

Disabling the backup for a Class of Service will add the following marker to the Class of Service’s Notes field: ${ZxBackup_Disabled}

While the Notes field remains fully editable and usable, changing or deleting this marker will re-enable the backup for the COS.
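Because the status is stored as a marker in the Notes field, it can be checked from a script. A minimal bash sketch that reads the Notes text on stdin; the function name is hypothetical, and obtaining the text with the provisioning CLI (for example carbonio prov getCos followed by the COS name and the zimbraNotes attribute) is an assumption to verify on your installation.

```shell
# Minimal sketch; function name is hypothetical.
# Reads a COS Notes field on stdin and reports whether the
# ${ZxBackup_Disabled} marker is present.
cos_backup_status() {
  local notes
  notes=$(cat)
  case "$notes" in
    # Quoted, so ${ZxBackup_Disabled} is matched literally, not expanded.
    *'${ZxBackup_Disabled}'*) echo disabled ;;
    *) echo enabled ;;
  esac
}
```

Since any edit that alters the marker re-enables the backup, a periodic check like this can catch accidental changes to the Notes field.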