This section introduces the main concepts needed to understand the architecture of Zextras Backup and outlines their interaction; each concept is then detailed in a dedicated section.
Before delving into the architecture of Zextras Backup, we recall two general concepts that are taken into account when defining a backup strategy: RPO and RTO.
The Recovery Point Objective (RPO) is the maximum amount of data that a stakeholder is willing to lose in case of a disaster, while the Recovery Time Objective (RTO) is the maximum amount of time that a stakeholder is willing to wait to recover their data.
According to these definitions, the ideal value for both is zero, while realistic values are usually close to zero, depending on the size of the data. In Zextras, the combination of Real Time Scan and SmartScan ensures that both RPO and RTO are quite low: the Real Time Scanner records all metadata changes as soon as they occur, while the SmartScan copies all items that have been modified. Hence, the possible loss of data is minimised and usually limited to those items that have changed between two consecutive runs of SmartScan.
The whole architecture of Zextras Backup revolves around the concept of ITEM: an item is the smallest object stored in the backup, for example:
an email message
a contact or a group of contacts
a Drive document
an account (including its settings)
a distribution list
a domain
a class of services (COS)
|The last three items (distribution lists, domains, classes of services) are subject to the SmartScan only, i.e., the Real Time Scan does not record any change of their state.|
There are also objects that are not items. As such, they are never scanned for changes by the Real Time Scan and are never part of a restore:
Server settings, i.e., the configuration of each server
Global settings of Zextras products
Any customizations made to the software (Postfix, Jetty, etc…)
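The rules above can be summarised as a small lookup. The following Python sketch is purely illustrative. The `Kind` enum and the two helper functions are hypothetical names and are not part of any Zextras API. They encode only what is stated above: distribution lists, domains, and classes of service are covered by the SmartScan only, and objects that are not items are ignored by the Real Time Scan and are never restored.

```python
from enum import Enum


class Kind(Enum):
    # Items, as listed above
    EMAIL = "email message"
    CONTACT = "contact or group of contacts"
    DRIVE_DOCUMENT = "Drive document"
    ACCOUNT = "account"
    DISTRIBUTION_LIST = "distribution list"
    DOMAIN = "domain"
    COS = "class of services"
    # Objects that are not items
    SERVER_SETTINGS = "server settings"
    GLOBAL_SETTINGS = "global Zextras settings"
    CUSTOMIZATION = "software customization"


# Items whose changes are picked up by the SmartScan only.
SMARTSCAN_ONLY = {Kind.DISTRIBUTION_LIST, Kind.DOMAIN, Kind.COS}
# Objects that are not items: ignored by the Real Time Scan and never restored.
NOT_ITEMS = {Kind.SERVER_SETTINGS, Kind.GLOBAL_SETTINGS, Kind.CUSTOMIZATION}


def tracked_by_real_time_scan(kind: Kind) -> bool:
    """True if the Real Time Scan records changes to objects of this kind."""
    return kind not in SMARTSCAN_ONLY and kind not in NOT_ITEMS


def restorable(kind: Kind) -> bool:
    """True if objects of this kind can be part of a restore."""
    return kind not in NOT_ITEMS


for kind in Kind:
    print(f"{kind.value}: real time={tracked_by_real_time_scan(kind)}, restorable={restorable(kind)}")
```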
For every item managed by Zextras Suite, every variation in its associated metadata is recorded and saved, allowing the item to be restored to a given point in time. In other words, whenever one of the metadata associated with an item changes, a "photograph" of the whole item is taken and stored with a timestamp by means of a transaction. Examples of metadata associated with an item include:
when an email was read, deleted, or moved to a folder
a change in the name/address/job of a contact
the deletion or addition of a file in a folder
the change of status of an item (e.g., an account)
Technically, an item is stored as a JSON Array containing all changes in the item’s lifetime. See the Structure of an Item section for more details.
A Deleted Item is an item that has been marked for removal.
|An element in the trash bin is not considered a deleted item: it is a regular item, placed in a folder that is special only from the user’s point of view. From Zextras Backup’s point of view, the item has only changed its state when moved to the trash bin.|
A Transaction is a change of state of an item. By change of state we mean that one of the metadata associated with an item is modified by a user. Therefore, a Transaction can be seen as a photograph of the metadata at a moment in time. Each transaction is uniquely identified by a Transaction ID. It is possible to restore an item to any past transaction. See more in section Zextras Backup Restore Strategies.
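The item/transaction model can be pictured as an append-only list of metadata snapshots, each tagged with a Transaction ID and a timestamp. The Python sketch below is a minimal, hypothetical model and not the actual Zextras storage format. The class names, field names, and the use of epoch seconds are assumptions. It also shows that moving a message to the trash bin is recorded as just another change of state.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Transaction:
    transaction_id: int        # unique ID of the transaction
    timestamp: int             # when the change happened (epoch seconds, assumed)
    metadata: dict[str, Any]   # full "photograph" of the item's metadata


@dataclass
class Item:
    item_id: str
    history: list[Transaction] = field(default_factory=list)

    def record(self, transaction_id: int, timestamp: int, changes: dict[str, Any]) -> None:
        """Append a new snapshot: the previous metadata with `changes` applied."""
        current = dict(self.history[-1].metadata) if self.history else {}
        current.update(changes)
        self.history.append(Transaction(transaction_id, timestamp, current))

    def restore(self, transaction_id: int) -> dict[str, Any]:
        """Return a copy of the metadata as it was at `transaction_id`."""
        for tx in self.history:
            if tx.transaction_id == transaction_id:
                return dict(tx.metadata)
        raise KeyError(f"unknown transaction {transaction_id}")


mail = Item("257")
mail.record(1, 1_700_000_000, {"folder": "Inbox", "read": False})
mail.record(2, 1_700_000_600, {"read": True})
mail.record(3, 1_700_001_200, {"folder": "Trash"})  # moving to the trash bin is only a state change
print(mail.restore(2))  # {'folder': 'Inbox', 'read': True}
```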
The initial structure of the backup is built during the Initial Scan, performed by the SmartScan: the actual content of a Mailbox is read and used to populate the backup. The SmartScan is then executed at every start of the module and on a daily basis if the Scan Operation Scheduling is enabled in the Administration Zimlet.
|SmartScan runs daily at a fixed, configurable time and is not deferred. This implies that if, for any reason (e.g., the server is turned off or Zextras is not running), SmartScan does not run, it will not run until the next day. You may, however, configure the Backup to run the SmartScan every time Zextras Suite is restarted (although this is discouraged), or you may manually run SmartScan to compensate for the missed run.|
SmartScan’s main purpose is to check for items modified since its previous run and to update the database with any new information.
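Because the daily SmartScan is not deferred, a missed slot is simply skipped. The following Python sketch computes the next run under that rule. The function name and the 02:00 slot are hypothetical. The optional run-at-restart setting and manual runs are deliberately not modelled.

```python
from datetime import datetime, time, timedelta


def next_smartscan(now: datetime, scheduled: time) -> datetime:
    """Next run of a daily SmartScan scheduled at `scheduled`, with no deferral."""
    candidate = datetime.combine(now.date(), scheduled)
    if candidate <= now:
        # Today's slot has passed (possibly while the server was down):
        # a missed run is not caught up, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Server down at the (hypothetical) 02:00 slot, back up at 09:30:
print(next_smartscan(datetime(2024, 5, 6, 9, 30), time(2, 0)))  # 2024-05-07 02:00:00
```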
The Real Time Scan records every event that takes place on the system as it happens, allowing recovery with split-second precision. The Real Time Scanner does not overwrite any data in the backup, so every item has its own complete history. Moreover, it can detect when several changes relate to the same item at the same moment and record them all as a single metadata change.
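The coalescing behaviour can be illustrated by grouping events on the pair (item ID, timestamp) and merging their changes. This is a hypothetical Python sketch, not the Real Time Scanner's implementation. In particular, treating "the same moment" as an identical timestamp is an assumption.

```python
from typing import Any

Event = tuple[str, int, dict[str, Any]]  # (item ID, timestamp, changed metadata)


def coalesce(events: list[Event]) -> list[Event]:
    """Merge events on the same item at the same timestamp into one metadata change."""
    merged: dict[tuple[str, int], dict[str, Any]] = {}
    for item_id, ts, changes in events:
        merged.setdefault((item_id, ts), {}).update(changes)
    # dicts keep insertion order, so the result follows the order of first occurrence
    return [(item_id, ts, changes) for (item_id, ts), changes in merged.items()]


events = [
    ("257", 100, {"read": True}),
    ("257", 100, {"flagged": True}),
    ("258", 100, {"folder": "Archive"}),
    ("257", 101, {"folder": "Trash"}),
]
print(coalesce(events))
```

The change at timestamp 101 stays a separate entry, which mirrors the fact that earlier states are kept rather than overwritten.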
Both SmartScan and Real Time Scan are enabled by default. While both can be (independently) stopped, it is suggested to leave them running, as they are intended to complement each other.
|If neither of the two Scan Operations is active, no backup is created.|
Backups are written to disk, so the Scan operations result in disk I/O. For this reason, there are a number of scenarios in which either the SmartScan or the Real Time Scan might (or should) be disabled, even temporarily. For example:
You have a high number of transactions every day (or you often work with Drive documents) and notice high resource consumption on the server. In this case, you can temporarily disable the Real Time Scan.
You start a migration: in this case, it is suggested to stop the SmartScan, because it would treat every migrated or restored item as a new one, creating a large number of disk I/O operations that could even block the server.
You have a high volume of incoming and outgoing email every day. In this case, you should always keep the Real Time Scan active, because otherwise all transactions would be backed up only by the SmartScan, which might not be able to complete in a reasonable time due to the resources required by the I/O operations.
The backup path is the place on a filesystem where all the information about the backup and archives is stored. Each server has exactly one backup path; different servers cannot share the same backup path. It is structured as a hierarchy of folders, the topmost of which is by default
Under this directory, the following important files and directories are present:
map_[server_ID] are so-called map files, which show whether the Backup has been imported from an external backup; their filename contains the unique ID of the server.
accounts is a directory under which the information about all accounts defined in the Mailbox is stored. In particular, the following important files and directories can be found there:
account_info is a file that stores all metadata of the account, including password, signature, and preferences
account_stat is a file containing various statistics about the account, for example the ID of the last element stored by SmartScan
backupstat is a file that maintains generic statistics about the backup, including the timestamp of the first run
drive_items is a directory containing up to 256 subfolders (whose names are composed of two lowercase hexadecimal characters), under which Drive items are stored according to the last two characters of their UUID
items is a directory containing up to 100 subfolders (whose names are composed of two digits), in which items are stored according to the last two digits of their ID
servers is a directory that contains archives of the server configuration and customisations, of the Zextras configuration, and of the chat, one per day, up to the configured server retention time.
items is a directory containing up to 4096 additional folders, whose names consist of two hexadecimal (uppercase and lowercase) characters. Items in the Mailbox are stored in the directory whose name matches the last two characters of their ID.
id_mapper.log is a user object ID mapping file that contains a map between each original object and the corresponding restored object. It is located at /backup/zextras/accounts/xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/id_mapper.log and is present only in case of an external restore.
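The bucketing rules for items and drive_items can be expressed as two path computations. The Python sketch below is hypothetical. The function names are illustrative, numeric item IDs and zero-padding of single-digit buckets are assumptions, and the names of the files inside each bucket are not modelled. The base path used in the usage lines is the default directory documented for the Backup Path, and "ACCOUNT-ID" is a placeholder.

```python
from pathlib import PurePosixPath


def item_subfolder(backup_path: str, account_id: str, item_id: int) -> PurePosixPath:
    """Subfolder of accounts/<account>/items for a mailbox item: last two digits of its ID."""
    bucket = f"{item_id % 100:02d}"  # zero-padding of single digits is an assumption
    return PurePosixPath(backup_path, "accounts", account_id, "items", bucket)


def drive_item_subfolder(backup_path: str, account_id: str, uuid: str) -> PurePosixPath:
    """Subfolder of accounts/<account>/drive_items: last two (lowercase hex) characters of the UUID."""
    bucket = uuid.lower()[-2:]
    return PurePosixPath(backup_path, "accounts", account_id, "drive_items", bucket)


base = "/opt/zimbra/backup/zextras"
print(item_subfolder(base, "ACCOUNT-ID", 257))
print(drive_item_subfolder(base, "ACCOUNT-ID", "3F2504E0-4F89-11D3-9A0C-0305E82C33AB"))
```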
|Shared file storage and object storage must not be used for the Backup Path. Check section [backup-external-storage] for more information.|
A more in-depth and comprehensive overview of the Backup Path.
The Backup Path can be set both via GUI and via CLI:
Via GUI: in the "Backup" section of the Zextras Administration Zimlet, under "Backup Path".
Via CLI: using the config server command to change the
|Backup paths are unique and not reusable. Copying a Backup Path to a new server and setting it as its current Backup Path will return an error, and forcing this in any way by tampering with the backup file will cause corruption of both old and new backup data.|
The Retention Policy (also retention time) defines after how many days an object marked for deletion is actually removed from the backup. The retention policies in the Backup are:
Data retention policy: applies to single items; defaults to 30 days
Account retention policy: applies to accounts; defaults to 30 days
All retention times can be changed; if set to 0 (zero), archives will be kept forever (infinite retention) and the Backup Purge will not run.
If an account is deleted and must be restored after the Data retention time has expired, it is nonetheless possible to recover all its items until the Account retention time expires: even if all the metadata have been purged, the digest can still contain the information required to restore the item.
The Backup Purge is a cleanup operation that removes from the Backup Path any deleted item that exceeded the retention time defined by the Data Retention Policy and Account retention policy.
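The purge rule can be sketched as a single predicate. This is a minimal Python illustration, not the actual Backup Purge. The function name is hypothetical, and treating the boundary as "strictly more than the retention time" is an assumption. The predicate encodes only what is stated above: objects not marked for deletion are never purged, and a retention of 0 means infinite retention.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_purgeable(deleted_at: Optional[datetime], retention_days: int, now: datetime) -> bool:
    """Whether an object may be removed under a retention policy of `retention_days`."""
    if deleted_at is None:
        return False  # not marked for deletion: kept
    if retention_days == 0:
        return False  # 0 (zero) means infinite retention: nothing is purged
    # Assumed boundary: purge only once strictly more than the retention time has elapsed.
    return now - deleted_at > timedelta(days=retention_days)


now = datetime(2024, 6, 30)
print(is_purgeable(datetime(2024, 5, 1), 30, now))   # True: deleted 60 days ago
print(is_purgeable(datetime(2024, 6, 20), 30, now))  # False: still within retention
print(is_purgeable(datetime(2024, 5, 1), 0, now))    # False: infinite retention
```

In this sketch, the same predicate would be called with the Data retention value for single items and with the Account retention value for accounts.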
The Coherency Check is specifically designed to detect corrupted metadata and BLOBs and performs a deeper check of a Backup Path than SmartScan.
While the SmartScan works incrementally by only checking items modified since the last SmartScan run, the Coherency Check carries out a thorough check of all metadata and BLOBs in the Backup Path.
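The difference between an incremental pass and an exhaustive check can be illustrated with a toy model. In this Python sketch, a blob that was silently corrupted after its last modification is missed by a "modified since last run" filter but caught by re-hashing every blob. Everything here is hypothetical. The data class and function names are illustrative, and SHA-256 as the digest algorithm is an assumption, not a statement about Zextras internals.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredBlob:
    modified: int   # last time the item changed (hypothetical timestamp)
    data: bytes     # blob content as found in the Backup Path
    digest: str     # digest recorded in the metadata (SHA-256 assumed here)


def smartscan_candidates(blobs: dict[str, StoredBlob], last_run: int) -> list[str]:
    """Incremental pass: only items modified since the previous run are looked at."""
    return [item_id for item_id, b in blobs.items() if b.modified > last_run]


def coherency_check(blobs: dict[str, StoredBlob]) -> list[str]:
    """Full pass: re-hash every blob and report those that no longer match their digest."""
    return [item_id for item_id, b in blobs.items()
            if hashlib.sha256(b.data).hexdigest() != b.digest]


expected = hashlib.sha256(b"hello").hexdigest()
blobs = {
    "a": StoredBlob(modified=10, data=b"hello", digest=expected),
    "b": StoredBlob(modified=5, data=b"hellp", digest=expected),  # old and silently corrupted
}
print(smartscan_candidates(blobs, last_run=8))  # ['a']: the corruption in "b" is not seen
print(coherency_check(blobs))                   # ['b']
```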
To start a Coherency Check via the CLI, use the doCoherencyCheck command:
A detailed analysis of the Coherency Check
Zextras Backup has been designed to store each and every variation of an ITEM. It is not intended as a system or operating system backup; therefore, it can work with different OS architectures and Zimbra versions.
Zextras Backup allows administrators to create an atomic backup of every item in the mailbox account and restore different objects on different accounts or even on different servers.
By default, Zextras Backup saves all backup files in the local directory /opt/zimbra/backup/zextras/. To be eligible as the Backup Path, a directory must:
Be both readable and writable by the
Use a case-sensitive filesystem
Be hosted on a block device
|You can modify the default setting by using either technique shown in Setting the Backup Path.|
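Two of the requirements for a Backup Path can be probed from a script before changing the setting. The Python sketch below is a hypothetical pre-check and not a Zextras tool. The function names are illustrative, and the check applies to whichever user runs the script, so it should be run as the user the Backup runs as. The "hosted on a block device" requirement is not checked.

```python
import os
import tempfile


def is_readable_writable(path: str) -> bool:
    """True if `path` is a directory the current process can list and write into."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


def is_case_sensitive(path: str) -> bool:
    """Create a probe file in `path` and test whether its case-swapped name also resolves."""
    with tempfile.NamedTemporaryFile(dir=path, prefix="CaseProbe") as probe:
        name = os.path.basename(probe.name)
        # On a case-insensitive filesystem the swapped name finds the same probe file.
        return not os.path.exists(os.path.join(path, name.swapcase()))


def backup_path_problems(path: str) -> list[str]:
    """List the requirements (among those checked here) that `path` fails."""
    if not is_readable_writable(path):
        return ["not readable and writable"]
    problems = []
    if not is_case_sensitive(path):
        problems.append("filesystem is not case sensitive")
    # The "hosted on a block device" requirement is not checked by this sketch.
    return problems


print(backup_path_problems(tempfile.gettempdir()))
```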
When first started, Zextras Backup launches a SmartScan to fetch all data from the mailbox and create the initial backup structure, in which every item is saved, along with all its metadata, as a JSON array on a case-sensitive filesystem. After the first start, the Real Time Scanner, the SmartScan, or both can be employed to keep the backup updated and synchronised with the account.
The basic structure of an item is a JSON Array that records all the changes happening during the item’s lifetime. These include information related to emails (e.g., tags, visibility, email moved to a folder), contacts, tasks, single folders, groups, or Drive documents, as well as the user’s preferences (e.g., hash of the password, general settings).
To improve performance, only the changes that are needed to restore the items are recorded. For example, it is not useful to store the user’s last login time or the IMAP and ActiveSync state, because if the account is restored to a new one, the values of those attributes would refer to the old account.
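Filtering out non-restorable state amounts to dropping a known set of attributes before a change is recorded. In this Python sketch, the attribute names in `NOT_RESTORABLE` are hypothetical stand-ins for the last login time and the IMAP/ActiveSync state, not real Zimbra attribute names.

```python
from typing import Any

# Hypothetical attribute names standing for state that only makes sense on the original account.
NOT_RESTORABLE = {"lastLoginTime", "imapState", "activeSyncState"}


def restorable_changes(changes: dict[str, Any]) -> dict[str, Any]:
    """Drop attributes that would be meaningless on a restored (possibly new) account."""
    return {key: value for key, value in changes.items() if key not in NOT_RESTORABLE}


print(restorable_changes({"lastLoginTime": 1_700_000_000, "signature": "Best regards"}))
# {'signature': 'Best regards'}
```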
By recording the timestamp of each transaction, we are able to restore data as it was at a specific moment of its life.
During a restore, the engine selects the valid transactions by evaluating their “start-date” and “end-date” attributes.
The same logic is used to retrieve deleted items: when an item is deleted, its deletion timestamp is stored, so items that have been deleted within a specific time frame can be restored.
Even if the blob associated with an item changes, and consequently its digest changes too (as happens for Drive documents), the metadata records the validity of both the old and the new digest.
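Selecting valid transactions by "start-date" and "end-date" can be sketched as an interval lookup. In the same spirit, deleted items can be found by filtering on their stored deletion timestamp. The Python sketch below is hypothetical. The `Version` layout, integer timestamps, and half-open validity intervals (start inclusive, end exclusive) are assumptions, not the actual JSON schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Version:
    start_date: int            # "start-date": when this state became valid
    end_date: Optional[int]    # "end-date": when it stopped being valid (None = still valid)
    digest: str                # blob digest valid during this interval
    metadata: dict[str, Any]


def state_at(history: list[Version], when: int) -> Optional[Version]:
    """Return the version valid at `when` (start_date <= when < end_date), if any."""
    for version in history:
        if version.start_date <= when and (version.end_date is None or when < version.end_date):
            return version
    return None


def deleted_between(deletions: dict[str, int], start: int, end: int) -> list[str]:
    """IDs of items whose stored deletion timestamp falls within [start, end]."""
    return [item_id for item_id, ts in deletions.items() if start <= ts <= end]


# A Drive document whose blob changed at t=200: both digests stay recorded.
doc = [
    Version(100, 200, "digest-v1", {"name": "report.odt"}),
    Version(200, None, "digest-v2", {"name": "report.odt"}),
]
print(state_at(doc, 150).digest)  # digest-v1
print(state_at(doc, 250).digest)  # digest-v2
print(deleted_between({"257": 300, "258": 900}, 0, 500))  # ['257']
```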