Backups are crucial to any organization because they protect against the complete loss of important data. If data is accidentally lost, it can be restored from an earlier backup.
Before backing up your cluster or replica set, decide what data to back up and how to back it up. In this blog post, we discuss how to prepare for a MongoDB backup and what you must consider before starting one.
Backup Configuration Options
The backup and recovery requirements for any given system vary to meet the performance, cost, and data protection standards the system owner has set. The Ops Manager Enterprise Backup and Recovery supports five backup architectures:
- A file system on one or more NAS devices.
- An AWS S3 Blockstore.
- A file system on a SAN with advanced features for filesystem snapshots, such as high availability, compression, or deduplication.
- MongoDB Blockstore in a highly available configuration.
- MongoDB Blockstore in a standalone configuration.
Each backup architecture has its own strengths and trade-offs. You should choose an architecture that adequately meets the data protection requirements for your deployment before configuring and deploying your backup architecture.
For example, owners of a system whose requirements include low operational costs may have strict limits on what they can spend on storage for their backup and recovery configuration and, as a result, may accept a longer recovery time.
On the other hand, owners of a system whose requirements include a low recovery time objective may accept greater storage costs if the result is a backup and recovery configuration that meets the recovery requirements.
When To Use a Particular Backup Architecture
- If you frequently back up large amounts of data and want to restore from those backups, consider backing up to a file system on a SAN, an AWS S3 snapshot store, or a MongoDB blockstore configured as a replica set or a sharded cluster.
- If you want to restore data without relying on a MongoDB database, back up to an AWS S3 snapshot store, to one or more NAS devices, or to a file system on a SAN with advanced features for filesystem snapshots, such as high availability, compression, or deduplication.
- If you want to minimize internal storage and maintenance costs, back up to an AWS S3 snapshot store or to a MongoDB standalone blockstore. Note, however, that the MongoDB standalone blockstore offers limited resilience: if the disk fills, the blockstore may go offline, and you can recover snapshots only after adding storage.
Backup Sizing Recommendation
When sizing the backup of your data, keep the replica set size to 2 TB or less of uncompressed data. If your database grows beyond 2 TB, consider sharding it and keeping each shard to 2 TB or less of uncompressed data.
These size recommendations are a best practice, not a limitation of the MongoDB database or Ops Manager. Keep in mind that backup and restore can use large amounts of CPU, memory, storage, and network bandwidth.
Consider the following example: you want to back up a 2 TB database, and your host supports a 10 Gbps TCP connection from Ops Manager to its backup storage. The network connection has very low packet loss and a low round-trip delay time.
In this example, a full backup of your data would take more than 30 hours to complete. This estimate does not account for disk read and write speeds, which can be at most 3 Gbps for reads and 1 Gbps for writes on a single or mirrored NVMe storage device. The time required to complete each successive incremental backup depends on the write load.
Sharding the database into 4 shards results in a backup that takes less than 8 hours to complete, a considerable improvement over the initial 30 hours. The sharded backup takes less time because each shard runs its backup separately.
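The sizing arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function names are hypothetical, and the estimate assumes that data is split evenly across shards, that shards back up fully in parallel, and that backup time scales linearly with data size. None of these assumptions is guaranteed by Ops Manager.

```python
import math

# Recommended ceiling per replica set or shard, from the sizing guidance above.
RECOMMENDED_MAX_TB = 2.0


def shards_needed(total_tb: float, limit_tb: float = RECOMMENDED_MAX_TB) -> int:
    """Smallest number of shards that keeps every shard at or under limit_tb.

    Assumes data is distributed evenly across shards.
    """
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    return max(1, math.ceil(total_tb / limit_tb))


def parallel_backup_hours(full_backup_hours: float, shards: int) -> float:
    """Rough wall-clock estimate when each shard runs its backup separately.

    Assumes even data distribution and linear scaling with data size.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return full_backup_hours / shards


# The example from the text: roughly 30 hours unsharded, split over 4 shards.
print(parallel_backup_hours(30, 4))  # 7.5
print(shards_needed(8))              # 4
```

Real backup times also depend on disk throughput and write load, so treat the result as a planning estimate only.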
Snapshot Frequency and Retention Policy
By default, Ops Manager automatically takes a base snapshot of your data every 24 hours. You cannot take snapshots on demand, but administrators can change the frequency of the base snapshots to 6, 8, 12, or 24 hours.
Base snapshots have a default retention policy of 2 days and a maximum of 5 days (30 days if the scheduled frequency is 24 hours). Daily and weekly snapshots have a default retention policy of 0 days and 2 weeks respectively, and both have a maximum retention policy of 1 year. Finally, monthly snapshots have a default retention policy of 1 month and a maximum of 7 years.
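To make the retention limits easier to check, here is a minimal Python sketch that encodes them as data. The names RETENTION, max_retention, and is_valid_retention are hypothetical and not part of Ops Manager, and the conversions of months and years to days (30 and 365) are approximations chosen for this illustration.

```python
from datetime import timedelta

# (default, maximum) retention per snapshot type, taken from the text above.
# Month and year lengths are approximated as 30 and 365 days.
RETENTION = {
    "base": (timedelta(days=2), timedelta(days=5)),
    "daily": (timedelta(days=0), timedelta(days=365)),
    "weekly": (timedelta(weeks=2), timedelta(days=365)),
    "monthly": (timedelta(days=30), timedelta(days=7 * 365)),
}


def max_retention(kind: str, base_frequency_hours: int = 24) -> timedelta:
    """Return the maximum retention for a snapshot type."""
    _default, maximum = RETENTION[kind]
    # Base snapshots may be kept up to 30 days when taken every 24 hours.
    if kind == "base" and base_frequency_hours == 24:
        return timedelta(days=30)
    return maximum


def is_valid_retention(kind: str, requested: timedelta,
                       base_frequency_hours: int = 24) -> bool:
    """Check a requested retention period against the documented maximum."""
    return timedelta(0) <= requested <= max_retention(kind, base_frequency_hours)


print(is_valid_retention("base", timedelta(days=10)))                          # True
print(is_valid_retention("base", timedelta(days=10), base_frequency_hours=6))  # False
```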
Changing the reference time changes the time of the next scheduled snapshot, as in the following scenarios:
- If the new reference time [11:00 UTC] is before the current reference time [12:00 UTC], then the next snapshot occurs at the new reference time tomorrow [11:00 UTC tomorrow].
- If the new reference time [15:00 UTC] is after the current reference time [12:00 UTC], but you make the change after the current reference time [13:00 UTC], then the next snapshot occurs at the new reference time the next day [15:00 UTC tomorrow].
- If the new reference time [15:00 UTC] is after the current reference time [12:00 UTC], and you make the change before the current reference time [10:00 UTC], then the next snapshot occurs at the new reference time on the same day [15:00 UTC today].
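The three scenarios above can be expressed as a small Python function. This is a sketch of the rules as described in this post, not Ops Manager's actual scheduler. The function name and parameters are hypothetical, and edge cases the text does not cover (for example, a new reference time equal to the current one) simply fall through to "tomorrow".

```python
from datetime import datetime, time, timedelta, timezone


def next_snapshot(current_ref: time, new_ref: time, changed_at: datetime) -> datetime:
    """Return the next snapshot time after the reference time is changed.

    current_ref and new_ref are naive times of day in UTC;
    changed_at is the UTC moment at which the change is made.
    """
    today = changed_at.date()
    tomorrow = today + timedelta(days=1)
    # Only one scenario fires today: the new reference time is later than
    # the current one AND the change is made before the current one.
    if new_ref > current_ref and changed_at.time() < current_ref:
        day = today
    else:
        day = tomorrow
    return datetime.combine(day, new_ref, tzinfo=timezone.utc)


change = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
print(next_snapshot(time(12, 0), time(15, 0), change))  # 2024-01-01 15:00:00+00:00
```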
Note that Ops Manager does not delete existing snapshots to conform to the new schedule when you change the schedule to save fewer snapshots. Ops Manager also has some limitations: it does not replicate index collection options, and it does not back up deployments where the total number of collections meets or exceeds 100,000.
Ops Manager can encrypt any backup job stored in a head database running MongoDB Enterprise between FCV 3.4 and 4.0 with the WiredTiger storage engine.
Databases Running Feature Compatibility Version (FCV) 4.2 and 4.4
Backup support for the following MongoDB versions is currently limited and will be expanded in future releases of Ops Manager:
- 4.2 with featureCompatibilityVersion : 4.2
- 4.4 with featureCompatibilityVersion : 4.4
Performing a point-in-time (PIT) restore requires Ops Manager 4.2.13 or later. Ops Manager requires a full backup for your first backup, after a snapshot has been deleted, and if the blockstore block size has been changed. Incremental backups reduce network transfer and storage costs, and they work with MongoDB 4.2.6 or later.
Backups of FCV 4.2 or later databases to a file system store ignore the File System Store Gzip Compression Level setting. Querying encrypted snapshots requires MongoDB Enterprise version 4.2.9 or 4.4.0.
If you are running MongoDB 4.2 or later with FCV 4.2 or later to run backup and restore, you:
- Cannot use namespace filter lists to define the namespaces included in a backup, because snapshots using FCV 4.2 or later always include all namespaces.
- Must deploy a MongoDB Agent with every mongod node in the cluster.
- Must run MongoDB Enterprise.
- Don’t need to sync the database. When taking a snapshot, Ops Manager selects the replica set member with the least performance impact and greatest storage-level duplication of snapshot data.
- Must account for the change in blockstore block size. If you did not set your block size and used the default, the block size changes from 64 KB to 1 MB, which impacts storage use.
Databases Not Running Feature Compatibility Version (FCV) 4.2 and 4.4
Note that only sharded clusters and replica sets can be backed up. Therefore, to back up a standalone mongod process, you must first convert it to a single-member replica set.
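As a rough illustration of that conversion, the sketch below initiates a single-member replica set using the replSetInitiate database command through PyMongo. The replica set name rs0, the host localhost:27017, and both function names are assumptions made for this example. The code also assumes that the pymongo package is installed and that the mongod has already been restarted with the --replSet option set to the same name. Deployments managed by Ops Manager may need the change made through Ops Manager instead.

```python
def single_member_config(name: str, host: str) -> dict:
    """Build the configuration document for a one-member replica set."""
    return {"_id": name, "members": [{"_id": 0, "host": host}]}


def initiate_single_member_replica_set(name: str = "rs0",
                                       host: str = "localhost:27017") -> None:
    """Initiate the replica set on a mongod restarted with --replSet <name>."""
    # Imported lazily so the config helper works without pymongo installed.
    from pymongo import MongoClient

    # A direct connection is needed because the member is not yet part of
    # an initiated replica set, so topology discovery would not succeed.
    client = MongoClient(f"mongodb://{host}", directConnection=True)
    client.admin.command("replSetInitiate", single_member_config(name, host))


print(single_member_config("rs0", "localhost:27017"))
```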
The following considerations apply when your databases run MongoDB 4.2 with “featureCompatibilityVersion” : 4.0 or when they run any version of MongoDB 4.0 or earlier.
Garbage Collection of Expired Snapshots
Ops Manager manages expired snapshots using groom jobs. Depending upon which snapshot store contains the snapshots, these groom jobs act differently. For example:
- In S3 snapshot stores with FCV 4.0 or earlier, the groom job may use additional disk space if Ops Manager creates a snapshot while the groom job is running. With FCV 4.2 or later, Ops Manager cannot create snapshots while a groom job is running.
- In a MongoDB blockstore, the groom job uses additional disk space, up to the amount of living blocks for each job.
- In Filesystem Snapshot stores, the groom job works by deleting the expired snapshots.
Namespaces Filter
The namespaces filter lets you specify which databases and collections to back up by creating either a blacklist of those to exclude or a whitelist of those to include. Make your selections before starting a backup; you can edit them later. Consider doing a resync after changing the filter in a way that adds data to your backup.
Use the blacklist to prevent the backup of collections that contain logging data, caches, or other ephemeral data. Excluding them reduces backup time and costs. A whitelist, by contrast, requires you to intentionally opt in to every namespace you want backed up, so the blacklist is often preferable.
However, MongoDB deployments with “featureCompatibilityVersion” : 4.2 do not support namespace filters.
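The difference between the two filter types can be illustrated with a short Python sketch. This models only the include and exclude semantics described above; the function name is_backed_up is hypothetical, and the matching rule (an entry matches either a full "database.collection" namespace or a whole database name) is an assumption made for this example, not Ops Manager's implementation.

```python
from typing import Iterable, Optional


def is_backed_up(namespace: str,
                 blacklist: Optional[Iterable[str]] = None,
                 whitelist: Optional[Iterable[str]] = None) -> bool:
    """Decide whether a namespace such as "app.orders" would be backed up."""
    if blacklist is not None and whitelist is not None:
        raise ValueError("use either a blacklist or a whitelist, not both")

    database = namespace.split(".", 1)[0]

    def matches(entries: Iterable[str]) -> bool:
        entries = set(entries)
        return namespace in entries or database in entries

    if whitelist is not None:
        return matches(whitelist)      # opt in: only listed namespaces
    if blacklist is not None:
        return not matches(blacklist)  # opt out: everything except listed
    return True                        # no filter: back up everything


exclude = ["app.logs", "cache"]
print(is_backed_up("app.orders", blacklist=exclude))      # True
print(is_backed_up("cache.sessions", blacklist=exclude))  # False
```

With a blacklist, newly created namespaces are backed up automatically. With a whitelist, they are skipped until someone adds them, which is the opt-in burden mentioned above.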
Storage Engine
Use the WiredTiger storage engine to back up MongoDB clusters. If your current backing databases use MMAPv1, consider upgrading to WiredTiger by:
- Changing the sharded clusters to WiredTiger.
- Changing the replica set to WiredTiger.
Ops Manager limits backups to deployments with fewer than 100,000 files with the WiredTiger storage engine. These files include collections and indexes.
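A quick way to reason about this limit is to count one file per collection plus one file per index. The sketch below does exactly that. The function names and the input format (a mapping from collection name to its number of indexes, including the _id index) are hypothetical, and the one-file-per-collection-and-index assumption reflects WiredTiger's default layout.

```python
# Limit described above: backups require fewer than 100,000 files.
WIREDTIGER_FILE_LIMIT = 100_000


def wiredtiger_file_count(indexes_per_collection: dict) -> int:
    """Estimate data files: one per collection plus one per index."""
    return sum(1 + index_count for index_count in indexes_per_collection.values())


def within_backup_limit(indexes_per_collection: dict) -> bool:
    """True when the estimated file count is below the backup limit."""
    return wiredtiger_file_count(indexes_per_collection) < WIREDTIGER_FILE_LIMIT


# Hypothetical deployment: two collections with 1 and 3 indexes.
deployment = {"app.orders": 3, "app.users": 1}
print(wiredtiger_file_count(deployment))  # 6
print(within_backup_limit(deployment))    # True
```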
Resyncing Production Deployments
As a best practice, resync all backed-up replica sets for production deployments annually. During a resync, data is read from a secondary in each replica set, and no new snapshots are generated.
You may also want to resync your backup after:
- A manual build of an index on a replica set in a rolling fashion.
- A reduction in data size, such that the size on disk of Ops Manager's copy of the data is also reduced.
- A switch in storage engines, if you want Ops Manager to provide snapshots in the new storage engine format.
Checkpoints
Checkpoints provide additional restore points between snapshots for sharded clusters. With checkpoints enabled, Ops Manager creates restore points between snapshots at a configurable interval of 15, 30, or 60 minutes.
To create a checkpoint, Ops Manager stops the balancer and inserts a token into the oplog of each shard and config server in the cluster. Backup processes do not require checkpoints, and they are disabled by default.
Restoring from a checkpoint takes longer than restoring from a snapshot because Ops Manager must apply the oplog of each shard and config server to the last snapshot captured before the checkpoint.
You may use checkpoints for clusters that run MongoDB with FCV 4.0 or earlier. For MongoDB instances with FCV 4.2 or later, checkpoints were removed.
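To visualize how checkpoints add restore points between snapshots, the following Python sketch lists checkpoint times for a given interval. It assumes checkpoints are spaced evenly starting from the snapshot time, which is a simplification; the function name and parameters are hypothetical, and the actual timing is determined by Ops Manager.

```python
from datetime import datetime, timedelta

# Intervals the text lists as configurable, in minutes.
ALLOWED_INTERVALS = (15, 30, 60)


def checkpoint_times(snapshot: datetime, next_snapshot: datetime,
                     interval_minutes: int) -> list:
    """List evenly spaced checkpoint times strictly between two snapshots."""
    if interval_minutes not in ALLOWED_INTERVALS:
        raise ValueError("interval must be 15, 30, or 60 minutes")
    step = timedelta(minutes=interval_minutes)
    times = []
    current = snapshot + step
    while current < next_snapshot:
        times.append(current)
        current += step
    return times


start = datetime(2024, 1, 1, 0, 0)
end = start + timedelta(hours=6)
print(len(checkpoint_times(start, end, 60)))  # 5
```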
To summarize, to prepare MongoDB backups, choose from the backup configuration options, use a backup architecture that meets your requirements, and follow the recommendations outlined above. Doing so puts you well on the way to well-thought-out MongoDB backups.