Replica Set Data Synchronization After Restoring a MongoDB Backup

Onyancha Brian Henry

Published January 27, 2021

Database backups are crucial for data safety and availability if a database is destroyed or corrupted. In other words, backups provide a foundation for business continuity when a failure occurs. In addition, some government regulations require companies to back up critical information such as financial and health records.

In a replica set, backups play a major part in recovery: the primary’s data is restored first, and the remaining members are then resynchronized from it. Resynchronization also keeps replica set members’ data up to date.

In this blog we also discuss how to resync a backup so that its data is up to date with the current state of the database.

Backup Resynchronization

Cloud Manager prompts you to resync the backup for a given MongoDB instance when it detects that the latest backup is out of sync with the MongoDB deployment. Some of the situations that lead to this are:

Data corruption: If the daemon detects a job break caused by an illegal instruction, it requests a resync of the backup.
Unsafe applyOps: If the backup is found to be missing a copy of a document, a resync of the backup may be requested.
Oplog rollover: In many cases, a backup resync is triggered when the oplog rolls over and the backup’s tailing cursor cannot keep up with the deployment’s oplog.
Secondary falls far behind the primary: This is also a result of oplog rollover, and if a resync is not done, the backups will never be in line with the deployment.
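
To make the last two situations concrete, here is a minimal Python sketch that flags secondaries whose lag exceeds the oplog window. It assumes a dictionary shaped like the output of the replSetGetStatus command (the members, name, stateStr and optimeDate fields); the host names and timestamps are made up for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status):
    """Return {member name: seconds behind the primary} for each secondary."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def members_at_risk(status, oplog_window_seconds):
    """List secondaries whose lag is larger than the oplog window."""
    lag = replication_lag_seconds(status)
    return sorted(name for name, secs in lag.items() if secs > oplog_window_seconds)


# Hypothetical sample, shaped like replSetGetStatus output.
now = datetime(2021, 1, 27, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=5)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(hours=30)},
    ]
}

# With a 24-hour oplog window, only db3 has fallen too far behind.
print(members_at_risk(sample_status, oplog_window_seconds=24 * 3600))
```

A member reported here can no longer catch up from the oplog alone and would need a resync.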

MongoDB does not provide an automatic way to resolve any of the situations above, so the only recommended approach is to resync the backup manually.

Considerations for Backup Resynchronization

In a production environment, it is advisable to resync all backups manually.
Ensure the backup oplog never falls behind the deployment’s oplog by giving the agent sufficient machine resources. After any maintenance or downtime, restart Cloud Manager in a timely manner.
Resync the head database if you have built an index in a rolling fashion, so that the database takes the new index into account.
Ensure the primary’s oplog is large enough to hold at least 24 hours of activity, which provides a buffer for maintenance and occasional bursts of activity.
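
The 24-hour guideline can be checked with simple arithmetic. The sketch below is an illustration only: the function names, the 1.5x headroom factor and the sample figures are assumptions, and the write rate is something you would measure on your own deployment.

```python
def oplog_window_hours(oplog_size_mb, write_rate_mb_per_hour):
    """Hours of activity the oplog can hold at a steady write rate."""
    if write_rate_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / write_rate_mb_per_hour


def required_oplog_size_mb(peak_rate_mb_per_hour, target_hours=24, headroom=1.5):
    """Oplog size needed to cover target_hours at the peak rate, plus headroom."""
    return peak_rate_mb_per_hour * target_hours * headroom


# A 10 GB oplog written at 800 MB/hour covers well under a day.
print(oplog_window_hours(10240, 800))   # 12.8
# Size needed for 24 hours at that rate with 50% headroom for bursts.
print(required_oplog_size_mb(800))      # 28800.0
```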

Initial Sync After Backup Restoration

When restoring a backup to a replica set, we normally start by establishing the primary node and populating it with the data. Once the secondary members are in place, we copy all data from the primary to them through an initial sync.

How MongoDB Performs an Initial Sync

Clones all databases except the local database. To do this, mongod scans every collection in each source database and inserts all the data into its own copies of these collections. Starting in version 3.4, all collection indexes are built as the documents of each collection are copied, whereas earlier versions built only the _id indexes at this stage. The operation also pulls newly added records from the oplog while the data is being copied. Ensure the member being synced has enough disk space to store these oplog records temporarily during the copying process.
Applies the changes to the data set using the oplog from the source, so that the member reflects the most recent state of the replica set.
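
The two phases can be illustrated with a toy model in Python. This is not how mongod is implemented: it is a sketch that assumes in-memory dictionaries for collections and heavily simplified oplog entries. The op, ns, o and o2 keys mirror real oplog field names, but here an update's o is treated as a plain set of fields to overwrite.

```python
def initial_sync(source_collections, oplog_during_copy):
    """Toy two-phase sync: clone every collection, then replay buffered oplog entries."""
    # Phase 1: copy each document into the target's own collections.
    target = {
        ns: {doc["_id"]: dict(doc) for doc in docs}
        for ns, docs in source_collections.items()
    }
    # Phase 2: apply, in order, the changes recorded while the copy was running.
    for entry in oplog_during_copy:
        coll = target.setdefault(entry["ns"], {})
        if entry["op"] == "i":        # insert
            coll[entry["o"]["_id"]] = dict(entry["o"])
        elif entry["op"] == "u":      # update (simplified to a field overwrite)
            key = entry["o2"]["_id"]
            coll.setdefault(key, {"_id": key}).update(entry["o"])
        elif entry["op"] == "d":      # delete
            coll.pop(entry["o"]["_id"], None)
    return target


# Hypothetical data: two orders exist when the copy starts.
source = {"shop.orders": [{"_id": 1, "status": "new"}, {"_id": 2, "status": "new"}]}
# Writes that reached the source while the copy was in progress.
oplog = [
    {"op": "i", "ns": "shop.orders", "o": {"_id": 3, "status": "new"}},
    {"op": "u", "ns": "shop.orders", "o2": {"_id": 1}, "o": {"status": "paid"}},
    {"op": "d", "ns": "shop.orders", "o": {"_id": 2}},
]
print(initial_sync(source, oplog))
```

The longer the copy phase runs, the more oplog entries accumulate for the second phase, which is why disk space for those buffered records matters.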

How to Perform an Initial Sync

MongoDB provides two ways of performing an initial sync:

Automatically sync a member: Restart the mongod of the secondary member with an empty data directory. MongoDB’s normal initial sync then restores the data to the member. This is the simpler option, but it may take longer than expected, and it puts additional traffic on the primary, which can affect its performance in the meantime. The length of the initial sync depends on the database size and the network latency between the members of the replica set.
Sync by copying data files from the primary: Copy the data files of the primary node into the data directories of the secondary members and then restart their mongod instances. Copying the data is quite fast, but it requires more manual steps. Only a snapshot backup can be used with this process, since it provides a consistent snapshot of a running mongod instance.

Initial Sync for an Existing Replica Set Deployment

In this scenario, you can choose the sync source for an initial sync, and it does not have to be the primary. Starting in MongoDB 4.4, you can specify the preferred initial sync source with the initialSyncSourceReadPreference parameter when starting the mongod instance.

The sync source must meet these conditions:

Be in the PRIMARY or SECONDARY replication state
Be online and reachable
Be a secondary if initialSyncSourceReadPreference is set to secondary

These are the relaxed criteria used to select the sync source when a first pass fails to select a member.

When chaining is disabled for the replica set, the primary is the default initial sync source.
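
The conditions above amount to a simple filter. The following Python sketch applies them to a member list shaped like replSetGetStatus output (the name, stateStr and health fields). The function name and sample hosts are hypothetical, and the sketch covers only the three conditions listed here, not every check mongod performs.

```python
def eligible_sync_sources(members, read_preference=None):
    """Return the names of members that pass the relaxed sync source criteria."""
    candidates = []
    for m in members:
        # Must be in the PRIMARY or SECONDARY replication state.
        if m["stateStr"] not in ("PRIMARY", "SECONDARY"):
            continue
        # Must be online and reachable (health is 1 for a reachable member).
        if m.get("health") != 1:
            continue
        # With a "secondary" read preference, only secondaries qualify.
        if read_preference == "secondary" and m["stateStr"] != "SECONDARY":
            continue
        candidates.append(m["name"])
    return candidates


# Hypothetical replica set with one arbiter and one unreachable member.
members = [
    {"name": "db1:27017", "stateStr": "PRIMARY", "health": 1},
    {"name": "db2:27017", "stateStr": "SECONDARY", "health": 1},
    {"name": "db3:27017", "stateStr": "ARBITER", "health": 1},
    {"name": "db4:27017", "stateStr": "(not reachable/healthy)", "health": 0},
]
print(eligible_sync_sources(members))
print(eligible_sync_sources(members, read_preference="secondary"))
```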

Fault Tolerance During the Sync Process

Any secondary that encounters a non-transient network error during an initial sync has to restart the sync process from the beginning.

Starting in MongoDB 4.4, a secondary can attempt to resume the process if it is interrupted by a temporary network error or by a collection rename or drop. The sync source must also be running MongoDB 4.4 to support resumable initial sync; otherwise, the whole process has to restart.

By default, the secondary tries to resume the initial sync for 24 hours. This period can be changed with the initialSyncTransientErrorRetryPeriodSeconds parameter, which sets how long the secondary keeps trying to resume.

If it cannot resume within that period, it selects another healthy, up-to-date source from the replica set and restarts the initial sync from the beginning. The secondary attempts such a restart up to 10 times before throwing a fatal error.
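
The retry behaviour can be summarised as a small decision function. This Python sketch is a model of the rules described here rather than MongoDB's actual code: the function name and the returned labels are hypothetical, and it assumes the limit of 10 applies to restarts from a new source.

```python
# Default of initialSyncTransientErrorRetryPeriodSeconds: 24 hours.
DEFAULT_RETRY_PERIOD_SECONDS = 24 * 60 * 60
# Assumed number of restarts allowed before a fatal error.
MAX_RESTART_ATTEMPTS = 10


def next_action(seconds_since_error, restarts_so_far,
                retry_period=DEFAULT_RETRY_PERIOD_SECONDS):
    """Decide what a syncing secondary does after a transient error."""
    if seconds_since_error < retry_period:
        return "resume"                    # keep trying to resume in place
    if restarts_so_far < MAX_RESTART_ATTEMPTS:
        return "restart-from-new-source"   # pick another healthy source, start over
    return "fatal"                         # give up with a fatal error


print(next_action(seconds_since_error=600, restarts_so_far=0))
print(next_action(seconds_since_error=90000, restarts_so_far=3))
print(next_action(seconds_since_error=90000, restarts_so_far=10))
```

Lowering retry_period makes the secondary give up on a stalled source sooner, at the cost of repeating the copy from the beginning.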

With streaming replication, MongoDB 4.4 provides the oplogFetcherUsesExhaust parameter, which can be set to disable this replication strategy and fall back to the old replication behaviour when resources are constrained. However, consider increasing the resources instead, since streaming replication offers advantages such as:

Mitigating replication lag in high-load and high-latency networks
Reducing latency on write operations with w: “majority” and w: >1
Reducing staleness of reads from the secondaries
Reducing the risk of losing write operations with w: 1 as a result of primary failover

Setting oplogFetcherUsesExhaust to false limits the network bandwidth that MongoDB uses for replication.

MongoDB 4.2 introduced the flowControlTargetLagSeconds parameter, which limits the rate at which the primary applies its writes so that the majority-committed lag stays under a configurable maximum value.
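
The parameters mentioned in this section are server parameters, which can be passed to mongod with --setParameter. The Python sketch below only assembles such a command line so it can be reviewed before use. The helper name, replica set name, data path and chosen values are illustrative assumptions, and you should check which parameters your MongoDB version accepts at startup.

```python
import shlex


def mongod_command(repl_set, dbpath, **parameters):
    """Build a mongod argument list with one --setParameter per keyword."""
    argv = ["mongod", "--replSet", repl_set, "--dbpath", dbpath]
    for name, value in sorted(parameters.items()):
        if isinstance(value, bool):
            value = str(value).lower()   # mongod expects true/false
        argv += ["--setParameter", f"{name}={value}"]
    return argv


# Hypothetical settings: fall back from streaming replication, keep the
# flow control target lag at 10 seconds, and prefer a secondary sync source.
cmd = mongod_command(
    "rs0",
    "/var/lib/mongodb",
    oplogFetcherUsesExhaust=False,
    flowControlTargetLagSeconds=10,
    initialSyncSourceReadPreference="secondaryPreferred",
)
print(shlex.join(cmd))
```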

After a member has successfully completed an initial sync, its state changes from STARTUP2 to SECONDARY.
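
One way to watch for this transition is to poll the member's state until it leaves STARTUP2, the state a member reports while an initial sync is running. The sketch below assumes a fetch_status callable that returns a dictionary shaped like replSetGetStatus output; the callable, the helper names and the fake status source are all hypothetical stand-ins for a real driver call.

```python
def wait_for_initial_sync(fetch_status, member_name, max_polls=60, pause=lambda: None):
    """Poll until the member leaves STARTUP2, then return its state string."""
    for _ in range(max_polls):
        state = next(
            m["stateStr"]
            for m in fetch_status()["members"]
            if m["name"] == member_name
        )
        if state != "STARTUP2":
            return state
        pause()   # in real use, sleep between polls
    raise TimeoutError(f"{member_name} still in STARTUP2 after {max_polls} polls")


def make_fake_status(name, states):
    """Hypothetical status source that steps through the given states."""
    it = iter(states)
    return lambda: {"members": [{"name": name, "stateStr": next(it)}]}


fetch = make_fake_status("db3:27017", ["STARTUP2", "STARTUP2", "SECONDARY"])
print(wait_for_initial_sync(fetch, "db3:27017"))
```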

Conclusion

Initial sync of replica set members is important for ensuring data consistency and integrity across all the members. When performing an initial sync, choose the sync source carefully and make sure there is enough disk space to hold the changes that occur during the syncing process. Depending on your resource limits, you can either save on operational cost or pay for the advantages that come with streaming replication.