Wednesday, June 22, 2016

Replica Set Components - MongoDB

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments. This post describes the components of a MongoDB replica set.

Replica Set Primary
The primary is the only member in the replica set that receives write operations. MongoDB applies write operations on the primary and then records the operations on the primary’s oplog. Secondary members replicate this log and apply the operations to their data sets.
In the following three-member replica set, the primary accepts all write operations. Then the secondaries replicate the oplog to apply to their data sets.
Diagram of default routing of reads and writes to the primary.
All members of the replica set can accept read operations. However, by default, an application directs its read operations to the primary member. See the Read Preference section for details on changing the default read behaviour.
The replica set can have at most one primary. [1] If the current primary becomes unavailable, an election determines the new primary. See the Replica Set Elections post for more details.
In the following 3-member replica set, the primary becomes unavailable. This triggers an election which selects one of the remaining secondaries as the new primary.

Diagram of an election of a new primary. In a three-member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary.
[1] In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most one of them will be able to complete writes with { w: "majority" } write concern. The node that can complete { w: "majority" } writes is the current primary; the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary, and new writes to the former primary will eventually roll back.
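The footnote's guarantee follows from majority arithmetic: two disjoint sides of a network partition cannot both contain a strict majority of the voting members. The JavaScript sketch below is a simplified model, not MongoDB's election or write-concern code; `majorityOf` and `canAckMajorityWrite` are hypothetical names, and it assumes every member carries one vote.

```javascript
// Strict majority of the voting members (assumes one vote per member).
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Simplified rule: a node can acknowledge a { w: "majority" } write only if
// it can reach a strict majority of the voting members, counting itself.
function canAckMajorityWrite(reachableIncludingSelf, votingMembers) {
  return reachableIncludingSelf >= majorityOf(votingMembers);
}

// A three-member set split by a network partition: the former primary is
// isolated on one side, the two secondaries (one newly elected) on the other.
const votingMembers = 3;
const formerPrimarySide = 1;
const newPrimarySide = 2;

console.log(canAckMajorityWrite(formerPrimarySide, votingMembers)); // false
console.log(canAckMajorityWrite(newPrimarySide, votingMembers)); // true
```

Because only the side holding the majority can satisfy { w: "majority" }, writes accepted by the isolated former primary are the ones that later roll back.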

Replica Set Secondary Members
A secondary maintains a copy of the primary’s data set. To replicate data, a secondary applies operations from the primary’s oplog to its own data set in an asynchronous process. A replica set can have one or more secondaries.

The following three-member replica set has two secondary members. The secondaries replicate the primary’s oplog and apply the operations to their data sets.
Diagram of a 3 member replica set that consists of a primary and two secondaries.
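To make the asynchronous flow concrete, here is a toy JavaScript model of a primary recording writes in its oplog and a secondary later applying the entries it has not yet seen. It is a sketch under stated assumptions: real oplog entries live in local.oplog.rs and carry fields such as `ts`, `op`, `ns` and `o`, whereas this model uses plain integer timestamps and key/value pairs, and every function name here is hypothetical.

```javascript
// Toy model (an assumption, not MongoDB internals): oplog entries are
// { ts, key, value } with increasing integer timestamps.
const primaryData = new Map();
const primaryOplog = [];

// The primary applies the write to its data set, then records it in its oplog.
function primaryWrite(ts, key, value) {
  primaryData.set(key, value);
  primaryOplog.push({ ts, key, value });
}

// A secondary tracks the timestamp of the last entry it applied.
function makeSecondary() {
  return { data: new Map(), lastApplied: 0 };
}

// Runs independently of the primary's writes: copies and applies every
// entry newer than the secondary's last applied timestamp.
function syncSecondary(secondary, sourceOplog) {
  for (const entry of sourceOplog) {
    if (entry.ts > secondary.lastApplied) {
      secondary.data.set(entry.key, entry.value);
      secondary.lastApplied = entry.ts;
    }
  }
}

const s1 = makeSecondary();
primaryWrite(1, "a", 10);
primaryWrite(2, "b", 20);
console.log(s1.data.get("a")); // undefined: the secondary has not synced yet
syncSecondary(s1, primaryOplog);
console.log(s1.data.get("b")); // 20
```

The gap between the primary's write and the secondary's sync is exactly why reads from secondaries can return stale data.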
Although clients cannot write data to secondaries, clients can read data from secondary members. See Read Preference for more information on how clients direct read operations to replica sets.
A secondary can become a primary. If the current primary becomes unavailable, the replica set holds an election to choose which of the secondaries becomes the new primary.
In the following three-member replica set, the primary becomes unavailable. This triggers an election where one of the remaining secondaries becomes the new primary.
Diagram of an election of a new primary. In a three-member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary.
You can configure a secondary member for a specific purpose. For example, you can configure a secondary to:
  • Prevent it from becoming a primary in an election, which allows it to reside in a secondary data center or to serve as a cold standby. (Priority 0 Replica Set Members).
  • Prevent applications from reading from it, which allows it to run applications that require separation from normal traffic. (Hidden Replica Set Members).
  • Keep a running “historical” snapshot for use in recovery from certain errors, such as unintentionally deleted databases. (Delayed Replica Set Members).

Replica Set Arbiter

An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for primary. Arbiters always have exactly one election vote, and thus allow replica sets to have an uneven number of votes without the overhead of an additional member that replicates data.
IMPORTANT
Do not run an arbiter on systems that also host the primary or the secondary members of the replica set.
Only add an arbiter to sets with even numbers of members. If you add an arbiter to a set with an odd number of members, the set may suffer from tied elections.
For example, in the following replica set, an arbiter allows the set to have an odd number of votes for elections:
Diagram of a four member replica set plus an arbiter for odd number of votes.
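The effect of an arbiter can be checked with a little vote arithmetic. This sketch assumes each member, including the arbiter, carries one vote (the default) and uses hypothetical helper names; it shows that an arbiter raises a four-member set's fault tolerance from one failure to two, while adding one to a three-member set buys nothing.

```javascript
// Votes needed to elect a primary.
function majorityOf(votes) {
  return Math.floor(votes / 2) + 1;
}

// How many voting members can be unavailable while the rest can still elect a primary.
function faultTolerance(votes) {
  return votes - majorityOf(votes);
}

const layouts = [
  { name: "4 data-bearing members", votes: 4 },
  { name: "4 data-bearing members + 1 arbiter", votes: 5 },
  { name: "3 data-bearing members", votes: 3 },
  { name: "3 data-bearing members + 1 arbiter", votes: 4 },
];

for (const { name, votes } of layouts) {
  console.log(`${name}: majority ${majorityOf(votes)}, tolerates ${faultTolerance(votes)} failure(s)`);
}
// 4 data-bearing members: majority 3, tolerates 1 failure(s)
// 4 data-bearing members + 1 arbiter: majority 3, tolerates 2 failure(s)
// 3 data-bearing members: majority 2, tolerates 1 failure(s)
// 3 data-bearing members + 1 arbiter: majority 3, tolerates 1 failure(s)
```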

Hidden Replica Set Members

A hidden member maintains a copy of the primary’s data set but is invisible to client applications. Hidden members are good for workloads with different usage patterns from the other members in the replica set. Hidden members must always be priority 0 members and so cannot become primary. The db.isMaster() method does not display hidden members. Hidden members, however, may vote in elections.
In the following five-member replica set, all four secondary members have copies of the primary’s data set, but one of the secondary members is hidden.
Diagram of a 5 member replica set with a hidden priority 0 member.

Behaviour

Read Operations
Clients will not route reads to hidden members, regardless of read preference. As a result, these members receive no traffic other than basic replication. Use hidden members for dedicated tasks such as reporting and backups. Delayed members should be hidden.

In a sharded cluster, mongos instances do not interact with hidden members.

Voting
Hidden members may vote in replica set elections. If you stop a voting hidden member, ensure that the set has an active majority or the primary will step down.
For the purposes of backups:
  • If using the MMAPv1 storage engine, you can avoid stopping a hidden member by using the db.fsyncLock() and db.fsyncUnlock() operations to flush all writes and lock the mongod instance for the duration of the backup operation.
  • Changed in version 3.2: db.fsyncLock() can ensure that the data files do not change for MongoDB instances using either the MMAPv1 or the WiredTiger storage engines, thus providing consistency for the purposes of creating backups.
To configure a secondary member as hidden, set its members[n].priority value to 0 and set its members[n].hidden value to true in its member configuration:
{
  "_id" : <num>,
  "host" : <hostname:port>,
  "priority" : 0,
  "hidden" : true
}
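The rules above can be checked on a member document before it is passed to rs.reconfig(). The helper below is hypothetical and purely illustrative: it inspects plain objects without contacting a server, treats a missing priority as 1 (the default for data-bearing members), and reads the delay from `slaveDelay`, the field name used by MongoDB releases of this era.

```javascript
// Hypothetical helper: returns a list of rule violations for one member
// document. It only inspects the object; it never contacts a server.
function checkMemberConfig(member) {
  const problems = [];
  // Data-bearing members default to priority 1 when the field is omitted.
  const priority = member.priority === undefined ? 1 : member.priority;
  const delay = member.slaveDelay || 0; // seconds behind the primary

  if (member.hidden === true && priority !== 0) {
    problems.push("hidden members must have priority 0");
  }
  if (delay > 0 && member.hidden !== true) {
    problems.push("delayed members should be hidden");
  }
  return problems;
}

// Hypothetical host names, for illustration only.
checkMemberConfig({ _id: 2, host: "mongodb2.example.net:27017", priority: 0, hidden: true });
// returns []
checkMemberConfig({ _id: 3, host: "mongodb3.example.net:27017", hidden: true });
// returns ["hidden members must have priority 0"]
```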

Configuration Procedure

The following example hides the secondary member currently at index 0 in the members array. To configure a hidden member, use the following sequence of operations in a mongo shell connected to the primary, specifying the member to configure by its array index in the members array:
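cfg = rs.conf()                  // fetch the current replica set configuration
cfg.members[0].priority = 0      // a hidden member must never become primary
cfg.members[0].hidden = true     // hide the member from client applications
rs.reconfig(cfg)                 // apply the modified configuration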

Replica Set Oplog
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database.
To facilitate replication, all replica set members send heartbeats (pings) to all other members. Any member can import oplog entries from any other member.
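A capped collection's "rolling record" behaves like a fixed-size log: once it is full, each new entry displaces the oldest one. The sketch below models that with a hypothetical `CappedLog` class; note that real capped collections are bounded by size in bytes, not by entry count as assumed here.

```javascript
// Minimal model of a capped collection: keeps insertion order and discards
// the oldest entries once the (entry-count) capacity is exceeded.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }

  insert(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // drop the oldest entry, as the oplog rolls over
    }
  }

  oldest() {
    return this.entries[0];
  }

  newest() {
    return this.entries[this.entries.length - 1];
  }
}

const oplog = new CappedLog(3);
for (let ts = 1; ts <= 5; ts++) {
  oplog.insert({ ts });
}
console.log(oplog.oldest().ts, oplog.newest().ts); // 3 5
```

One consequence: a secondary that falls so far behind that the entries it still needs have rolled off its sync source's oplog can no longer catch up by replaying it.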
Whether applied once or multiple times to the target data set, each operation in the oplog produces the same result; that is, each operation in the oplog is idempotent. Correct replication depends on this idempotency in operations such as:
  • initial sync
  • post-rollback catch-up
  • sharding chunk migrations
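Why idempotency matters is easiest to see by replaying an operation twice. In this sketch (hypothetical helper names, documents as plain objects), an increment drifts when replayed, while recording the resulting value does not. In the versions this post covers, the oplog takes the second approach, for example logging an $inc update as a $set of the new value.

```javascript
// Non-idempotent: the result depends on how many times it is applied.
function applyInc(doc, field, amount) {
  return { ...doc, [field]: (doc[field] || 0) + amount };
}

// Idempotent: applying it once or many times gives the same document.
function applySet(doc, field, value) {
  return { ...doc, [field]: value };
}

const start = { _id: 1, count: 5 };

// Replaying "increment count by 1" twice (say, during initial sync and
// again afterwards) leaves the document wrong.
console.log(applyInc(applyInc(start, "count", 1), "count", 1).count); // 7

// Logging the resulting value instead makes the replay harmless.
console.log(applySet(applySet(start, "count", 6), "count", 6).count); // 6
```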

Read Preference

Read preference describes how MongoDB clients route read operations to the members of a replica set.
Diagram of read operations to a replica set: the default read preference routes the read to the primary, and the nearest read preference routes the read to the nearest member.
By default, an application directs its read operations to the primary member in a replica set.

Read Preference Modes

IMPORTANT
All read preference modes except primary may return stale data because secondaries replicate operations from the primary with some delay. [1] Ensure that your application can tolerate stale data if you choose to use a non-primary mode.
MongoDB drivers support five read preference modes.
  • primary: Default mode. All operations read from the current replica set primary.
  • primaryPreferred: In most situations, operations read from the primary, but if it is unavailable, operations read from secondary members.
  • secondary: All operations read from the secondary members of the replica set.
  • secondaryPreferred: In most situations, operations read from secondary members, but if no secondary members are available, operations read from the primary.
  • nearest: Operations read from the member of the replica set with the least network latency, irrespective of the member’s type.
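The five modes can be read as a selection rule. The following is a deliberately simplified sketch, not driver code: `selectMember` and the member fields are hypothetical, hidden members never appear because clients cannot see them, and real drivers additionally honour tag sets and choose randomly among eligible members within a latency window instead of always taking the single fastest one.

```javascript
// Simplified read preference selection over a list of known members.
function selectMember(mode, members) {
  const up = members.filter((m) => m.available);
  const primary = up.find((m) => m.type === "primary");
  const secondaries = up.filter((m) => m.type === "secondary");
  // Lowest-latency member of a list, or undefined if the list is empty.
  const fastest = (list) =>
    list.length === 0 ? undefined : list.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a));

  switch (mode) {
    case "primary":
      return primary; // undefined means the read fails: no primary available
    case "primaryPreferred":
      return primary || fastest(secondaries);
    case "secondary":
      return fastest(secondaries);
    case "secondaryPreferred":
      return fastest(secondaries) || primary;
    case "nearest":
      return fastest(up.filter((m) => m.type !== "arbiter")); // arbiters hold no data
    default:
      throw new Error(`unknown read preference mode: ${mode}`);
  }
}

// Hypothetical hosts and latencies.
const members = [
  { host: "a.example.net:27017", type: "primary", latencyMs: 40, available: true },
  { host: "b.example.net:27017", type: "secondary", latencyMs: 5, available: true },
  { host: "c.example.net:27017", type: "secondary", latencyMs: 12, available: true },
];

console.log(selectMember("primary", members).host); // a.example.net:27017
console.log(selectMember("nearest", members).host); // b.example.net:27017
```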
