Wednesday, July 6, 2016

All About Sharding in MongoDB

In this post we will be covering the following topics:

  • Customer Stories
  • Sharding for Performance/Scale
    • When to shard?
    • How many shards do I need?
  • Types of Sharding
  • How to pick a Shard Key
  • Sharding for other reasons

Customer Stories:

One customer that leverages sharding is FOURSQUARE.

Foursquare provides a set of mobile applications that enable people to do a number of things:

  1. You can visit various restaurants and places, share them with other people, and learn how people like them.
  2. Another application, called Swarm, is a social networking app with geolocation capability, so you can use it to see where your friends are and meet them.


One of the challenges at FOURSQUARE is essentially a large number of users generating a high volume of activity:

  • 50 million users.
  • Check-ins: a user can register where they are ("hey, I am at Starbucks"). They see 6 million check-ins per day, and these check-ins grow at an extremely high rate.
  • 55 million points of interest / venues.
  • 1.7 million merchants using the platform for marketing.

The primary reason for FOURSQUARE to shard is that at peak times they can be driving as many as 300,000 operations per second. That is well beyond the capacity of a single server, so they have sharded to get a whole range of servers working together.

They have sharded this application:
11 MongoDB clusters
  - 8 are sharded

Largest cluster has 15 shards (check-ins)
  - Sharded on user ID


Customer Story 2 - CARFAX

CARFAX sells vehicle history reports. If you are purchasing a new or used car and want to understand the history of that car (how many times it has been in for maintenance, whether it has been in an accident), you can go to CARFAX and purchase the car's history record.

The CARFAX cluster needs to be sharded simply because of the volume of data they have: essentially 13 billion documents in their MongoDB collection, and they are adding 1.5 billion documents every year. One vehicle history report is more than 200 documents.

They have partitioned those 13 billion documents across 12 shards. They have also architected the cluster for extremely high availability: each shard is a 9-node replica set, with replicas distributed across 3 data centres.




What is Sharding?

The structure of a sharded system looks like this:


At the top you have your application, developed using one of the drivers offered by MongoDB. When you are interacting with a sharded cluster, that driver talks to a process called the query router; the actual process is called mongos. The application talks to the query router, and the query router looks at each query and routes it to the appropriate shard.

So if I am a user on CARFAX and I want to look up a vehicle with a particular identification number, I send my query from the application to the query router. The query router looks at it, figures out which shard contains the data for that vehicle, and sends the query to that particular shard. That shard retrieves the data and sends it back to the query router, which then sends it back to the application.

Because of this partitioning, with individual queries routed to different shards, MongoDB can scale out: if you add more shards you have more servers responding, and you get significant scalability.
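From the application's point of view, a mongos looks just like a single MongoDB server; as a sketch (hostname and port are illustrative), you point the shell or a driver at the router rather than at any shard:

    mongo --host mongos1.example.net --port 27017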

How Sharding works:

You shard your data based on a key field in your documents. Going back to the CARFAX example, they use the vehicle identification number as their shard key. The data set can then be partitioned across multiple shards using that shard key.
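As a minimal sketch in the mongo shell (database, collection, and field names are illustrative, not CARFAX's actual schema), range sharding on a VIN-like field looks like this:

    // Enable sharding for the database, index the shard key, then shard the collection:
    sh.enableSharding("vehicles")
    db.history.createIndex({ vin: 1 })                  // the shard key must be indexed
    sh.shardCollection("vehicles.history", { vin: 1 })  // range-based by default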

How do I know if I need to shard?

Look at the servers you have available: whether you need to shard depends on their capacity. If you have a server with 4 GB of RAM and a single spinning drive, you are going to need to shard much sooner than with a server that has 500 GB of RAM.

You have to look at your system, ask yourself a set of questions, and do a number of calculations:

-> Does a single server have enough space to store all my data? If you have 2 TB of space on your server and you know your application will produce 5 TB of data, then you obviously need to shard.
Server Spec: Disk Capacity

-> Throughput: the number of operations/queries per second. Given the server I have, how many operations per second can it handle, and what is my application's peak load? If your application needs more operations per second than your server can handle, you need to shard.
Server Spec: Disk IOPS, RAM, Network

-> Latency: can a single server respond to queries fast enough?
Server Spec: Disk IOPS, RAM, Network

How many Shards do I need?

There are a number of things you need to check:
-> Disk space: the sum of disk space across shards should be greater than the required storage size.

Example: 
Storage size  = 3 TB
Server Disk Capacity: 2 TB

At least 2 shards required.

-> RAM
We need to find the working set and make sure we have enough RAM that the entire working set fits in it.
- Sum of RAM across shards should be greater than the working set

The working set is the indexes plus the set of documents accessed frequently.

The way MongoDB works, like any database, is that to process a query it uses an index to find which documents are relevant; it then goes to disk and grabs the documents it needs, to return them to the application or update them.

To be fast, you want those indexes in RAM, so the database can quickly identify the documents that need to be operated on.

You also want to reduce the amount of disk I/O: since disk is significantly slower than RAM, it makes sense to keep frequently accessed documents in RAM.

The set of documents that are frequently accessed varies widely by application. In certain applications, like time series data, most people look at documents that were loaded within the last 10 minutes.

In other types of applications you may do analytics over years' worth of data, and there are many documents in the working set.

The more of your working set that is present in RAM, the shorter the latency and the higher the throughput: normal RAM access time is typically microseconds, while disk access time is in milliseconds.

-> Measuring index size and working set:
db.stats() - total index size for the database (db.collection.stats() for a single collection).
db.serverStatus({workingSet:1}) - working set size estimate.

If you are trying to figure out what your working set is, the ideal scenario is to already have a MongoDB application ready: a prototype or development version of your application that looks roughly like production. Then, with a realistic set of documents in place, you can use db.stats() to identify your index sizes, which gives you one component of the working set. In addition, there is an optional parameter to db.serverStatus, shown above, that will estimate the working set size.
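A quick sketch of both in the mongo shell (collection name is illustrative; note the workingSet section of serverStatus was specific to the MMAPv1 engine in older releases):

    db.stats().indexSize                           // total index size for the current database, in bytes
    db.history.stats().totalIndexSize              // index size for a single collection
    db.serverStatus({ workingSet: 1 }).workingSet  // working set estimate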

Example:
Working set: 428 GB
Each Server RAM: 128 GB
428/128 = 3.34

4 shards required

-> Input/Output:
The number of I/O operations per second (IOPS) the cluster can support, i.e. the sum of IOPS across shards, must be greater than the IOPS the application requires.

The challenge with IOPS is that they are difficult to estimate. Examples of I/O generated by a single write:
 Update the document
 Update the indexes
 Append to the journal
 Write a log entry

Example:
Required IOPS: 11000
Server Disk IOPS: 5000
11000/5000 = 2.2

3 shards required

Estimating shards

* S = Operations/sec a single server can handle
* G = Required ops/sec
* N = # of shards

G = N * S * 0.7

N = ceil(G / (0.7 * S))

0.7 accounts for sharding overhead: the query router, the balancer of the sharded cluster, etc. are overhead that sharding adds.

Example:
S = 4000
G = 10000

N = 3.57

4 shards required.
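The same arithmetic as a small shell (JavaScript) helper, with the 0.7 overhead factor from above:

    // Rough shard-count estimate: N = ceil(G / (0.7 * S))
    function estimateShards(requiredOpsPerSec, serverOpsPerSec) {
        return Math.ceil(requiredOpsPerSec / (0.7 * serverOpsPerSec));
    }
    estimateShards(10000, 4000)   // => 4, matching the example above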


Types of Sharding:

  • Range
  • Tag Aware
  • Hashing

Range: identify a field in the document, then partition based on the value of that field. In the CARFAX example they use the vehicle identification number to partition data across shards; FOURSQUARE uses customer IDs to partition the data.

[Diagram: the shard key value space segmented into smaller ranges, or chunks.]

Tag Aware Sharding:
You give a label to each of your shards (you can have multiple shards with the same label). Then you associate each of those labels with a range of data.

Suppose we are collecting data in MongoDB about people's clothing purchases, and we are going to do analytics based on the season:

analytics over the spring purchases, then over the summer purchases, and so on.

We are going to shard based on the date and time of the purchase. In MongoDB you can define tag ranges as shown below:

Shard Tag    Start     End
Winter       23 Dec    21 Mar
Spring       22 Mar    21 Jun
Summer       20 Jun    23 Sep
Fall         24 Sep    22 Dec

Then when we insert data into MongoDB, it will look at the purchase date/time and use the tag definitions to put the data onto the appropriate shard.
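A sketch of that setup in the mongo shell (shard names, namespace, and field name are illustrative; a tag range includes its lower bound and excludes its upper bound):

    // Label the shards:
    sh.addShardTag("shard0000", "Winter")
    sh.addShardTag("shard0001", "Spring")
    // Associate a shard key range with each tag (one call per season):
    sh.addTagRange("shop.purchases",
                   { purchaseDate: ISODate("2016-03-22") },
                   { purchaseDate: ISODate("2016-06-22") },
                   "Spring")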

Hash Sharding

It is similar to range sharding, but instead of using the value itself to partition across shards, it takes a hash of that value. What CARFAX could have done is taken the vehicle identification number, hashed it, and used that hashed value to distribute data across the cluster.

[Diagram: hash-based segmentation of the shard key space.]

Pros:
 - Evenly distributed writes

Cons:
 - Random data (and index) updates can be I/O intensive (a set of related documents is no longer co-located on a single shard, as it may be with range-based sharding)
 - Range-based queries turn into scatter-gather
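Declaring a hashed shard key is a one-line change from the range example earlier (namespace and field are again illustrative):

    // MongoDB hashes the vin value and partitions on the hash:
    sh.shardCollection("vehicles.history", { vin: "hashed" })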


How do I pick a Shard Key?

Properties of a good shard key:
 -> Sufficient cardinality (if you have 10 shards in your cluster and your key has only 5 possible values, that is not going to enable a good distribution of data across the shards)
 -> Distributed writes
 -> Targeted reads ("query isolation")
* The shard key should be in every query if possible
 - scatter-gather otherwise
* Choosing a good shard key is important!
 - It affects performance and scalability
 - Changing it later is expensive

Using the vehicle identification number is great because it makes sure all the documents for a given vehicle are grouped together on the same shard.

If you pick the wrong shard key, changing it is really expensive: all of your data needs to be moved across servers.

Low Cardinality Shard Key:
A shard key that has very few distinct values. Example: a male/female field, which has only two possible values.

It is hard to create meaningful subdivisions of such a key to distribute data across shards.

It induces "jumbo chunks".

Ascending Shard Key
Another mistake people make is picking a monotonically increasing shard key; such values cause "hot spots" on inserts.
Examples: timestamps (creation date, transaction date), the _id field of a MongoDB document.

All of the data will be written to a single shard, whichever one holds the top of the range. One possible way to handle this, if you do want to use creation time, is to hash that timestamp field; then data will be evenly distributed across shards.
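For instance (namespace and field illustrative), a hashed key on the timestamp avoids the insert hot spot:

    // Inserts with ever-increasing createdAt values now spread across shards:
    sh.shardCollection("app.events", { createdAt: "hashed" })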

Reasons to Shard

  • Scale
    •   Data Volume
    •   Query Volume

Scale is the most common reason why people shard, but there are other ways you can leverage sharding.

  • Global Deployment with local writes.
    • Geography-aware sharding.

Often people want to deploy a MongoDB cluster in an environment where they have multiple data centres around the globe, with applications in each of those geographic regions, and they would like to read and write locally rather than access the primary servers across a wide area network.



The shards are colour coded. One shard, in green, has its primary in New York and replicas in London and Sydney.

The brown shard has its primary in London and secondaries in Sydney and New York. The grey shard has its primary in Sydney and secondaries in London and New York.

This means that if I have an application running in New York managing data for North American customers, I can very easily read and write through that primary and get great performance. If I need to access a customer from Europe or Asia, I can still do that by reading from a local secondary in my own data centre.

The same is true for the application in Europe: it can read and write through the primary servers that contain the European data.
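One way to express this is tag-aware sharding with a region prefix in the shard key; a sketch, assuming a compound shard key { region: 1, userId: 1 } (shard names, namespace, and fields are illustrative):

    sh.addShardTag("shardNY",  "NA")
    sh.addShardTag("shardLDN", "EU")
    sh.addShardTag("shardSYD", "APAC")
    // Pin all European documents to the EU-tagged shard:
    sh.addTagRange("app.customers",
                   { region: "EU", userId: MinKey },
                   { region: "EU", userId: MaxKey },
                   "EU")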

Tiered Storage Approach:

  • Save hardware cost using tiered storage. There is a certain class of applications where you need to manage many years of data, but the old data (data older than a week, a month, or a year) is very infrequently accessed.

What you can do is set up a sharded cluster in which different shards have different specs:
  • Put frequently accessed documents on fast servers
    • Infrequently accessed documents on less capable servers
  • Use tag-aware sharding (see the sketch below)


We have two shards tagged "current" and two tagged "archive". On the current shards we have really big servers with lots of RAM and SSDs for the I/O system, so they are screaming fast. On the archive shards we have older systems with just spinning disks. I make sure the most recent data is on the current shards and the older data is on the archive shards. That way, when I access recent data, I get it faster.
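A sketch of the tagging (shard names, namespace, field, and cutoff date are illustrative; the tag ranges would be moved forward periodically as data ages):

    sh.addShardTag("shardFast1", "current")
    sh.addShardTag("shardSlow1", "archive")
    // Everything newer than the cutoff lives on "current" hardware:
    sh.addTagRange("app.events",
                   { createdAt: ISODate("2016-06-01") }, { createdAt: MaxKey },
                   "current")
    sh.addTagRange("app.events",
                   { createdAt: MinKey }, { createdAt: ISODate("2016-06-01") },
                   "archive")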

The primary reason for this is reduced cost. If you are managing 300 TB of data in MongoDB, being able to use low-end servers for some of the older data saves a lot of money.

  • Fast Restore

We recently worked with a federal agency where, from a performance perspective, they only needed a few shards. They put a lot of data on a couple of servers and could easily manage the query volume, throughput, and latency. The problem was that because each of these servers held so much data, if a disaster occurred and they needed to restore from backup, the restore would take an extremely long time: they would essentially have to do two 20 TB restores, and moving 20 TB of data across the network takes a long time.

If we increase the number of shards so that each shard holds only 10 TB of data, they can instead do four 10 TB restores in parallel, essentially reducing the restore time by 50%.

Q & A

Question 1: For an already running server, how do I measure operations per second?
Answer: The command line tool mongostat will list reads and writes per second.
I also highly recommend MongoDB Management Service (MMS): it gives you a graphical dashboard that shows the performance of your cluster.
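For example (hostname illustrative), sampling a mongos every 5 seconds:

    mongostat --host mongos1.example.net:27017 5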

Question 2: How can my queries help in deciding the shard key?
Answer: If a query doesn't contain the shard key, the query router has no idea where to route it, and the query is sent to all shards. If I am using the vehicle number as the shard key, then my queries should contain the vehicle number.
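A sketch of the difference, reusing the illustrative vehicles.history collection sharded on vin (values made up):

    // Targeted: includes the shard key, so mongos routes it to one shard.
    db.history.find({ vin: "1HGCM82633A004352" })
    // Scatter-gather: no shard key, so mongos must ask every shard.
    db.history.find({ make: "Honda", year: 2015 })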

Question 3: For a small database with few operations per second, will a single server or single replica set perform better than a single replica set in a sharded environment?
Answer: Yes. There is some overhead to sharding: all your queries now go through a mongos that aggregates data from multiple shards; there is a background process called the balancer that moves data between shards; and the shards need to communicate with each other about which one holds which data. So if you don't need to shard because your application is small, there is no need to do it. But once you get to a large application, it becomes essential.

Question 4: Does each collection in the database need to be sharded individually?
Answer: Yes. The sharding configuration is set up per collection. It is very common to have some small collections in a database that are not sharded, and just a few collections, the ones holding most of the data, that are sharded. It is possible to use a different shard key or approach for each collection.

Question 5: If I am using tag-aware sharding, how does the application get more control?
Answer: With tag-aware sharding you explicitly specify, for each tag, what range of documents belongs to that tag, and then you assign tags to shards, so your application controls what range of data goes on each shard. If you use standard range-based sharding, the background process called the balancer adjusts the ranges held by each shard and moves data from one shard to another to keep it evenly distributed. Even if you start with a clean division of data (each shard holding a clean range, 0-25, 25-50, etc.), over time you will find the ranges get broken up and data moves around to different shards. With tag-aware sharding, your application controls specifically which shard the data resides on.

