Thursday, February 18, 2016

Data Modelling in MongoDB


In this post we discuss data modelling patterns in MongoDB: how we design the data, how we structure documents, and how we represent data from the application's perspective. The post presents the different strategies you can choose from when determining your data model, along with their strengths and weaknesses.

Data in MongoDB has a very flexible structure because the schema is dynamic (MongoDB is often described as schema-less). In SQL we must first design the database, the schema and the tables before we can insert any data. In MongoDB we can create collections and insert documents on the go, and decide the relationships between documents as we go: we can embed a document or we can reference a document. MongoDB provides several tools for designing our data, and data modelling is done with those tools.

MongoDB collections do not enforce document structure, and this flexibility makes it easy to map a document to an entity or to an object in the application. Whenever we design an application, the key challenge in data modelling is balancing the needs of the application against the structure of the data.
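As a rough illustration of a collection that does not enforce structure, here is a minimal sketch in plain JavaScript. An ordinary array stands in for a collection, so no MongoDB server is involved. The second employee and the skills field are hypothetical values invented for the example.

```javascript
// Assumption: a plain array plays the role of a MongoDB collection.
const employees = [];

// Two documents with different fields sit side by side,
// because nothing forces them to share one structure.
employees.push({ id: 1, emply_name: "Sunil", designation: "Trainer" });
employees.push({ id: 2, emply_name: "Asha", skills: ["MongoDB", "Node.js"] }); // hypothetical

// List the field names of each document to see that the shapes differ.
const shapes = employees.map((doc) => Object.keys(doc).join(","));
console.log(shapes);
```

The application, not the collection, decides which fields a document carries, which is why the modelling decisions discussed in this post matter.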

Document structure is therefore essential. The key decisions in MongoDB data modelling are the structure of the documents and the relationships between the data: how we structure the data and how we represent the relationships within it.

To represent relationships in a data model we have two tools:
  • References
  • Embedded data

1. References: Just as a relational database has primary keys and foreign keys, MongoDB has references. One document can store a reference to another document. This is called referencing.

2. Embedded Data: We can embed one document inside another document, so a document in a collection contains an embedded document.

These are the two ways to achieve data modelling in MongoDB.

Data Modelling Using References

What are references:
Normalized data models describe relationships using references between documents.
In relational databases such as MySQL or Oracle, we create two tables and use a foreign key to refer from one table to the other. In the same way, in MongoDB one document can hold a reference to another document, usually one stored in a different collection. Let's see how we can do that.

Suppose I have an Employee document with the following structure:
{
  id: <objectId_1>,
  emply_name: "Sunil",
  designation: "Trainer"
}

We have another document that stores the address details of employees.
Address Document:
{
  id: <objectId_2>,
  emply_id: <objectId_1>,
  city: "Bangalore",
  Country: "India"
}

The Address document has its own id, the identifier that MongoDB creates for each document (in a real collection it is stored in the _id field). It then has an emply_id key whose value is objectId_1, which shows that it refers to the Employee document, followed by city and Country. So when you have an Address document and want the data of its employee, you follow emply_id to the Employee document.

In the same way we have a Contact document:
{
  id: <objectId_3>,
  emply_id: <objectId_1>,
  mobile_no: 9837272245,
  email_id: "karworla@gmail.com"
}

This is the same as the foreign key concept in an RDBMS: we are referencing by ObjectId. The id of the Employee document is stored as emply_id in the Address document and in the Contact document, so we can find which address and which contact belong to which employee.
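To make the referencing idea concrete, here is a minimal sketch in plain JavaScript. Arrays stand in for the three collections, and findOne and loadEmployeeProfile are hypothetical helpers written for this example, not MongoDB driver calls. Against a real server, each lookup would be a separate query.

```javascript
// Assumption: arrays play the role of the employee, address and contact collections.
const employees = [{ id: "objectId_1", emply_name: "Sunil", designation: "Trainer" }];
const addresses = [{ id: "objectId_2", emply_id: "objectId_1", city: "Bangalore", Country: "India" }];
const contacts = [{ id: "objectId_3", emply_id: "objectId_1", mobile_no: 9837272245, email_id: "karworla@gmail.com" }];

// Hypothetical helper: returns the first matching document and counts lookups,
// so we can see how many "queries" a normalized model needs.
let lookups = 0;
function findOne(collection, predicate) {
  lookups += 1;
  return collection.find(predicate) || null;
}

// Resolve the references by hand: one lookup per collection.
function loadEmployeeProfile(employeeId) {
  const employee = findOne(employees, (doc) => doc.id === employeeId);
  if (employee === null) return null;
  const address = findOne(addresses, (doc) => doc.emply_id === employee.id);
  const contact = findOne(contacts, (doc) => doc.emply_id === employee.id);
  return { employee, address, contact };
}

const profile = loadEmployeeProfile("objectId_1");
console.log(profile.employee.emply_name, profile.address.city, lookups);
```

The emply_id value does the job of a foreign key here. Note that nothing in this sketch checks that the referenced employee exists; keeping references valid is left to the application.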

Another example (diagram): a data model using references to link documents. Both the contact document and the access document contain a reference to the user document.

In general, use normalized data models:
  • when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • to represent more complex many-to-many relationships.
  • to model large hierarchical data sets.
References provide more flexibility than embedding. However, client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server.

Data Modelling Using Embedded Data

With MongoDB, you may embed related data in a single structure or document. Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These schemas are generally known as “denormalized” models, and take advantage of MongoDB’s rich documents.

Diagram: a data model with embedded fields that contain all related information.

Embedded data models allow applications to store related pieces of information in the same database record. As a result, applications may need to issue fewer queries and updates to complete common operations.

Suppose we have a document with id, emply_name and designation:
{
  id: <ObjectId_1>,
  emply_name: "Sunil",
  designation: "Trainer"
}

These three key-value pairs are there. Now we want to add the address data as well. But an address has further values of its own, such as city and Country. So this is done by embedding one document inside another document.

{
  id: <ObjectId_1>,
  emply_name: "Sunil",
  designation: "Trainer",
  address: {
    city: "Bangalore",
    Country: "India"
  },
  // Similarly, we can embed another document:
  contact: {
    mobile_no: 9837272245,
    email_id: "karworla@gmail.com"
  }
}
So the two documents 'address' and 'contact' are embedded in the main document.
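As a small sketch of what embedding buys on the read side, the plain JavaScript below keeps the same employee in an array that stands in for a collection (an assumption, with no server involved). A single lookup returns the employee together with its address and contact.

```javascript
// Assumption: a plain array plays the role of the employee collection.
const employees = [
  {
    id: "objectId_1",
    emply_name: "Sunil",
    designation: "Trainer",
    address: { city: "Bangalore", Country: "India" },
    contact: { mobile_no: 9837272245, email_id: "karworla@gmail.com" },
  },
];

// One lookup is enough: the related data travels with the document.
const employee = employees.find((doc) => doc.id === "objectId_1");

// Embedded fields are reached by walking into the sub-document.
const summary = `${employee.emply_name}, ${employee.address.city}, ${employee.contact.email_id}`;
console.log(summary);
```

In MongoDB queries, the same nested fields are addressed with dot notation, for example "address.city".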

In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.

However, embedding related data in documents may lead to situations where documents grow after creation. With the MMAPv1 storage engine, document growth can impact write performance and lead to data fragmentation.

Atomicity of Write Operations

In MongoDB, write operations are atomic at the document level, and no single write operation can atomically affect more than one document or more than one collection. A denormalized data model with embedded data combines all related data for a represented entity in a single document. This facilitates atomic write operations since a single write operation can insert or update the data for an entity. Normalizing the data would split the data across multiple collections and would require multiple write operations that are not atomic collectively.
However, schemas that facilitate atomic writes may limit ways that applications can use the data or may limit ways to modify applications. The Atomicity Considerations documentation describes the challenge of designing a schema that balances flexibility and atomicity.
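The difference can be sketched in plain JavaScript. This is only an in-memory illustration under stated assumptions: arrays stand in for collections, updateOne is a hypothetical helper written for the example, and the new city and email values are invented. It shows that the embedded model changes address and contact in one write, while the normalized model needs two writes with a visible intermediate state between them.

```javascript
// Hypothetical helper: swaps in a new version of one document in a single step,
// mimicking a write that is atomic at the document level.
function updateOne(collection, id, changes) {
  const index = collection.findIndex((doc) => doc.id === id);
  if (index === -1) return false;
  collection[index] = { ...collection[index], ...changes };
  return true;
}

// Embedded model: one document, so one write updates address and contact together.
const employeesEmbedded = [{
  id: "objectId_1",
  emply_name: "Sunil",
  address: { city: "Bangalore", Country: "India" },
  contact: { mobile_no: 9837272245, email_id: "karworla@gmail.com" },
}];
updateOne(employeesEmbedded, "objectId_1", {
  address: { city: "Mumbai", Country: "India" },                      // invented value
  contact: { mobile_no: 9837272245, email_id: "sunil@example.com" },  // invented value
});

// Normalized model: the same change needs two separate writes.
const addresses = [{ id: "objectId_2", emply_id: "objectId_1", city: "Bangalore", Country: "India" }];
const contacts = [{ id: "objectId_3", emply_id: "objectId_1", mobile_no: 9837272245, email_id: "karworla@gmail.com" }];

updateOne(addresses, "objectId_2", { city: "Mumbai" });
// Between the two writes a reader sees the new city with the old email.
const intermediate = { city: addresses[0].city, email_id: contacts[0].email_id };
updateOne(contacts, "objectId_3", { email_id: "sunil@example.com" });

console.log(employeesEmbedded[0].address.city, intermediate);
```

If the second write in the normalized case failed, the data would stay in that intermediate state, which is the risk described above.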
