Sunday, February 21, 2016

MongoDB - Working with Relationships

Relationships in MongoDB represent how various documents are logically related to each other. Relationships can be modeled via Embedded andReferencedapproaches. Such relationships can be either 1:1, 1: N, N: 1 or N: N.


In previous posts we have seen Data modelling in MongoDB and we have seen how we can perform data modelling using Referencing of Documents or Embedding document in MongoDB.

In this post we will be discussing about relationship between documents and relationship in MongoDB. Like we maintain relationship in Traditional Databases  one-to-one, one-to-many, many-to-many, the same one-to-one, one-to-many, many-to-many we will see how these can be achieved in MongoDB.

There are three types of Relationship:
  • One-to-One (1:1) Relationship
  • One-to-Many (1:N) Relationship
  • One-to-One (M:N) Relationship

The understanding of these relationships, mappings between documents, collections is important to learn to make Design and Develop Application in MongoDB.

One-to-One (1:1) Relationship

In Relationship database we map one table to another table with the help of foreign key, then we do the mapping and sometimes we keep the data in the same table and we do the one to one mapping. It all depend on how efficient you want to design your database.


In MongoDB we implement these Relationship using two strategies, one is Embedding and other is Linking ( Referencing).
Suppose I want to achieve one to one relation in MongoDB. I have a Customer document in MongoDB and I have a same Address document for the same customer, and I want to achieve one to one relationship between customer and Address document.

Suppose Customer information is as:
{
  name: "Anil Sharma"
  business: "Telecom"
}

Address information in other document is as:
{
  street:"10 Andheri East"
  city: "Mumbai"
  state: "Maharashtra"
}

Now there are two ways by which these information can be stored in MongoDB. First is Embedding these two document in one document and another one is to have two document and I can reference one in another via foreign key strategy.

Handing one-to-one using Embedding Document:
The above situation can be handled by embedding the document into another document as:
{
  name: "Anil Sharma"
  business: "Telecom"

address:{
  street:"10 Andheri East"
  city: "Mumbai"
  state: "Maharashtra"
       }
}

Address document has been embedded inside Customer document. This way we can achieve one-to-one mapping, one customer one address. 

Handing one-to-one using Referencing Document:
Suppose we have customer document as:
{
   _id: 101,
   name: "Anil Sharma"
  business: "Telecom"
}

Another, Address document is as:
{
  street:"10 Andheri East"
  city: "Mumbai"
  state: "Maharashtra"
}

Along with street, city, state information in Address document, which we had in previous made address document, we have one more column/field or key-value pair cust_id and that customer id is referring to this id of this customer field. Customer and Address document are now getting together with this id field. In this way we can refer one document in another. This way we can refer one-to-one relationship in MongoDB.  

One-to-Many (1:N) Relationship:

Suppose I have a Student document which is having information about Student details. Another document with Subjects, which will contain information on which student has which subjects. One student can have many subjects.

So We need to achieve one-to-many relationship. Student ---- (1:N)---- Subjects.

Student document has following details:
{
name:"Anil Sharma"
school:"DPS"
}

Subjects Document is as:
{
  Teacher: "John Carry"
  Subject:"Science"
},
{
  Teacher: "Mark T"
  Subject:"Biology"

}  

We need to maintain one to many relationship between these two documents using Embedding and Referencing methods.

Handing one-to-many using Embedding Document:
As we did in one-to-one mapping, again we will have subjects document inside Student document.

We will write Student document with field "name" and "school" as previous document, now we will have another field "subjects" array and this subject array will contain document for each subject.
{
  name:"Anil Sharma",
  school: "DPS",
  subjects:[{Teacher:"John Carry"
    Subject: "Science"
            },{
    Teacher: "Mark T"
    subject:"Biology"
            }]

This way one student can have many subjects. If we want to subject information with student it can be achieved in a single query.

Handing one-to-many using Referencing:
Now we will try to achieve one-to-many relationship using Referencing. Suppose we have student document with _id, name and school fields as:
{
   _id:101,
  name:"Anil Sharma",
  school:"DPS"
}

Now I have subject document with fields, Teacher name and Subject name along with student ID (std_id) field. 

  std_id:101,
  Teacher: "Mark T"
  subject:"Biology"
}

{
  std_id:101,
  Teacher:"John Carry",
  Subject: "Science"
}

This way we will create subject document for each student. We are achieving one-to-many relationship, one student is having two or more subjects. This is like foreign key relationship.


Another example of one-to-many relationship, consider the following example that maps patron and multiple address relationships. The example illustrates the advantage of embedding over referencing if you need to view many data entities in context of another. In this one-to-many relationship between patron and address data, the patron has multiple address entities.
In the normalized data model, the address documents contain a reference to the patron document.
{
   _id: "joe",
   name: "Joe Bookreader"
}

{
   patron_id: "joe",
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: "12345"
}

{
   patron_id: "joe",
   street: "1 Some Other Street",
   city: "Boston",
   state: "MA",
   zip: "12345"
}
If your application frequently retrieves the address data with the name information, then your application needs to issue multiple queries to resolve the references. A more optimal schema would be to embed theaddress data entities in the patron data, as in the following document:
{
   _id: "joe",
   name: "Joe Bookreader",
   addresses: [
                {
                  street: "123 Fake Street",
                  city: "Faketon",
                  state: "MA",
                  zip: "12345"
                },
                {
                  street: "1 Some Other Street",
                  city: "Boston",
                  state: "MA",
                  zip: "12345"
                }
              ]
 }
With the embedded data model, your application can retrieve the complete patron information with one query.

Many-To-Many (M:N) Relationship 

 In Relation Database systems like oracle, mysql, we achieve this using joins. We join two or more tables to achieve many-to-many relationship.

Suppose we have Author document and Article document. One Author can write multiple article and one Article can be written by one or more Authors. So there is a Many to Many relationship between Author and Article.  
Author-----M:N:---- Article

Two way Embedding for Handling many-to-many relationship:

Suppose there are two authors who have written articles. "Author" document is as:
{ id:1,
  name: "Anil Sharma",
  articles:[1,2]
}
{ id:2,
  name: "John K",
  articles:[2]
}

Article 2 is common here in both the Author.  

Similarly Article Document. It has id, title, categories and authors as:
{
  id: 1,
  title: "MongoDB Design",
  categories:["Nosql"],
  authors:[1,2]
}
{
  id:2,
  title:"RDBMS Intro",
  categories:["Relational Database"],
  authors:[2]
}

This is two way embedding. We are embedding Articles in author document and embedding Authors in Article document. One Author has many article and one article has many author. So in Author document we are referring Article, and in Article document we are referring Authors.

Thursday, February 18, 2016

Data Modelling in MongoDb


In this post we are going to discuss Data Modelling patterns in MongoDB. How we design the data, how we design the document, how we represent data in MongoDB in application percepective that we are going to discuss in this post. It will presents the different strategies that you can choose from when determining your data model, their strengths and their weaknesses.

Data modelling or data in MongoDB has a very flexible structure, it has dynamic schema, it is schema less. It has very flexible schema design. Unlike SQL where first we have to design the Database, schema, table then we can insert the data, but in MongoDB we can on the go insert document, insert collection, decide the relationship between the documents, we can embed document, we can reference document. There are many tool in MongoDB which can be used to design our data and do data modelling in MongoDB with those tools.

MongoDB collection do not enforce document structure and this flexibility facilitates the mapping of document to an entity or an object. We can match a document to a data field or we can map to an object. So it is so flexible and whenever we are designing the application key challange is data modeling, balancing the needs of application as per the data.

So Document Structure is very essential in Data Modeling and Key of Data Modeling in MongoDB is Structure of Documents and Relationship between the data. How we structure the data, how we relate the data, how we put the relationship within the data that is what going to be the key in Data Modeling in MongoDB.

For data model we have two tools:
  • References
  • Embedding the data
Tools to represent the Relationship: 
1. References: Using reference like we have primary key or foreing key in relational database, the same way we have references. We can reference another document in a document. So one document can be referenced in another document. This is called Referencing.

2. Embedded Data:
We can embed a document inside a second document. We can embed a document inside a document. So there will be Embedded document in collection.

So these are the way we can achieve Data Modelling in MongoDB.

Data Modelling Using References

What are references:
Normalized data models describe relationships using references between documents.
In Relational Databases like Mysql, Oracle we create two tables, then to refer one table to another we use foreign key. In the same way in MongoDb we can use a references and we can reference another document in a document. Like one collection will have one document and inside that document we can reference the another document. So lets see how we can do that. 

Suppose I have an Employee Document, Structure of which is as:
{
 id: <objectId_1>,
 emply_name: "Sunil",
designation: "Trainer"
}

We have another document to store address details of Employees.
Address Document:
{
  id:<objectId_2>,
  emply_id:<objectId_1>,
  city: "Banglore"
  Country:"India"
}

This document is having id, the same id which mongodb create for each document, then it has emply_id key which is refering to objectId_1, this shows it is refering to Employee Document and then it has city and country. So when you want to get data of Employee in Address Document, you can refer emply_id which will refer to Employee Document. 

In the same way we have Contact Document:
{
id:<objectId_3>,
emply_id:<ObjectId_1>,
mobile_no:9837272245,
email_id:"karworla@gmail.com"
}

This is same as foreign key concept in RBDMS, we are referencing with the objectId. So this id of Employee Document is being stored as emply_id in Address Document and Contact Document. So with this we can find which address belongs to which employee. 

Another Example:
Data model using references to link documents. Both the ``contact`` document and the ``access`` document contain a reference to the ``user`` document.

In general, use normalized data models:
  • when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
  • to represent more complex many-to-many relationships.
  • to model large hierarchical data sets.
References provides more flexibility than embedding. However, client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server.

Data Modelling Using Embedded Data

With MongoDB, you may embed related data in a single structure or document. Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These schema are generally known as “denormalized” models, and take advantage of MongoDB’s rich documents. 

 Consider the following diagram:
Data model with embedded fields that contain all related information.

Embedded data models allow applications to store related pieces of information in the same database record. As a result, applications may need to issue fewer queries and updates to complete common operations.

Suppose we have a document which has id, emply_name and Designation and address as:
{
id:<ObjectId_1>,
emply_name: "Sunil",
designation:"Trainer"

These three key value pairs are there. Now we want to add the address data as well. But Address will have further value like city, Country. So this will donee by embedding one document inside a document.

{
id:<ObjectId_1>,
emply_name: "Sunil",
designation:"Trainer"
address:{ city:"Banglore",
 Country:"India"
}
-- Similary we can embedd another document:
contact:{
mobile_no:9837272245;
email_id:"karworla@gmail.com"
}
}
So these two documents 'address' and 'contact' are embedded in the main document.

In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.

However, embedding related data in documents may lead to situations where documents grow after creation. With the MMAPv1 storage engine, document growth can impact write performance and lead to data fragmentation.

Atomicity of Write Operations

In MongoDB, write operations are atomic at the document level, and no single write operation can atomically affect more than one document or more than one collection. A denormalized data model with embedded data combines all related data for a represented entity in a single document. This facilitates atomic write operations since a single write operation can insert or update the data for an entity. Normalizing the data would split the data across multiple collections and would require multiple write operations that are not atomic collectively.
However, schemas that facilitate atomic writes may limit ways that applications can use the data or may limit ways to modify applications. The Atomicity Considerations documentation describes the challenge of designing a schema that balances flexibility and atomicity.

Tuesday, February 16, 2016

MongoDB - Mastering Find Command

You can use the find() method to issue a query to retrieve data from a collection in MongoDB. All queries in MongoDB have the scope of a single collection.
Queries can return all documents in a collection or only the documents that match a specified filter or criteria. You can specify the filter or criteria in a document and pass as a parameter to the find() method.
The find() method returns query results in a cursor, which is an iterable object that yields documents.
Field Selection

db.collection.find(queryprojection)
Selects documents in a collection and returns a cursor to the selected documents. 
ParameterTypeDescription
querydocumentOptional. Specifies selection criteria using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}).
projectiondocumentOptional. Specifies the fields to return using projection operators. To return all fields in the matching document, omit this parameter.
Returns:A cursor to the documents that match the query criteria. When the find() method “returns documents,” the method is actually returning a cursor to the documents.

Before we jump into cursors, you should know that find takes a second optional parameter called “projection”. This parameter is the list of fields we want to retrieve or exclude. 

The projection parameter takes a document of the following form:
{ field1: <boolean>, field2: <boolean> ... }
The <boolean> value can be any of the following:
  • 1 or true to include the field. The find() method always includes the _id field even if the field is not explicitly stated to return in the projection parameter.
  • 0 or false to exclude the field.
By default, the _id field is always returned. We can explicitly exclude it by specifying {name:1, _id: 0}.
Aside from the _id field, you cannot mix and match inclusion and exclusion. If you think about it, that actually makes sense. You either want to select or exclude one or more fields explicitly.  A projection cannot contain both include and exclude specifications, except for the exclusion of the _id field. In projections that explicitly include fields, the _id field is the only field that you can explicitly exclude.

For example, we can get all of the unicorns’ names without getting back other fields by executing:
db.unicorns.find({}, {name: 1});
> db.unicorns.find({},{name:1}).pretty()
{ "_id" : ObjectId("56bf377510a4febbd2830cfc"), "name" : "Horny" }
{ "_id" : ObjectId("56bf377510a4febbd2830cfd"), "name" : "Aurora" }
{ "_id" : ObjectId("56bf377510a4febbd2830cfe"), "name" : "Unicrom" }
{ "_id" : ObjectId("56bf377510a4febbd2830cff"), "name" : "Roooooodles" }
{ "_id" : ObjectId("56bf377710a4febbd2830d00"), "name" : "Solnara" }

> db.unicorns.find({},{name:1, _id:0}).pretty()
{ "name" : "Horny" }
{ "name" : "Aurora" }
{ "name" : "Unicrom" }
{ "name" : "Roooooodles" }
{ "name" : "Solnara" }

> db.unicorns.find({},{name:1, gender:1, _id:0}).pretty()
{ "name" : "Horny", "gender" : "m" }
{ "name" : "Aurora", "gender" : "f" }
{ "name" : "Unicrom", "gender" : "m" }
{ "name" : "Roooooodles", "gender" : "m" }
{ "name" : "Solnara", "gender" : "f" }

> db.unicorns.find({},{_id:0,loves:0,gender:0, dob:0}).pretty()
{ "name" : "Horny", "weight" : 600, "vampires" : 63 }
{ "name" : "Aurora", "weight" : 450, "vampires" : 41 }
{ "name" : "Unicrom", "weight" : 988.6799999999998, "vampires" : 182 }
{ "weight" : 600, "name" : "Roooooodles", "vampires" : 99 }
{ "name" : "Solnara", "weight" : 550, "vampires" : 80 }

There’s more to find than understanding selectors though. We already mentioned that the result from find is a cursor. We’ll now look at exactly what this means in more detail.

Explicitly Excluded Fields

The following operation queries the bios collection and returns all fields except the first field in the nameembedded document and the birth field:
db.bios.find(
   { contribs: 'OOP' },
   { 'name.first': 0, birth: 0 })

On Arrays and Embedded Documents

The following operation queries the bios collection and returns the last field in the name embedded document and the first two elements in the contribs array:
db.bios.find(
   { },
   {
     _id: 0,
     'name.last': 1,
     contribs: { $slice: 2 }
   }
)

Find All Documents in a Collection

The find() method with no parameters returns all documents from a collection and returns all fields for the documents. For example, the following operation returns all documents in the bios collection:
db.unicorns.find()

Find Documents that Match Query Criteria

To find documents that match a set of selection criteria, call find() with the <criteria> parameter. The following operation returns all the documents from the collection products where qty is greater than 25:
db.products.find( { qty: { $gt: 25 } } )
> db.unicorns.find({weight:{$gt:700}}).pretty()
{
        "_id" : ObjectId("56bf377510a4febbd2830cfe"),
        "name" : "Unicrom",
        "dob" : ISODate("1973-02-09T16:40:00Z"),
        "loves" : [
                "energon",
                "redbull"
        ],
        "weight" : 988.6799999999998,
        "gender" : "m",
        "vampires" : 182
}
{
        "_id" : ObjectId("56bf378810a4febbd2830d01"),
        "name" : "Ayna",
        "dob" : ISODate("1998-03-07T03:00:00Z"),
        "loves" : [
                "strawberry",
                "lemon"
        ],
        "weight" : 734.56,
        "gender" : "f",
        "vampires" : 40
}

Query for Equality

The following operation returns documents in the bios collection where _id equals 5:
db.bios.find( { _id: 5 } )

Query Using Operators

The following operation returns documents in the bios collection where _id equals either 5 orObjectId("507c35dd8fada716c89d0013"):
db.bios.find(
   {
      _id: { $in: [ 5,  ObjectId("507c35dd8fada716c89d0013") ] }
   }
)


Query for Ranges

Combine comparison operators to specify ranges. The following operation returns documents with fieldbetween value1 and value2:
db.collection.find( { field: { $gt: value1, $lt: value2 } } );

Combine Conditions

You can combine multiple query conditions in logical conjunction (AND) and logical disjunctions (OR).

Logical AND
You can specify a logical conjunction (AND) for a list of query conditions by separating the conditions with a comma in the conditions document.

db.restaurants.find( { "cuisine": "Italian", "address.zipcode": "10075" } )
The result set includes only the documents that matched all specified criteria.

Logical OR
You can specify a logical disjunction (OR) for a list of query conditions by using the $or query operator.

db.restaurants.find(
   { $or: [ { "cuisine": "Italian" }, { "address.zipcode": "10075" } ] }
)

Query a Field that Contains an Array

If a field contains an array and your query has multiple conditional operators, the field as a whole will match if either a single array element meets the conditions or a combination of array elements meet the conditions.
Given a collection students that contains the following documents:
{ "_id" : 1, "score" : [ -1, 3 ] }
{ "_id" : 2, "score" : [ 1, 5 ] }
{ "_id" : 3, "score" : [ 5, 5 ] }
The following query:
db.students.find( { score: { $gt: 0, $lt: 2 } } )
Matches the following documents:
{ "_id" : 1, "score" : [ -1, 3 ] }
{ "_id" : 2, "score" : [ 1, 5 ] }
In the document with _id equal to 1, the score: [ -1, 3 ] meets the conditions because the element-1 meets the $lt: 2 condition and the element 3 meets the $gt: 0 condition.
In the document with _id equal to 2, the score: [ 1, 5 ] meets the conditions because the element meets both the $lt: 2 condition and the $gt: 0 condition.

Query Arrays

Query for an Array Element

The following operation returns documents in the bios collection where the array field contribs contains the element "UNIX":
db.bios.find( { contribs: "UNIX" } )
> db.unicorns.find({loves:"watermelon"}).pretty()
{
        "_id" : ObjectId("56bf378810a4febbd2830d04"),
        "name" : "Leia",
        "dob" : ISODate("2001-10-08T09:23:00Z"),
        "loves" : [
                "apple",
                "watermelon"
        ],
        "weight" : 601,
        "gender" : "f",
        "vampires" : 33
}
{
        "_id" : ObjectId("56bf378810a4febbd2830d05"),
        "name" : "Pilot",
        "dob" : ISODate("1997-02-28T23:33:00Z"),
        "loves" : [
                "apple",
                "watermelon"
        ],
        "weight" : 650,
        "gender" : "m",
        "vampires" : 54
}
{
        "_id" : ObjectId("56bf378910a4febbd2830d07"),
        "name" : "Dunx",
        "dob" : ISODate("1976-07-18T12:48:00Z"),
        "loves" : [
                "grape",
                "watermelon"
        ],
        "weight" : 705.56,
        "gender" : "m",
        "vampires" : 165
}

Query an Array of Documents

The following operation returns documents in the bios collection where awards array contains an embedded document element that contains the award field equal to "Turing Award" and the year field greater than 1980:
db.bios.find(
   {
      awards: {
                $elemMatch: {
                     award: "Turing Award",
                     year: { $gt: 1980 }
                }
      }
   }
)
The name field must match the embedded document exactly. The query does not match documents with the following name fields:
{
   first: "Yukihiro",
   aka: "Matz",
   last: "Matsumoto"
}

{
   last: "Matsumoto",
   first: "Yukihiro"
}

Query Fields of an Embedded Document

The following operation returns documents in the bios collection where the embedded document name contains a field first with the value "Yukihiro" and a field last with the value "Matsumoto". The query uses dot notation to access fields in an embedded document:
The query matches the document where the name field contains an embedded document with the fieldfirst with the value "Yukihiro" and a field last with the value "Matsumoto". For instance, the query would match documents with name fields that held either of the following values:
{
  first: "Yukihiro",
  aka: "Matz",
  last: "Matsumoto"
}

{
  last: "Matsumoto",
  first: "Yukihiro"
}

Limit the Number of Documents to Return

The limit() method limits the number of documents in the result set. The following operation returns at most 5 documents in the bios collection:
db.bios.find().limit( 5 )
limit() corresponds to the LIMIT statement in SQL.

Set the Starting Point of the Result Set

The skip() method controls the starting point of the results set. The following operation skips the first 5documents in the bios collection and returns all remaining documents:
db.bios.find().skip( 5 )
Order Documents in the Result SetThe sort() method orders the documents in the result set. The following operation returns documents in the bios collection sorted in ascending order by the name field:
db.bios.find().sort( { name: 1 } )
sort() corresponds to the ORDER BY statement in SQL.
A few times now I’ve mentioned that find returns a cursor whose execution is delayed until needed. However, what you’ve no doubt observed from the shell is that find executes immediately. This is a behavior of the shell only. We can observe the true behavior of cursors by looking at one of the methods we can chain to find. The first that we’ll look at is sort. We specify the fields we want to sort on as a JSON document, using 1 for ascending and -1 for descending.

//heaviest unicorns first
db.unicorns.find().sort({weight: -1})
//by unicorn name then vampire kills:
db.unicorns.find().sort({name: 1, vampires: -1})

As with a relational database, MongoDB can use an index for sorting. We’ll look at indexes in more detail later on. However, you should know that MongoDB limits the size of your sort without an index. That is, if you try to sort a very large result set which can’t use an index, you’ll get an error.

Count

The shell makes it possible to execute a count directly on a collection, such as:
> db.unicorns.find().count()

12
> db.unicorns.count({vampires: {$gt: 50}})
6

In reality, count is actually a cursor method, the shell simply provides a shortcut. Drivers which don’t provide such a shortcut need to be executed like this (which will also work in the shell):
db.unicorns.find({vampires: {$gt: 50}}).count()

Paging

Paging results can be accomplished via the limit and skip cursor methods. To get the second and third heaviest unicorn, we could do:
db.unicorns.find()
.sort({weight: -1})
.limit(2)
.skip(1)
Using limit in conjunction with sort, can be a way to avoid running into problems when sorting on non-indexed fields.

Mongodb explain() Query Analyzer and it's Verbosity

First creating 1 million documents: > for(i=0; i<100; i++) { for(j=0; j<100; j++) {x = []; for(k=0; k<100; k++) { x.push({a:...