Saturday, January 2, 2016

Cassandra Distribution - Write & Read Path


In this article we will look at how Cassandra writes and reads data, starting with the write path.

The Write Path

  •  Writes can be sent to any node in the cluster (the coordinator)
  •  Writes are written to the commit log, then to the memtable
  •  Every write includes a timestamp
  •  The memtable is flushed to disk periodically (as an SSTable)
  •  A new memtable is then created in memory
  •  Deletes are a special case of a write, called a "tombstone"

We will start at a high level, the cluster level, and then zoom in to see what goes on inside an individual node. As we discussed in an earlier article, all nodes in a Cassandra cluster are equal, so any node can service a write request that you send in. Whichever node you happen to be talking to for that request is called the coordinator node, because it does the coordination with the rest of the nodes in the cluster on behalf of your query. This isn't a special node type (Cassandra doesn't have any leader or master node); it is simply the node that coordinates that particular request.

Once a write request hits an individual node, only two things happen. First, the mutation is written to the commit log. The commit log is an append-only data structure, so this is sequential IO and it is very fast. Once the mutation is in the commit log and we have that durability, Cassandra merges the mutation into an in-memory representation of your table, called the memtable. After those two steps the node can respond to the coordinator, which tells the client: yup, we wrote your data. Cassandra is often called a write-optimized database, and part of the reason it writes so fast is the simplicity of this write path.
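The two steps above can be sketched in a few lines of Python. This is only a toy model to make the order of operations concrete, not Cassandra's actual implementation: the ToyNode class, the JSON-encoded log entries, and the (key, column) memtable layout are all assumptions made for illustration.

```python
import json
import time


class ToyNode:
    """Toy model of one node's write path (not Cassandra's real code)."""

    def __init__(self):
        self.commit_log = []   # stands in for the append-only commit log file
        self.memtable = {}     # (partition_key, column) -> (value, timestamp)

    def write(self, key, column, value, timestamp=None):
        # Every write carries a timestamp (microseconds here, by assumption).
        if timestamp is None:
            timestamp = time.time_ns() // 1000
        mutation = {"key": key, "column": column, "value": value, "ts": timestamp}

        # Step 1: append the mutation to the commit log for durability.
        self.commit_log.append(json.dumps(mutation))

        # Step 2: merge it into the memtable; the newest timestamp wins.
        current = self.memtable.get((key, column))
        if current is None or timestamp >= current[1]:
            self.memtable[(key, column)] = (value, timestamp)

        # Step 3: acknowledge the write to the coordinator.
        return True


node = ToyNode()
node.write("user:1", "email", "a@example.com", timestamp=1)
node.write("user:1", "email", "b@example.com", timestamp=2)
print(node.memtable[("user:1", "email")])  # ('b@example.com', 2)
```

Note that the commit log keeps both mutations, while the memtable only holds the latest value for the column.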

So the write path is pretty simple. The one other thing you need to know is that every column value you write to Cassandra gets a timestamp. This becomes important shortly, when we talk about how SSTables work with compaction.

As data keeps getting written to the memtable, memory will eventually run out (memory being a finite resource), so the memtable has to be flushed to disk. Cassandra does this asynchronously behind the scenes. The flush is one big run of sequential IO, so it is fast: it takes the in-memory representation in the memtable and serializes it to disk as something called an SSTable.
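Here is a rough sketch of what a flush does, assuming a memtable that is a plain dict keyed by (partition key, column). The flush_memtable function, the JSON-lines file format, and the file naming are hypothetical; Cassandra's real SSTable format is binary and far more involved.

```python
import json
import os
import tempfile


def flush_memtable(memtable, directory, generation):
    """Write a memtable to a new sorted file and return (path, fresh memtable)."""
    path = os.path.join(directory, f"sstable-{generation}.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        # Writing the cells in sorted key order keeps the flush to one
        # sequential pass over the file.
        for (key, column) in sorted(memtable):
            value, ts = memtable[(key, column)]
            f.write(json.dumps([key, column, value, ts]) + "\n")
    # The file is never modified again; new writes go to a fresh memtable.
    return path, {}


memtable = {
    ("user:2", "email"): ("b@example.com", 20),
    ("user:1", "email"): ("a@example.com", 10),
}
with tempfile.TemporaryDirectory() as tmp:
    path, memtable = flush_memtable(memtable, tmp, generation=1)
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f]

print(rows[0])   # ['user:1', 'email', 'a@example.com', 10]
print(memtable)  # {}
```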

Besides the fact that writes have timestamps, another thing to know is that Cassandra doesn't do any updates or deletes in place. SSTables are immutable, and the commit log is only ever appended to. So when you delete data, Cassandra actually writes a special kind of record called a "tombstone". A tombstone is a marker that says there is no data here any more as of this timestamp; like regular data, it gets a timestamp of its own.
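To see why a tombstone is enough to "delete" data without touching old files, here is a small sketch. The TOMBSTONE sentinel, the read function, and the dict-based tables are assumptions for illustration only; MappingProxyType is used just to make the toy tables read-only, the way SSTables are.

```python
from types import MappingProxyType

TOMBSTONE = object()  # hypothetical marker; the real on-disk format differs

# An older, read-only "SSTable" that still contains the value.
old_sstable = MappingProxyType({("user:1", "email"): ("a@example.com", 10)})

# The delete does not touch old_sstable; it is recorded as a new timestamped cell.
new_sstable = MappingProxyType({("user:1", "email"): (TOMBSTONE, 20)})


def read(cell_key, tables):
    """Return the live value for cell_key, or None if absent or deleted."""
    cells = [t[cell_key] for t in tables if cell_key in t]
    if not cells:
        return None
    value, _ts = max(cells, key=lambda cell: cell[1])  # latest timestamp wins
    return None if value is TOMBSTONE else value


print(read(("user:1", "email"), [old_sstable]))               # a@example.com
print(read(("user:1", "email"), [old_sstable, new_sstable]))  # None
```

The old value is still physically present in the first table; it is simply shadowed by the newer tombstone.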

What is an SSTable

  •  Immutable data file for row storage
  •  Every write includes a timestamp of when it was written
  •  A partition can be spread across multiple SSTables
  •  The same column can be in multiple SSTables
  •  Merged through compaction; only the latest timestamp is kept
  •  Deletes are written as tombstones
  •  Easy backup


So once a memtable gets full, it is written to disk as an SSTable, and SSTables are immutable. What happens when too many SSTables have been written to disk? How does that work?

SSTables are immutable files on disk. As we keep flushing SSTables to disk, we end up with a bunch of really small ones, so as an optimization Cassandra runs a process called compaction. Compaction takes small SSTables and merges them into a bigger one. If one SSTable holds a row written at time 1 and another holds the same row written at time 2, then when those tables are merged we keep the version written at time 2, based on the timestamp stored with the data, and discard the old one. This is what keeps Cassandra fast on reads: it avoids checking many SSTables to find the same piece of data.
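The merge rule described above fits in a short function. This is a simplified sketch, assuming each SSTable is a dict of (row, column) -> (value, timestamp); the compact function is hypothetical and ignores tombstones and everything else real compaction strategies have to deal with.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest version of each cell."""
    merged = {}
    for table in sstables:
        for cell_key, (value, ts) in table.items():
            if cell_key not in merged or ts > merged[cell_key][1]:
                merged[cell_key] = (value, ts)
    # Return the cells sorted by key as a new table; the inputs are left untouched.
    return dict(sorted(merged.items()))


sstable_1 = {("row1", "name"): ("old name", 1), ("row2", "name"): ("bob", 1)}
sstable_2 = {("row1", "name"): ("new name", 2)}

print(compact([sstable_1, sstable_2]))
# {('row1', 'name'): ('new name', 2), ('row2', 'name'): ('bob', 1)}
```

After compaction a read for row1 only has to look in one table instead of two.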

One great thing about this design is that it makes backups almost trivial. Because an SSTable never changes once it is written to disk, you can simply copy the files off to another server and you are good to go.

The Read Path

  •  Any server may be queried; it acts as the coordinator
  •  The coordinator contacts the nodes that hold the requested key
  •  On each node, data is pulled from the SSTables and merged
  •  Consistency < ALL performs read repair in the background (read_repair_chance)

We will look at reads the same way: first at the cluster level, then at the level of an individual node. Reads in Cassandra are very similar to writes. Whichever node you happen to be talking to for a given read request is the coordinator node. Again, it is not a special node; for that one request, it is the node that coordinates with the rest of the nodes in the cluster.

Now for what actually goes on at the individual node level. Cassandra goes to disk and looks for whatever data you asked for, and it might have to look in multiple SSTables. As we discussed, compaction runs as a background process to merge SSTables together, but some rows may still be spread across multiple SSTables because compaction hasn't had the chance to combine them yet. Once the node has pulled the data out of what could be multiple SSTables, it merges it in memory using the timestamps we talked about: the last write wins, so the latest timestamp always wins. Any unflushed data in the memtable is merged in as well. Then the node can send a response back to the coordinator, which returns it to the client: here is the data you asked for.
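The per-node read can be sketched like this. It is a toy model only: the read_row function and the dict-based memtable and SSTables are assumptions for illustration, and the real read path does much more than a linear scan.

```python
def read_row(partition_key, memtable, sstables):
    """Merge every copy of a row's columns; the latest timestamp wins per column."""
    row = {}  # column -> (value, timestamp)
    for source in [*sstables, memtable]:
        for (key, column), (value, ts) in source.items():
            if key != partition_key:
                continue
            if column not in row or ts > row[column][1]:
                row[column] = (value, ts)
    return {column: value for column, (value, _ts) in row.items()}


sstable_a = {("user:1", "email"): ("old@example.com", 1), ("user:1", "name"): ("Ann", 1)}
sstable_b = {("user:1", "email"): ("new@example.com", 5)}
memtable = {("user:1", "city"): ("Oslo", 7)}  # not flushed yet

print(read_row("user:1", memtable, [sstable_a, sstable_b]))
# {'email': 'new@example.com', 'name': 'Ann', 'city': 'Oslo'}
```

The row that comes back is assembled from three different places, which is exactly why having fewer SSTables to check makes reads cheaper.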

Because a read might touch multiple SSTables on disk, the choice of disk (SSDs versus old spinning-rust drives) has a serious impact on the performance you get out of Cassandra if you have a read-heavy workload.

Another thing to keep in mind is that compaction also has an impact. Once compaction has run, there are fewer files for Cassandra to search for a given key on disk, so reads are much faster because they do much less disk IO, which is always a good thing when we are trying to read data.

When you do a read with a consistency level lower than ALL, Cassandra has a chance of doing something in the background called a read repair. As we discussed, Cassandra is an eventually consistent system, so from time to time nodes may disagree about the value of a given piece of data; one node might not have the latest update. There is a setting you can configure when you create a table called read_repair_chance. This is the chance that, whenever you do a read against Cassandra, it will also go and talk to the other replicas in the cluster and make sure that everybody has the most up-to-date, in-sync information.
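The idea of a probabilistic repair can be sketched as follows. This is a toy model, not how Cassandra implements it: the maybe_read_repair function and the dict-based replicas are hypothetical, and only the read_repair_chance name comes from the table setting mentioned above.

```python
import random


def maybe_read_repair(replicas, cell_key, read_repair_chance, rng=random):
    """With the given probability, bring every replica up to the newest cell."""
    if rng.random() >= read_repair_chance:
        return False  # most reads skip the extra work
    cells = [r[cell_key] for r in replicas if cell_key in r]
    if not cells:
        return False
    newest = max(cells, key=lambda cell: cell[1])  # compare by timestamp
    for replica in replicas:
        if replica.get(cell_key) != newest:
            replica[cell_key] = newest  # write the latest value back
    return True


replica_1 = {("user:1", "email"): ("new@example.com", 5)}
replica_2 = {("user:1", "email"): ("old@example.com", 1)}  # stale
replica_3 = {}                                             # missed the write

maybe_read_repair([replica_1, replica_2, replica_3], ("user:1", "email"), 1.0)
print(replica_2 == replica_1 == replica_3)  # True
```

A chance of 1.0 is used here only so the example is deterministic; with a small value such as 0.1, roughly one read in ten would trigger the repair.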
