DisCopy


Saturday 16 July 2011

What is Cassandra?

Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.
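The "Dynamo-like infrastructure" part is usually explained as a token ring. Each node owns a position on a hash ring. A row key is stored on the first node at or after the key's hash, and copies go to the next nodes clockwise around the ring. The Python sketch below is a toy model of that idea, not Cassandra's own code. The class name `Ring`, the MD5-based `token` function (similar in spirit to Cassandra's RandomPartitioner), the node names and the simple "next N nodes" placement are all assumptions for illustration. It shows how data spreads across many servers without a single point of failure: every node can work out which replicas own a key, and each key lives on several nodes.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hypothetical token function: hash a string to a position on the ring.
    # MD5 is used here only as an assumption, echoing RandomPartitioner.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring (hypothetical, not Cassandra's API).

    Each node owns one token. A key belongs to the first node whose token is
    >= the key's token (wrapping around), plus the next replication_factor - 1
    distinct nodes clockwise as replicas.
    """

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sorted (token, node) pairs describe the ring order.
        self._tokens = sorted((token(n), n) for n in nodes)
        self._positions = [t for t, _ in self._tokens]

    def replicas(self, key: str):
        n = len(self._tokens)
        # First node at or after the key's token; wrap to 0 past the end.
        start = bisect.bisect_left(self._positions, token(key)) % n
        count = min(self.replication_factor, n)
        return [self._tokens[(start + i) % n][1] for i in range(count)]


# Usage: four nodes, each key stored on three of them.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
```

Placing replicas on consecutive ring positions is the simplest choice; a real deployment may also take racks or data centres into account.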


Cassandra provides a structured key-value store with eventual consistency. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family. The values from a column family for each key are stored together, making Cassandra a hybrid between a column-oriented DBMS and a row-oriented store.
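The data model described above can be pictured as nested maps: keyspace, then column family, then row key, then column name to value. The Python sketch below is a hypothetical in-memory toy, not Cassandra's client API. The classes `Keyspace` and `ColumnFamily` and their methods are assumptions for illustration. It shows the two points from the paragraph: column families are declared up front, while columns are added per row key, so two rows in the same family can hold different columns. Returning columns in name order loosely mimics how Cassandra keeps columns sorted within a row.

```python
from collections import defaultdict


class ColumnFamily:
    """Hypothetical toy column family: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # No schema change needed: the column is added to this row only.
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        # .get() on a defaultdict does not create an empty row as a side effect.
        row = self._rows.get(row_key, {})
        if column is not None:
            return row.get(column)
        # Return a copy of the whole row with columns sorted by name.
        return dict(sorted(row.items()))


class Keyspace:
    """Hypothetical toy keyspace whose column families are fixed at creation."""

    def __init__(self, column_family_names):
        self.column_families = {n: ColumnFamily(n) for n in column_family_names}

    def cf(self, name):
        # Raises KeyError for a family that was not declared up front.
        return self.column_families[name]


# Usage: two rows in the same family with different sets of columns.
ks = Keyspace(["Users"])
users = ks.cf("Users")
users.insert("alice", "email", "alice@example.com")
users.insert("alice", "city", "Paris")
users.insert("bob", "email", "bob@example.com")
print(users.get("alice"))  # {'city': 'Paris', 'email': 'alice@example.com'}
print(users.get("bob"))    # {'email': 'bob@example.com'}
```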


Visit: http://cassandra.apache.org/


More Info:

The largest production Cassandra cluster holds over 100 TB of data on over 150 machines.

Apart from its 500 million users, Facebook's stats are:
·      100 billion hits per day
·      50 billion photos
·      2 trillion objects cached, with hundreds of millions of requests per second
·      130 TB of logs every day

Maybe I have just crossed the line from mild fascination to full-blown obsession, but you will know what I mean if you are a fan of the Facebook Engineering Tech Talks.

With YouTube and Google into MySQL, and Twitter, Reddit and Digg all entangled in Cassandra, all I can say is: it is not Sybase vs. Oracle vs. MS anymore…
