Tuesday, January 11, 2011

Just say noSQL

I was trying to catch up on the 1000+ news items in my Google Reader (successfully, I might add, though I cheated quite a bit) when I stumbled on this great article: 7 Exciting Web Development Trends for 2011. The entire article is a good read and a source of further information, but item number one really intrigued me: noSQL. The term noSQL refers to any non-relational database and to an overall shift in database management, rather than to one particular technology. What really caught my interest was this line from the article:
Another big factor for me is the simplicity it brings to my schema: your data models can now become a lot more sane. I can’t wait to not muck around with my models just to make it fit into the relational model.
I have often overcomplicated very simple database models to reach a general level of normalization and reap the benefits it offers, regardless of how often I actually took advantage of them. This solution sounded great, so I dug further.
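
To make that concrete for myself, here is a minimal Python sketch of the same data in both shapes. Everything in it (the post/tag data, the `store` dictionary, the function names) is made up for illustration and is not taken from the article or from any real database API; plain lists and dicts stand in for tables and for a key/value store.

```python
# Relational shape: three flat "tables" linked by ids (hypothetical data).
posts = [{"id": 1, "title": "Just say noSQL"}]
tags = [{"id": 10, "name": "databases"}, {"id": 11, "name": "trends"}]
post_tags = [{"post_id": 1, "tag_id": 10}, {"post_id": 1, "tag_id": 11}]


def tags_for_post_relational(post_id):
    """Emulate the join a normalized schema needs to list a post's tags."""
    tag_ids = {pt["tag_id"] for pt in post_tags if pt["post_id"] == post_id}
    return sorted(t["name"] for t in tags if t["id"] in tag_ids)


# Non-relational shape: one self-contained record stored under a single key.
store = {
    "post:1": {"title": "Just say noSQL", "tags": ["databases", "trends"]},
}


def tags_for_post_document(post_id):
    """Read the tags straight off the record; no join is involved."""
    return sorted(store[f"post:{post_id}"]["tags"])
```

Both functions return the same tags for post 1. The second version trades the guarantees of normalization (one place to rename a tag, for instance) for a model that looks like the object I actually work with.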

A quick Google search turns up what appears to be the Ultimate Guide to the Non-Relational Universe, right after the Wikipedia entry. But that page gives me a headache, so I looked further into the examples listed under Taxonomy on the Wikipedia page. I chose Google's BigTable under "key/value store on disk" and Apache's Cassandra under "eventually-consistent key/value store". A few key points I gathered:

  • BigTable - "it departs from the typical convention of a fixed number of columns, instead described by the authors as 'a sparse, distributed multi-dimensional sorted map', sharing characteristics of both row-oriented and column-oriented databases." BigTable is designed to scale into the petabyte range across "hundreds or thousands of machines, and to make it easy to add more machines [to] the system and automatically start taking advantage of those resources without any reconfiguration".
  • Cassandra - "The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model."
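
The "sparse, distributed multi-dimensional sorted map" phrase was easier for me to grasp as code. Below is a toy, single-machine Python sketch, assuming the map is keyed by (row, column, timestamp) as the Bigtable authors describe it. The class and method names are hypothetical; this is not the BigTable or Cassandra API, and the "distributed" part is left out entirely.

```python
import bisect


class SparseSortedMap:
    """Toy in-memory map from (row, column, timestamp) to a value."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> stored value

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so newer versions of a cell sort first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        """Return the newest value for a cell, or None if it was never set."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_row(self, row):
        """Yield (column, timestamp, value) for one row, in key order."""
        i = bisect.bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            _, column, neg_ts = self._keys[i]
            yield column, -neg_ts, self._values[self._keys[i]]
            i += 1


# Usage: rows may have entirely different columns; unset cells take no space.
table = SparseSortedMap()
table.put("com.example", "contents:html", 1, "<html>v1</html>")
table.put("com.example", "contents:html", 2, "<html>v2</html>")
table.put("com.example", "anchor:home", 1, "Home")
table.put("org.sample", "contents:html", 1, "<html>other</html>")
```

"Sparse" here means a row only stores the cells that were actually written, and "sorted" means a whole row (or a range of rows) can be read with one ordered scan instead of a lookup per cell.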

This guy doesn't like noSQL, but he writes a good article! I gathered from his article that a noSQL approach will bring no benefit and will only add a learning curve unless you are planning to scale to Google size soon. I can fully see the validity of his point. However, I am intrigued by the players in this movement and am always looking to learn the latest and greatest. Even if it doesn't take over the world, it will probably give some insight into the next big thing.

And since Facebook open-sourced Cassandra (yay Facebook!!) and Google provides a BigTable interface on App Engine, I ended up focusing on these two options so that I could experiment with them. Let the fun begin!
