Friday, May 6, 2011

Disruptive Cloud Start-Ups - Part 1: NimbusDB

Attending Under The Radar (UTR) to watch disruptive companies present and to network with entrepreneurs, thought leaders, and venture capitalists is an annual tradition that I don't miss. I have blogged about the disruptive start-ups I saw in previous years. The biggest exit out of UTR that I have witnessed so far is Salesforce.com's $212 million acquisition of Heroku. This post is about one of the disruptive start-ups that I saw at UTR this year: NimbusDB.

I met Barry Morris, the CEO and Founder of NimbusDB, at a reception the night before. I had a long conversation with him about the issues with legacy databases, NoSQL, and, of course, NimbusDB. I must say that, after a long time, I have seen a company apply all the right design principles to solve a chronic problem: how do you make SQL databases scale so that they don't suck?

One of the main issues with legacy relational databases is that they were never designed to scale out in the first place. A range of NoSQL solutions addressed the scale-out issue, but the biggest problem with NoSQL is that NoSQL is not SQL. This is why I was excited when I saw what NimbusDB has to offer: it's a SQL database on the surface, but underneath it has a radically modern architecture that leverages MapReduce-style divide-and-conquer for queries, BitTorrent-style messaging, and Dynamo-style persistence.
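To make the divide-and-conquer idea concrete, here is a minimal sketch of how a SQL aggregate could be split across nodes MapReduce-style. This is my own Python illustration, not NimbusDB code; the partition layout and function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical horizontal partitions of a "sales" table across three nodes.
partitions = [
    [("us", 120), ("eu", 80)],
    [("us", 60), ("apac", 40)],
    [("eu", 30), ("apac", 10)],
]

def map_partition(rows):
    """Map step: each node computes a partial SUM(amount) GROUP BY region."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return partial

def reduce_partials(partials):
    """Reduce step: merge the per-node partial sums into the final result."""
    result = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            result[region] += total
    return dict(result)

# Conceptually: SELECT region, SUM(amount) FROM sales GROUP BY region;
print(reduce_partials(map_partition(p) for p in partitions))
# {'us': 180, 'eu': 110, 'apac': 50}
```

The appeal of this pattern is that the map step runs in parallel on whichever node holds each partition, and only small partial results travel over the network.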

NimbusDB's architecture separates transactions from storage and uses asynchronous messaging across nodes - a non-blocking atom commit protocol - to gain horizontal scalability. At the application layer, it supports "most" of the SQL-99 feature set and doesn't require developers to re-learn or re-code. The architecture doesn't involve any kind of sharding, and the nodes can scale out across commodity machines running a variety of operating systems. This eliminates the explicit need for a separate hot backup, since any and all nodes serve as a live database in any zone. That makes NimbusDB an always-live system, which also solves a major problem with traditional relational databases: high availability. It's an insert-only database that versions every single atom/record; that's how it achieves MVCC as well. Data is compressed on disk and served from in-memory nodes.
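To illustrate the insert-only versioning idea, here is a toy sketch - again my own Python illustration under assumed semantics, not NimbusDB's actual implementation. Every update appends a new version of a record, and a reader sees the newest version that existed as of its transaction:

```python
import itertools
from dataclasses import dataclass

# A monotonic counter stands in for a real transaction manager.
_txn_ids = itertools.count(1)

@dataclass(frozen=True)
class Version:
    key: str
    value: str
    txn_id: int  # id of the transaction that wrote this version

class InsertOnlyStore:
    """Toy MVCC store: writes append new versions, nothing is overwritten."""

    def __init__(self):
        self.versions = []  # append-only list of Version records

    def write(self, key, value):
        txn_id = next(_txn_ids)
        self.versions.append(Version(key, value, txn_id))
        return txn_id

    def read(self, key, as_of_txn):
        """Return the newest version of `key` visible as of `as_of_txn`."""
        visible = [v for v in self.versions
                   if v.key == key and v.txn_id <= as_of_txn]
        return visible[-1].value if visible else None

store = InsertOnlyStore()
t1 = store.write("balance", "100")
t2 = store.write("balance", "75")
print(store.read("balance", as_of_txn=t1))  # 100 -- consistent old snapshot
print(store.read("balance", as_of_txn=t2))  # 75  -- latest version
```

Because old versions are never destroyed, readers never block writers, which is the essence of MVCC.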

I asked Barry about using NimbusDB as an analytic database. He said that the database is currently not optimized for analytic queries, but he doesn't see why it couldn't be tuned and configured as one, since the underlying architecture wouldn't really have to change. During his pitch, though, he did mention that NimbusDB may have challenges with very read-heavy and very write-heavy workloads. I personally believe that solving the problem of analytic queries over large volumes of data is a much bigger challenge in the cloud because of the cloud's inherently distributed nature. Similarly, building an insert-heavy system is equally difficult. However, most systems fall somewhere in between, and that could be a great target market for NimbusDB.

I haven't played around with the database yet, but I intend to. On a cursory look, it seems to defy the CAP theorem; Barry disagrees with me on that. The founders of NimbusDB have great backgrounds: Barry was the CEO of IONA and StreamBase and has extensive experience in building and leading technology companies. If NimbusDB can execute on the principles it is designed around, it will be a huge breakthrough.

As a general trend, I see a clear transition: people finally agree that SQL is still the preferred interface, but the key is to rethink the underlying architecture.

Update: After I published this post, Benjamin Block raised concerns that NimbusDB doesn't get the CAP theorem. As I mentioned above, I had the same concern, but I would give them the benefit of the doubt for now and watch the feedback as the product goes into beta.

Check out their slides and the presentation (embedded in the original post).