Neat ideas: Thrift, Thrudb

02 Jan 2008

Wow. The pace at which innovation happens on the web is truly breathtaking. Close on the heels of SimpleDB, the Third Rail guys have released ThruDB, which I can only call a fantastic idea. Simply put, ThruDB is a set of services built on top of Thrift, Facebook's cross-language services development framework, that together provide a highly scalable, document-oriented data storage mechanism. In this respect, it is quite similar to SimpleDB. Document-oriented databases are becoming more and more important as the web moves towards loosely structured data that is increasingly difficult to slot into a fixed database schema.

There are two interesting parts to ThruDB: Thrift itself, and the services built on top of it. Thrift is Facebook's framework for letting programs written in different languages talk to each other. This is truly innovative and useful, as many of us have felt pangs of envy on seeing some great piece of software written in a language that is not our favorite. With Thrift, you define datatypes and service interfaces once, in a fairly simple, language-neutral definition file. Taking this file as input, Thrift generates code for RPC clients and servers that can communicate freely across programming language barriers. If you want to see some code, you can do much worse than checking out this nice tutorial by Ilya Grigorik.
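To make the idea concrete, here is a rough sketch of the pattern in plain Python. It does not use Thrift's real API or wire format: the INTERFACE dictionary stands in for a definition file, JSON stands in for the wire protocol, and encode_call, dispatch, and CalculatorHandler are hypothetical names made up for this illustration. In real Thrift, the equivalents of encode_call and dispatch are generated for you in each target language.

```python
import json

# Hypothetical interface description, standing in for a Thrift definition file.
# It maps each method name to its ordered parameter names.
INTERFACE = {"add": ["a", "b"], "echo": ["text"]}


class CalculatorHandler:
    """Server-side implementation of the hypothetical interface."""

    def add(self, a, b):
        return a + b

    def echo(self, text):
        return text


def encode_call(method, *args):
    """Client stub: pack a call into a language-neutral message (JSON here)."""
    params = INTERFACE[method]
    if len(args) != len(params):
        raise TypeError(f"{method} expects {len(params)} arguments")
    return json.dumps({"method": method, "args": dict(zip(params, args))})


def dispatch(handler, message):
    """Server skeleton: decode a message and call the matching handler method."""
    call = json.loads(message)
    if call["method"] not in INTERFACE:
        raise ValueError(f"unknown method: {call['method']}")
    result = getattr(handler, call["method"])(**call["args"])
    return json.dumps({"result": result})


if __name__ == "__main__":
    request = encode_call("add", 2, 3)             # what the client sends
    reply = dispatch(CalculatorHandler(), request)  # what the server returns
    print(json.loads(reply)["result"])
```

Because the request and reply are just text in an agreed format, the client and the server could each be written in a different language, which is the whole point.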

While Thrift itself is a big enough topic for a post of its own, for now we will concentrate on the bigger picture. ThruDB is a bunch of interoperable services built on top of Thrift, and it uses highly scalable storage mechanisms like S3 to actually store the data. You can find more about this in (again) Ilya Grigorik's excellent analysis here. So what are these so-called 'interoperable services'?

  • Thrudoc - A document storage service
  • Thrucene - A document indexing service
  • Throxy - A service partitioner
  • Thruqueue - A persistent message queue
If you're bewildered, don't be. At their heart, they're really simple but amazing tools. Thrudoc is a simple key-value storage system, designed to work on different kinds of storage systems, including regular disks and Amazon's S3. Thrucene simply exposes Lucene's search API as a Thrift service. Throxy helps in partitioning and scaling the other Thrudb services horizontally. The latest service, Thruqueue, is quite similar to Amazon's own SQS and adds a persistent queue capability to this bunch of services. The one difference between Thruqueue and SQS that I see is that while Amazon limits each message body to 256 KB, Thruqueue has no such hard limit.
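
For a feel of how a key-value document store with swappable backends and a horizontal partitioner can fit together, here is a minimal sketch in Python. It is not Thrudoc's or Throxy's actual code or API: MemoryBackend and PartitionedStore are hypothetical names, the in-memory dictionary stands in for a real backend such as a disk or S3, and hashing the key modulo the number of backends is just one simple partitioning scheme chosen for illustration.

```python
import hashlib


class MemoryBackend:
    """Hypothetical stand-in for a storage backend such as a local disk or S3."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)


class PartitionedStore:
    """Key-value store that spreads keys across several backends by hash."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)

    def _backend_for(self, key):
        # Hash the key so the same key always maps to the same backend.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._backends[int(digest, 16) % len(self._backends)]

    def put(self, key, value):
        self._backend_for(key).put(key, value)

    def get(self, key):
        return self._backend_for(key).get(key)


if __name__ == "__main__":
    store = PartitionedStore([MemoryBackend() for _ in range(3)])
    store.put("doc-1", '{"title": "Neat ideas"}')
    print(store.get("doc-1"))
```

Callers only ever see put and get, so adding more backends scales the store out without changing client code. Note that plain modulo hashing remaps most keys when the number of backends changes, which is a known limitation of this simple scheme.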

We will be playing with these tools very soon, comparing them to Amazon's offerings, and finding out the advantages each service offers. Stay tuned.