Archive for the 'Amazon' Category

Neat ideas: Thrift, Thrudb

Wednesday, January 2nd, 2008

Wow. The pace at which innovation happens on the web is truly breathtaking. Close on the heels of SimpleDB, the Third Rail guys have released ThruDB, which I can only say is a fantastic idea. Simply put, ThruDB is a set of services built on top of Thrift, Facebook’s cross-language services development framework, to provide a highly scalable, document-oriented data storage mechanism. In this respect, it is quite similar to SimpleDB. Document-oriented databases are becoming more and more important as the web moves towards loosely structured data that is increasingly difficult to slot into a database schema.

There are two interesting parts to ThruDB. The first is Thrift, Facebook’s framework that lets programs written in different languages talk to each other. This is truly innovative and useful, as many of us have felt pangs of envy on seeing a great piece of software written in a language that is not our favorite. Thrift lets you define datatypes and service interfaces once, in a fairly simple, language-neutral definition file. Taking this file as input, Thrift generates the code for RPC clients and servers that can communicate freely across programming language barriers. If you want to see some code, you can do much worse than checking out this nice tutorial by Ilya Grigorik.
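To make the "definition in, clients and servers out" idea concrete, here is a toy sketch in Python. It is not Thrift and does not use Thrift's API: the `DEFINITION` dict stands in for a definition file, JSON stands in for Thrift's wire protocol, and `make_client`, `make_server` and `CalculatorHandler` are hypothetical names invented for this illustration.

```python
import json

# Hypothetical, language-neutral service description (a stand-in for a
# Thrift definition file): method names mapped to their parameter names.
DEFINITION = {
    "service": "Calculator",
    "methods": {
        "add": ["a", "b"],
        "echo": ["message"],
    },
}


def make_client(definition, transport):
    """Generate a client object whose methods serialize calls as JSON."""
    def make_method(name, params):
        def method(self, *args):
            if len(args) != len(params):
                raise TypeError(f"{name} expects {len(params)} arguments")
            request = json.dumps({"method": name, "args": dict(zip(params, args))})
            # The transport carries text only, so either side could be
            # written in any language that can read and write JSON.
            return json.loads(transport(request))["result"]
        return method

    attrs = {name: make_method(name, params)
             for name, params in definition["methods"].items()}
    return type(definition["service"] + "Client", (), attrs)()


def make_server(definition, handler):
    """Generate a dispatcher that decodes a request and calls the handler."""
    def serve(request_text):
        request = json.loads(request_text)
        name = request["method"]
        if name not in definition["methods"]:
            raise ValueError(f"unknown method {name}")
        result = getattr(handler, name)(**request["args"])
        return json.dumps({"result": result})
    return serve


class CalculatorHandler:
    """The only hand-written part: the actual service logic."""

    def add(self, a, b):
        return a + b

    def echo(self, message):
        return message


# Wire the generated client straight to the generated server in-process.
server = make_server(DEFINITION, CalculatorHandler())
client = make_client(DEFINITION, transport=server)
print(client.add(2, 3))  # prints 5
```

In a real setup the transport would be a network connection and the two ends would typically be generated for different languages; the in-process call here only keeps the sketch self-contained.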

While Thrift itself is a big enough topic for a post of its own, for now we will concentrate on the bigger picture. ThruDB is a bunch of interoperable services built on top of Thrift that use highly scalable storage mechanisms such as S3 to actually store the data. You can find more about this in (again) Ilya Grigorik’s excellent analysis here. So what are these so-called ‘interoperable services’?

  • Thrudoc – A document storage service
  • Thrucene – A document indexing service
  • Throxy – A service partitioner
  • Thruqueue – A persistent message queue

If you’re bewildered, don’t be. At their heart, they’re really simple but amazing tools. Thrudoc is a simple key-value storage system, designed to work on different kinds of storage systems, including regular disks and Amazon’s S3. Thrucene simply exposes Lucene’s search API as a Thrift service. Throxy helps in partitioning and scaling the other ThruDB services horizontally. The latest service, Thruqueue, is quite similar to Amazon’s own SQS and adds a persistent queue capability to this bunch of services. The one difference between Thruqueue and SQS that I can see is that while Amazon limits each message body to 256 KB, Thruqueue has no such hard limit.
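As a rough picture of the Thrudoc and Throxy ideas, here is a minimal Python sketch of a key-value document store with swappable storage backends and a hash-based partitioner in front of them. None of this is ThruDB code: `MemoryBackend`, `DiskBackend`, `PartitionedBackend` and `DocumentStore` are hypothetical names, and an S3 backend is left out so the example runs without credentials.

```python
import hashlib
import os
import tempfile
import uuid


class MemoryBackend:
    """Keeps documents in a dict; stands in for any fast local store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class DiskBackend:
    """Stores each document as one file in a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.directory, key)

    def put(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as handle:
            handle.write(value)

    def get(self, key):
        try:
            with open(self._path(key), encoding="utf-8") as handle:
                return handle.read()
        except FileNotFoundError:
            raise KeyError(key) from None


class PartitionedBackend:
    """Routes each key to one of several backends using a stable hash."""

    def __init__(self, backends):
        self.backends = list(backends)

    def _pick(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.backends[int(digest, 16) % len(self.backends)]

    def put(self, key, value):
        self._pick(key).put(key, value)

    def get(self, key):
        return self._pick(key).get(key)


class DocumentStore:
    """Key-value document service: hands out a key for every stored document."""

    def __init__(self, backend):
        self.backend = backend

    def store(self, document):
        key = uuid.uuid4().hex  # hex keys are also safe to use as file names
        self.backend.put(key, document)
        return key

    def fetch(self, key):
        return self.backend.get(key)


with tempfile.TemporaryDirectory() as scratch:
    store = DocumentStore(PartitionedBackend([MemoryBackend(), DiskBackend(scratch)]))
    key = store.store('{"title": "Neat ideas"}')
    print(store.fetch(key))
```

Because every backend exposes the same `put`/`get` pair, the partitioner is itself just another backend, which is one simple way to scale a store horizontally without the caller noticing.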

We will be playing with these tools very soon, comparing them to Amazon’s offerings and finding out the advantages each service offers. Stay tuned.


SimpleDB – The next arrow in Amazon’s quiver

Sunday, December 16th, 2007

I’m pretty impressed with Amazon these days, specifically with the web services they are offering. We were lucky to get the opportunity to work with Amazon Web Services for one of our clients, and I have always thought Amazon S3 and EC2 are amazing tools.

Now, they have followed that up with SimpleDB, which seems even more innovative and game-changing. In creating this service, Amazon seems to be trying to turn the traditional database paradigm on its head. Apart from being a web service, accessible through an API, there are many things quite special about SimpleDB: ‘Domains’, which are conceptually similar to database tables, can contain ‘Items’, which are similar to rows, and each of those items contains several ‘Attributes’, which are similar to columns.

The attribute values are stored in ‘Cells’, similar to database fields. However, unlike fields in a traditional database, these cells can hold multiple values. Each item can have as many or as few attributes as it needs. That is, each row/item need not contain a fixed number of columns/attributes. To each item its own. Apparently, they are indexed automatically. Even better, there is no such thing as a database schema. You can put whatever information you want in each cell.
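The data model described above can be pictured with a few lines of Python. This is only an in-memory sketch of the concepts, not the SimpleDB API: the `Domain` class and its method names are hypothetical, and values are plain strings here for simplicity.

```python
from collections import defaultdict


class Domain:
    """A schema-less container of items, loosely comparable to a table."""

    def __init__(self, name):
        self.name = name
        # item name -> attribute name -> set of values (a multi-valued "cell")
        self._items = defaultdict(lambda: defaultdict(set))

    def put_attributes(self, item_name, attributes):
        """Add (attribute, value) pairs; repeating an attribute adds a value."""
        for attribute, value in attributes:
            self._items[item_name][attribute].add(value)

    def get_attributes(self, item_name):
        """Return the item's attributes, each with a sorted list of values."""
        item = self._items.get(item_name, {})
        return {attribute: sorted(values) for attribute, values in item.items()}

    def query(self, attribute, value):
        """Return the names of items where the attribute holds the value."""
        return sorted(
            name for name, attrs in self._items.items()
            if value in attrs.get(attribute, ())
        )


books = Domain("books")
# Two items in the same domain with different sets of attributes,
# and one attribute ("author") holding two values at once.
books.put_attributes("item1", [("title", "First"), ("author", "Ann"), ("author", "Bob")])
books.put_attributes("item2", [("title", "Second"), ("year", "2007")])

print(books.get_attributes("item1"))
print(books.query("author", "Bob"))
```

Nothing here declares which attributes exist up front, which is the point: `item2` simply has a `year` and no `author`, and the lookup in `query` still works across both.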

Amazon has apparently created a simple query language to create and retrieve data. My first reaction was that an SQL-like language would have been better, but the SimpleDB query language seems to be much simpler than SQL, so why complicate our lives? After going through the developer docs, I am a little disappointed that there isn’t a Ruby interface to SimpleDB yet. I’m pretty sure it won’t be long before someone comes up with one.

My only doubt is how much lag there will be in sending data to and retrieving it from this web service. At the end of the day, a database that we run on the same machine as the web server, or on a nearby server, is going to be pretty fast to access. Can Amazon SimpleDB match that? Of course, we will only know once we try it out. We should know very soon.

What is surprising is that neither Google nor Yahoo is competing in this space. Probably only developers and startups care about this right now, but with some polish AWS could become enterprise-ready, and Amazon might very soon end up as the IT infrastructure supplier for businesses. That is a bit surprising because Amazon, at the end of the day, is an online retailer. So how does this fit into their business strategy? Do they want to sell IT infrastructure along with books, toys and electronics? That is a bit strange, because AWS seems to be the only enterprise-oriented thing that Amazon sells.

Finally, I don’t think Facebook is the new “that” company to work for, the one with the coolest problems for developers to solve and a great future (in many ways I think that is still Google). It just might be, surprisingly, good ole Amazon. Right on. It will be well deserved.
