Just a few days ago I wrote an article about the Amazon Web Services stack, in which I praised Amazon's vision and ability to deliver an elegant, generic web services platform of the future. At the end of the article I mentioned that it would be difficult for Google and Microsoft to catch up. I could still be right, but tonight Google made it clear that they are going to be in this race.
The Google Base API is like Amazon S3 on steroids. In addition to pure storage capability, this API comes with the concept of RSS-based structured data types, the ability to automatically index and search the data, and the ability to store and publish things via RSS. It is an interesting, unexpected move, since the service seems to mash storage and publishing together.
Apples become Oranges?
So how do we go about comparing these services? There are several angles and criteria that might lead us to different conclusions. As a software engineer, I am subconsciously drawn to Amazon's simple and canonical approach. Each service has a very basic, minimalistic API and is focused on accomplishing a very specific task. For example, Amazon S3 just stores the data and lets you fetch it, but is not concerned with things like RSS. When the entire stack of services is aggregated together, you get a powerful playground where you can pick and choose what you need to address your specific needs.
On the other hand, at this point everyone acknowledges that RSS has become a basic building block of the web. So you cannot help but wonder if it makes sense to have it wired right into your data store. While I am not quite ready to make this leap myself, I can see how a lot of people would. My rule of thumb is that technologies, unfortunately, come and go, so I would not bet everything on RSS as it is right now. But time, of course, will tell.
Hello and welcome to the world of Google semantics
The basic mechanics of posting and managing objects are similar to Amazon S3. You can read my detailed article about this service to learn about the rudimentary operations of storing and retrieving items.
Let's zoom in now on some of the exciting new things that come with Google Base. The first feature of note is the introduction of attributes and types. This is very welcome, because today's web is not a random collection of words and letters. We talk about friends, books, music, politics, housing; in short, we discuss life, where things naturally have meaning and semantics. Google introduces an attribute/type system with a set of pre-defined attributes and types, which developers can augment. This is an excellent move, since it encourages a common-sense standard while leaving room for flexibility and exceptions.
The system leverages the standard RSS attributes such as title and item but, because of its XML-based nature, does not play with microformats. This is not necessarily bad, since an XML-based annotation system is at least as powerful as the microformats languages. In fact, from my point of view, even this system has a few loose ends. For example, a review attribute may contain text to indicate that it is a review of a movie, a book, or a restaurant. This is not going to be sufficient for situations where the actual underlying object needs to be identified exactly. However, since the defined attribute/type system is extensible, these sorts of things can be corrected in the future.
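To make the idea of a pre-defined but extensible attribute/type system concrete, here is a minimal sketch in Python. It is not Google's implementation: the attribute names in `PREDEFINED`, the `validate` helper, and the `reviewed_item_id` extension are all hypothetical, chosen only to show how a developer-added attribute could close the "which object is being reviewed?" gap.

```python
# Hypothetical pre-defined attributes and their expected types.
PREDEFINED = {"title": str, "price": float, "review": str}


def validate(item, custom=None):
    """Check each attribute of `item` against pre-defined plus custom types.

    Returns a list of human-readable problems; an empty list means valid.
    """
    schema = {**PREDEFINED, **(custom or {})}
    errors = []
    for name, value in item.items():
        expected = schema.get(name)
        if expected is None:
            errors.append(f"unknown attribute: {name}")
        elif not isinstance(value, expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


# A review that also names the exact object it is about (made-up identifier).
review = {"title": "Loved it", "review": "Great book", "reviewed_item_id": "book-001"}

# Without the extension the extra attribute is rejected...
print(validate(review))
# ...and with a developer-supplied extension it is accepted.
print(validate(review, custom={"reviewed_item_id": str}))
```

The design point is simply that the shared vocabulary stays small while exceptions are declared explicitly rather than smuggled into free text.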
Search is still the king
Google is the undisputed master of the search domain. All Google services leverage the success of this Google granddaddy, and the new Google Base API is no exception. This is one of the features that puts S3 behind at this point. The ability to slice and dice the stored information every which way is absolutely essential. What Google does for you automatically is create a gigantic set of indices for everything you publish, so that anything can be found very, very quickly.
The query language is powerful. It even allows comparison queries for types that are declared as numbers; here is an example of a query:
[item type:products] (ipod | "mp3 player") [price <= 150.0 USD]
Personally, I would have liked this to be more RESTful, but I guess this is shorter and more powerful. For those of you who miss your programming languages class, Google provides the BNF of the grammar.
The query results can be paginated much like in S3. The difference is that, unlike S3, this paging works on indices instead of prefixes. These differences are due to the specifics of Google's and Amazon's implementations and do not make much difference to the end user.
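As an illustration of the query syntax shown above, here is a small Python sketch that assembles the same example query from its parts and percent-encodes it for use in a URL. The `build_query` helper and its parameter names are my own invention for this post, not part of the Google Base API, and the sketch does not cover the full grammar.

```python
from urllib.parse import quote


def build_query(item_type, keywords, max_price, currency="USD"):
    """Compose a query string in the bracketed style of the example above.

    The helper and its parameters are illustrative, not an official API.
    """
    # Wrap multi-word keywords in double quotes so they read as phrases.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    keyword_part = "(" + " | ".join(terms) + ")"
    return f"[item type:{item_type}] {keyword_part} [price <= {max_price} {currency}]"


query = build_query("products", ["ipod", "mp3 player"], 150.0)
print(query)

# Percent-encode the query so it can travel safely as a URL parameter value.
print(quote(query))
```

The brackets, quotes, and spaces all need escaping once the query goes into a URL, which is part of why the syntax feels less RESTful than plain path segments would.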
Like search, batch processing is noticeably absent from the S3 repertoire. The ability to execute multiple fetches at once is invaluable, since it enables, for example, generating a web page based on certain criteria. Specifically, to get the list of latest items posted by a user with S3, we need to first query the keys and then fetch the item for each key in a separate request. This is unacceptably slow, especially when it comes to generating a web page on demand. So Google definitely did the right thing by having the batch mode built right in.
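The cost of the "list the keys, then fetch each one" pattern is easy to see with a toy model. The sketch below uses an in-memory `FakeStore` that only counts simulated round trips; the class and its `get_many` batch call are hypothetical stand-ins and do not correspond to the real S3 or Google Base APIs.

```python
class FakeStore:
    """In-memory stand-in for a remote store; counts simulated round trips."""

    def __init__(self, items):
        self.items = dict(items)
        self.requests = 0

    def list_keys(self, prefix):
        self.requests += 1
        return sorted(k for k in self.items if k.startswith(prefix))

    def get(self, key):
        self.requests += 1
        return self.items[key]

    def get_many(self, keys):
        # Hypothetical batch call: one round trip for any number of keys.
        self.requests += 1
        return [self.items[k] for k in keys]


def fetch_one_by_one(store, prefix):
    """One listing request plus one request per key."""
    return [store.get(k) for k in store.list_keys(prefix)]


def fetch_batched(store, prefix):
    """One listing request plus a single batched fetch."""
    return store.get_many(store.list_keys(prefix))


data = {f"alice/{i}": f"post {i}" for i in range(5)}

slow = FakeStore(data)
fetch_one_by_one(slow, "alice/")
fast = FakeStore(data)
fetch_batched(fast, "alice/")
print(slow.requests, fast.requests)  # 6 2
```

With N items the first approach costs N + 1 round trips while the batched one stays at 2, and that gap is what hurts when a page has to be generated on demand.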
Similar to S3, there is a concept of privacy, but it is not quite the same. In S3, there is a simple way of marking each item as public or private for both read and write. Google's approach seems to be different. First, there is a distinction between an item and a snippet. Here is Google's definition:
Snippets: for the general public and provides a slightly shortened description
Items: a private customer-specific feed for customers to insert, update, delete, and query their own data. This feed requires authentication.
I find this pretty confusing, particularly because of the way privacy is defined. Here is the definition:
You can control whether attributes are visible by specifying the XML attribute access="private".
So it sounds like you cannot make an entire entry private? Also, does this apply to both snippet and item attributes? It is not apparent to me from the provided description.
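Here is one possible reading of that sentence, sketched in Python with the standard library's XML parser. Only the `access="private"` marker comes from Google's description; the entry, its element names, and the `public_view` helper are made up for illustration, and the sketch does not settle the open questions above.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry; the element names are invented for this example.
ENTRY = """
<entry>
  <title>Used bicycle</title>
  <price>120.0 USD</price>
  <seller_phone access="private">555-0100</seller_phone>
</entry>
"""


def public_view(entry_xml):
    """Return tag -> text for child elements not marked access="private"."""
    root = ET.fromstring(entry_xml)
    return {
        child.tag: child.text
        for child in root
        if child.get("access") != "private"
    }


print(public_view(ENTRY))
```

Under this reading, privacy is decided attribute by attribute, so hiding a whole entry would mean marking every one of its attributes private.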
What about performance?
That's a good question that needs to be answered soon. Performance benchmarks for these services would be a very valuable addition to the feature-by-feature comparison, so we hope to see them in the near future.
So with this cat out of the bag, we can make a few predictions. First, we will soon be seeing UI in many Google products, particularly Google Reader, that renders these extended RSS feeds in a nice way. They will probably look something like the bluemarks that we developed at adaptiveblue. The big difference is that we had to embed the display information in the form of a fairly verbose chunk of HTML. Google will enjoy the luxury of styling these feeds using elegant, client-side stylesheets.
Another likely thing is that Google is going to promote this new format and will work on getting other products and services to embrace it. I'd like to hear how this plays with microformats and generic HTML pages, because having more different formats for capturing semantics is not taking us any closer to the semantic web.
Finally, we can bet on seeing more of these sorts of services, probably from Microsoft, maybe from Yahoo!, and definitely from small startups that are going to jump in with innovation and twists. Different approaches and APIs are likely to create a public debate on the topic.
The debate, competition, and creativity are great for us developers. We get to enjoy the fight but, more importantly, to jump in and voice our opinions and concerns. Not only do we get to use these technologies, we also get a chance to influence how they evolve. This is very important, and we should not miss the opportunity. I am sure these companies are willing to listen and are looking for your feedback, so drop them a line.