Sat, Aug 4, 2007

Tim O'Reilly

Scratching itches in the cloud

Sriram Krishnan has a great blog post entitled Open source and scratching itches in the cloud that takes off from my recent post about Hadoop. Sriram points out the difficulties of "scratching your own itch" with server-backed applications, for three reasons (with my comments in parentheses):

  1. The inaccessibility of the back-end code, and the fact that that code is shared by many simultaneous users. (You can submit a bug report or feature request for either proprietary or open source code, and your success rate in getting your problem fixed depends more on the responsiveness of the development organization than on whether the code is open or closed source. What you get with open source is a second level of recourse: you can patch your own local version of the program, and if you feel strongly enough about the importance of your patches, you can fork the entire source tree and see if other people adopt your version. But when the user no longer has the choice of running a separate copy, due to the scale of the application and its data (see points 2 and 3), this recourse is gone, whether or not the code is open source.)

  2. The fact that the value of the application relies on shared and collective data. (This has been my fundamental point about Web 2.0 from the beginning: that it represents an entirely new paradigm for potential lock-in. There's a kind of natural lock-in at work here, which I've elsewhere referred to as the eBay effect. I don't think we can fix that. If the biggest database is the best, it will continue to get bigger and better. But that doesn't mean that the owner of that database shouldn't be opening it up so that users can own their own data in that database. Ownership includes transparency (what data do you have about me?), my ability to correct it (think identity theft and credit scores -- but you need not go that far. For example, if Amazon has incorrect metadata about an O'Reilly book, you'd think that Amazon would trust us, as the publisher, to fix it.), and the ability to easily extract that data for archival purposes, reusability, or mobility elsewhere. (Imagine, for instance, being able to move your "friends list" from LinkedIn to Facebook.) I agree with Dare Obasanjo when he says just getting your own data out doesn't really solve the problem, but I wouldn't go as far as he does to say that vendors owe us the ability to export their entire database! Just because we might like it doesn't mean we deserve to get it.)

  3. The fact that the application relies on a very large hardware platform. As Sriram points out, this issue may be going away with new commodity computing services like S3 and EC2 (what some people have started to refer to as "hardware as a service.") (But quite frankly, if we solve the open data problem, I think we get more "freedom" and benefit for users than solving either the hardware or software lock-in problem.)
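The "ability to easily extract that data" in point 2 can be made concrete. A minimal sketch, assuming a hypothetical export API (no service in 2007 actually offered one, which is precisely the lock-in at issue; in that era the FOAF vocabulary was the usual candidate for a portable format, JSON is used here for brevity):

```python
import json

# Hypothetical in-memory view of a user's data on one service.
linkedin_friends = [
    {"id": "alice@example.com", "name": "Alice"},
    {"id": "bob@example.com", "name": "Bob"},
]

def export_friends(friends):
    """Serialize a friends list to a portable, versioned JSON document."""
    return json.dumps({"format": "friends-list-v1", "friends": friends})

def import_friends(document):
    """Load a previously exported friends list on another service."""
    payload = json.loads(document)
    assert payload["format"] == "friends-list-v1"
    return payload["friends"]

exported = export_friends(linkedin_friends)
# Round trip: what left one service can be reconstructed on another.
assert import_friends(exported) == linkedin_friends
```

The point of the sketch is only that "ownership" here is an interface question: once a stable, documented export format exists, moving a friends list from LinkedIn to Facebook is a data-format problem rather than a lock-in problem.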


tags: open source, web 2.0  | comments: 7


Comments: 7

  Michael R. Bernstein [08.04.07 06:00 PM]

Aggregating data per se is actually relatively cheap, as long as you don't do anything interesting with it. The curation function itself can be well within the scope of a volunteer organization. We can already see various projects and non-profit organizations that are doing this in various areas and then providing that data for others to use (some under reciprocal terms, some permissive). Two examples that have been around for a while are MusicBrainz and FreeDB.

So far as your example of moving your 'friends list' from LinkedIn to Facebook is concerned, assuming you actually want to continue to use both services, you now potentially have a synchronization problem. This could potentially be solved by having some independent 3rd party (possibly a non-profit) act as both a clearinghouse and as the master copy. Spinning off the raw 'friend' data to a 3rd party could be a clever competitive move for whoever is #2 or #3 in the market in order to get the 'biggest database', even if it isn't proprietary, and even when it lowers the barrier to entry for new entrants to the social networking space.

Tangentially, one important public policy issue is the frequent habit of for-profit companies taking public data from the government, locking it up as a proprietary service, and subsequently restricting (or getting the government to restrict) access to the original data. Aside from more discussion of 'Free Data' (or 'Open Data'), we need more discussion of 'Public Data' as a government function, as in many ways this is going to be the foundation of 'government 2.0'.

  Chris Wong [08.06.07 12:48 AM]

A third-party system like OpenID could solve part of the migration problem. The third party could store some of a person's information, say a friends list, and let users migrate or update that list on another web application. However, implementing such a system is not easy considering the amount of data involved. I have 350+ friends on my Facebook account; imagine how much data that would amount to for 100k people. Another concern is the possibility of synchronizing two or more platforms at the same time. Take MSN and Yahoo Messenger as an example: Windows Live Messenger users can now communicate with Yahoo Messenger users, and I believe the same thing should apply to other communication media such as forums, social networking sites, etc. The user should have a choice of which platform to use and be able to communicate with users on other platforms, without sacrificing the preferences of the platform of one's choice.
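The third-party-as-master-copy idea described above can be sketched in a few lines. This is a toy model under stated assumptions, not any real service's API: a hypothetical clearinghouse holds the master copy, each platform pushes timestamped changes, and conflicts resolve last-write-wins.

```python
# Master copy held by a hypothetical third-party clearinghouse:
# friend id -> (timestamp, record).
master = {}

def push(service_view, timestamp):
    """A platform pushes its local changes to the master copy."""
    for fid, record in service_view.items():
        held = master.get(fid)
        if held is None or held[0] < timestamp:  # last write wins
            master[fid] = (timestamp, record)

def pull():
    """A platform refreshes its local view from the master copy."""
    return {fid: record for fid, (_, record) in master.items()}

push({"alice": {"name": "Alice"}}, timestamp=1)        # from platform A
push({"bob": {"name": "Bob"}}, timestamp=2)            # from platform B
push({"alice": {"name": "Alice Smith"}}, timestamp=3)  # A edits Alice
assert pull() == {"alice": {"name": "Alice Smith"},
                  "bob": {"name": "Bob"}}
```

Last-write-wins is the simplest possible conflict rule; the scale concern raised above (hundreds of friends times many thousands of users) is exactly why a real clearinghouse would need incremental sync rather than whole-list pushes.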

  Jean-Francois Noel [08.06.07 06:49 AM]

I think systems like the ones proposed by freebase.com and AttentionTrust.org are attacking the data lock-in problem. AttentionTrust came to my attention via root.net; does anyone know what happened to them? I'm really disappointed by their apparent disappearance. I once had a really interesting exchange with r0ml (he was working for root) about root's system and what they were trying to do. Anyway, if data lock-in interests you, check out AttentionTrust and Freebase; they may hold some of the keys.

  Ajeet Khurana [08.06.07 01:32 PM]

In this entire discussion I am reminded of the fact that most of the "large" organizations that deployed Linux as their platform of choice, primarily because of its openness, actually paid external vendors to install and customize it. We must remember that open source is not a goal in itself and that we are trying to achieve something bigger here. That is where the problem arises. So, I guess I find myself agreeing with Michael when he says that "Aggregating data per se is actually relatively cheap, as long as you don't do anything interesting with it." Though I must confess that I am unable to decide whether that is a profound statement or an amusing one :)

  Michael R. Bernstein [08.06.07 05:27 PM]

Well, I was mostly aiming for 'insightful', rather than profound. My point was merely that a 3rd party aggregator of data that only does curation and leaves doing anything relatively CPU-intensive (such as calculating transitive relationship paths) to others potentially solves a few problems:

1. It gives more control back to the users, individually and in aggregate.

2. It lowers the cost of entry for new competitors who can focus on the 'something interesting'.

3. It can nullify the accelerating advantage of the 'largest database' by supplanting it.

Of course, trusted 3rd party systems have their own failure modes, especially if there is only one. There are well-known solutions to this too: you can have multiple T3Ps who compete but have data-peering arrangements, or the folks doing 'something interesting' can easily federate data from multiple T3Ps.

There are also potentially interesting solutions that use ideas from digital-cash and translucent databases that can enable interoperability in the face of imperfect trust between all parties. A lot of the interesting patents in this area are expiring.
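The multi-T3P federation described above can be sketched as follows. All names are hypothetical; the point is only that a service doing the 'something interesting' can take the union of curated records from several competing aggregators, so no single one has to be the 'biggest database'.

```python
# Curated relationship records from two hypothetical competing T3Ps.
t3p_one = [{"id": "alice", "knows": "bob"}]
t3p_two = [{"id": "alice", "knows": "bob"},
           {"id": "carol", "knows": "alice"}]

def federate(*sources):
    """Union relationship records from multiple T3Ps,
    deduplicated by a stable (id, knows) key."""
    seen, merged = set(), []
    for source in sources:
        for record in source:
            key = (record["id"], record["knows"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged

combined = federate(t3p_one, t3p_two)
assert len(combined) == 2  # overlapping record counted only once
```

The CPU-intensive work (e.g. computing transitive relationship paths over `combined`) stays with the downstream service, which is what keeps the aggregators themselves cheap to run.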
