Jul 14

Tim O'Reilly

Google Gears and Version Control

I posted yesterday about the importance of version control. There was another bit I wanted to work into that post, but ultimately decided against. In our backchannel discussion, Nat had pointed to a fascinating post last month by Dare Obasanjo about sync being left out of Google Gears:

I don't consider myself some sort of expert on data synchronization protocols but it seems to me that there is a lot more to figuring out a data synchronization strategy than whether it should be done based on user action or automatically in the background without user intervention. It seems that there would be all sorts of decisions around consistency models and single vs. multi-master designs that developers would have to make as well. And that's just for a fairly straightforward application like Google Reader. Can you imagine what it would be like to use Google Gears to replicate the functionality of Outlook in the offline mode of Gmail or to make Google Docs & Spreadsheets behave properly when presented with conflicting versions of a document or spreadsheet because the user updated it from the Web and in offline mode?

It seems that without providing data synchronization out of the box, Google Gears leaves the most difficult and cumbersome aspect of building a disconnected Web app up to application developers.

Karl Fogel replied:

Yes, this has relevance to distributed/decentralized version control systems -- though VC systems tend to both have a narrower range of synchronization-requiring scenarios, and to be more conservative about the promises they make to the user regarding the degree to which synchronization can be automated. But it's still basically the same problem: objects that start out as identical copies, then slowly drift apart in isolation, and then need to be reconciled.

"Data synchronization is hard; let's go shopping!"

Unfortunately, going shopping is not an option. I like Linus' thinking (referred to yesterday) about the importance of distributed version control with branching and merging. Solving this problem seems very important to me, and so does solving it in a way that provides this kind of synchronization as a service, which is exactly what Dare was disappointed to find missing in Google Gears.

This is also relevant to the discussion of open source and Web 2.0. A new open source version control package, whether Subversion or Git, is still aimed at the old model of producing binary software. Can you imagine version control as a service? Now that would be cool.
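Karl's description of the problem (identical copies that drift apart in isolation and then have to be reconciled) maps onto the classic three-way merge: compare each replica against the common ancestor it started from. The Python sketch below illustrates this for flat records, such as a calendar entry edited both on the Web and offline. It assumes each replica remembers that base version; `merge_records` and the sample fields are hypothetical, not anything Google Gears or a particular version control system provides.

```python
_MISSING = object()  # sentinel: the field is absent from this version


def merge_records(base: dict, ours: dict, theirs: dict):
    """Three-way merge of flat records against their common ancestor.

    Returns (merged, conflicts). Each conflict maps a field name to
    (base, ours, theirs), with None standing in for a missing field.
    """
    merged, conflicts = {}, {}
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (including "neither changed it")
            result = o
        elif o == b:      # only their side changed the field
            result = t
        elif t == b:      # only our side changed the field
            result = o
        else:             # both changed it differently: a human must decide
            conflicts[key] = tuple(None if v is _MISSING else v for v in (b, o, t))
            continue
        if result is not _MISSING:  # a one-sided deletion wins
            merged[key] = result
    return merged, conflicts


base = {"subject": "Lunch", "time": "12:00", "place": "Cafe"}
laptop = {"subject": "Lunch", "time": "12:30", "place": "Cafe"}  # edited offline
web = {"subject": "Team lunch", "time": "13:00"}                 # edited on the Web

merged, conflicts = merge_records(base, laptop, web)
print(merged)     # {'subject': 'Team lunch'}
print(conflicts)  # {'time': ('12:00', '12:30', '13:00')}
```

Fields changed on only one side merge automatically; the field changed differently on both sides comes back as a conflict instead of being silently overwritten.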

Comments: 7

  Chris [07.14.07 08:46 AM]

Bring on collaborative music/sound editing: me making a song, and my friends mastering and mixing it for me. This could definitely benefit from version control as a service!

  Greg Wilson [07.14.07 09:34 AM]

One of the problems this faces is that most existing version control systems divide the world into text (which can be diffed and merged) and non-text (which can't), with *ML-style markup as an interesting midpoint (it's text, but traditional line-based diff/merge isn't very useful). Content is clearly moving toward the non-text end of this scale (CAD diagrams, images, audio/video, etc.), so whoever addresses the general diff/merge problem first will be very well positioned.
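Greg's point that line-based diff is not very useful for markup shows up in a small Python experiment. The sketch below (function names and sample XML are hypothetical) compares two serializations of the same element. `difflib` reports every line as changed because attribute order and whitespace differ. A structural comparison of the parsed trees finds them equivalent. It checks tag, attributes, and whitespace-stripped text, a simplification that ignores namespaces and significant whitespace.

```python
import difflib
import xml.etree.ElementTree as ET


def line_changes(a: str, b: str) -> list:
    """Lines that a line-based unified diff reports as removed or added."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


def same_tree(x: ET.Element, y: ET.Element) -> bool:
    """Structural equality: tag, attributes (order-insensitive), and text
    with surrounding whitespace stripped; children are compared in order."""
    def clean(s):
        return (s or "").strip()
    return (x.tag == y.tag
            and x.attrib == y.attrib
            and clean(x.text) == clean(y.text)
            and clean(x.tail) == clean(y.tail)
            and len(x) == len(y)
            and all(same_tree(cx, cy) for cx, cy in zip(x, y)))


old = '<slide title="Intro" id="1"><p>Hello</p></slide>'
new = '<slide id="1" title="Intro">\n  <p>Hello</p>\n</slide>'

print(len(line_changes(old, new)))                        # 4
print(same_tree(ET.fromstring(old), ET.fromstring(new)))  # True
```

A markup-aware merge tool would need to work on the tree rather than on lines.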

  N.Cauldwell [07.14.07 09:38 AM]

Chris: collaborative music making is already out there. I don't think the existing services have quite the level of version control that Tim is asking for, but it's a start.

  Thomas Lord [07.14.07 02:03 PM]

Greg wrote: so whoever addresses the general diff/merge problem first will be very well positioned.

You might think so, but...


  Karl Fogel [07.14.07 08:42 PM]

The solutions can only be automated up to a point — eventually humans have to get involved. Data divergence is kind of like speciation: after a certain point, the lines just don't mix anymore, despite having a common ancestor. Someone has to step in and resolve the divergence by actually knowing what the data means.

So I think the trick is not to look for complete solutions, but to concentrate on tools that make the human's job easier. In line-formatted text files, traditional (diff3-style) conflict markers are a good example. Greg's right: we need the equivalent of that for many other formats (personally, I could use one for OO Impress / MS PowerPoint presentations right now).
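The diff3-style conflict markers Karl mentions can be sketched in Python under simplifying assumptions. The code aligns the base file with each descendant using `difflib.SequenceMatcher`. It treats base lines that survive unchanged in both descendants as stable anchors and merges each chunk between anchors. It writes `<<<<<<<`, `|||||||`, `=======`, `>>>>>>>` markers (the layout Git uses for its `diff3` conflict style) only when both sides changed the same chunk differently. This is a simplified illustration in the spirit of diff3, not GNU diff3's actual algorithm, and `merge3` is a hypothetical name.

```python
from difflib import SequenceMatcher


def _align(base, other):
    """Map base line index -> index of the identical line in `other`, using
    difflib's matching blocks (which increase monotonically in both)."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        for n in range(size):
            mapping[a + n] = b + n
    return mapping


def merge3(base, ours, theirs, labels=("ours", "base", "theirs")):
    """Simplified three-way merge of lists of lines.

    Returns (merged_lines, number_of_conflicts)."""
    to_ours, to_theirs = _align(base, ours), _align(base, theirs)
    # Stable anchors: base lines that survive unchanged in both descendants.
    anchors = [i for i in range(len(base)) if i in to_ours and i in to_theirs]
    out, conflicts = [], 0
    i = j = k = 0  # start of the current chunk in base, ours, theirs
    for anchor in anchors + [None]:  # None marks the final chunk to end of file
        if anchor is None:
            end_b, end_o, end_t = len(base), len(ours), len(theirs)
        else:
            end_b, end_o, end_t = anchor, to_ours[anchor], to_theirs[anchor]
        b_chunk, o_chunk, t_chunk = base[i:end_b], ours[j:end_o], theirs[k:end_t]
        if o_chunk == t_chunk or t_chunk == b_chunk:
            out += o_chunk  # identical edits, or only our side changed
        elif o_chunk == b_chunk:
            out += t_chunk  # only their side changed
        else:  # both sides changed the chunk differently: a human must decide
            conflicts += 1
            out += [f"<<<<<<< {labels[0]}", *o_chunk,
                    f"||||||| {labels[1]}", *b_chunk,
                    "=======", *t_chunk,
                    f">>>>>>> {labels[2]}"]
        if anchor is not None:
            out.append(base[anchor])  # the anchor line is common to all three
            i, j, k = anchor + 1, to_ours[anchor] + 1, to_theirs[anchor] + 1
    return out, conflicts


if __name__ == "__main__":
    merged, n = merge3(["x = 1", "y = 2"], ["x = 10", "y = 2"], ["x = 100", "y = 2"])
    print("\n".join(merged))
```

Running it prints a single conflict hunk for the `x` line (ours, base, and theirs, each labeled), followed by the cleanly merged `y = 2` line. Non-overlapping edits on the two sides merge without markers.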

Continuing in the "this is a blog comment so irresponsible hand-waving is the name of the game" vein:

There's a bright future, IMHO, for "post-facto version control". That is, tools that do with digital formats what molecular biologists do with genetic sequences (which are also a digital format, in a way): analyze sets of related sequences and figure out their ancestor-descendant-sibling relationships, by looking at the frequencies, lengths, and positions of shared bit runs.
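As a rough illustration of "post-facto version control", the Python sketch below works only from the contents of several versions of a file. It breaks each version into overlapping k-byte runs (shingles) and scores every pair by the Jaccard similarity of their shingle sets. It then links the versions into a maximum-similarity spanning tree with Prim's algorithm. This is a crude stand-in for the sequence-analysis methods Karl alludes to, not one of them; the function names, `k=4`, and the sample strings are assumptions.

```python
from itertools import combinations


def shingles(data: bytes, k: int = 4) -> set:
    """All length-k byte runs in the data (a proxy for 'shared bit runs')."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared runs divided by all distinct runs in either version."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def relationship_tree(docs: dict, k: int = 4) -> list:
    """Link versions into a maximum-similarity spanning tree (Prim's algorithm).

    Returns (version_already_in_tree, newly_linked_version, similarity) edges."""
    names = list(docs)
    sets = {name: shingles(docs[name], k) for name in names}
    sim = {frozenset(pair): jaccard(sets[pair[0]], sets[pair[1]])
           for pair in combinations(names, 2)}
    in_tree, edges = {names[0]}, []
    while len(in_tree) < len(names):
        # Greedily take the most similar pair crossing from the tree to the rest.
        u, v = max(((u, v) for u in in_tree for v in names if v not in in_tree),
                   key=lambda edge: sim[frozenset(edge)])
        edges.append((u, v, sim[frozenset((u, v))]))
        in_tree.add(v)
    return edges


versions = {
    "v1": b"The quick brown fox jumps over the lazy dog.",
    "v2": b"The quick brown fox jumps over the lazy cat.",
    "v3": b"The quick brown fox leaps over the lazy cat!",
}
for a, b, score in relationship_tree(versions):
    print(f"{a} -- {b}  similarity {score:.2f}")
```

The tree shows which versions are closest relatives, but not which came first. Similarity is symmetric, so ancestor-versus-descendant direction would need another signal, such as timestamps.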

  Michael R. Bernstein [07.15.07 02:09 PM]

The synchronization functionality in Google Docs and Spreadsheets is already available as a library for Python, Java, and JavaScript under the LGPL.

I would not be surprised if this was eventually turned into a Google service of some kind, and/or rolled into some future version of Google Gears.

  Nick Gerner [07.18.07 12:18 PM]

Linus does make some good points about distributed VCS. This is another example of the tipping point software is quickly approaching. We're headed for a highly distributed world of computation (P2P, clusters, grids, many-core) and data access/storage (Werner Vogels with Amazon's S3 and Dynamo).

With Google Gears you have to buy into this. Distributed data management isn't going to be solved by Gears. Perhaps it's time developers and software architects start thinking along these lines. Or better yet, perhaps it's time educators start teaching along these lines.
