The Library of the Commons: Rise of the Infodex

Somewhere between the realm of Personal and Shared media lies the realm of the Universal.

The realm of the universal is the Library of the Commons, a global repository of user-generated and crowd-sourced media and information.

Services that logically nest in the Library include: Amazon, Yelp, YouTube, Craigslist, Wikipedia, Flickr, Twitter tweets, items, Scribd docs, Expedia, Google News, Google Maps, TripAdvisor, iTunes, the App Store and any other services and/or information sources that ‘just work.’

In other words, these are services that have defined the ‘IT’ to the point that we can now pretty much take their utility and availability for granted (typically via API access and/or embed codes with some form of customization wizard).

The Genesis of a Library

So how did we get to this place in the story? What gave birth to the Library of the Commons?

No one formally decreed it so, but of the countless me-too services born of the dotcom and Web 2.0 land rushes, the above-referenced services are the ones that cultivated the biggest audiences, grew the richest ecosystems and inspired the deepest engagement levels.

In Darwinian terms, these are the survivors, whose structures and workflows have been defined and refined by time/experience.

As such, they are generally well thought out, holistic and integrated, but more to the point, have large, engaged user bases.

Thus, the Commons presents a riddle. Almost as if inspired by Hermann Hesse’s ‘The Glass Bead Game’, the riddle is this:

If all of these services yield a smorgasbord of best practices, why not systematically emulate them so as to…FEDERATE them?

Put another way, what if a time came when people ceased perennially re-creating the wheel and instead started to ‘decompose’ these services: to extract their function sets from whatever nesting contained them, and to re-apply them in new contexts supported by a now-federated data flow proxied within the Cloud?

Couldn’t the composite feature set be exposed switchboard-style to enable any number of custom services and client apps?
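As a sketch of what ‘switchboard-style’ exposure might look like, consider the following (all names here are hypothetical; real adapters would wrap each service’s actual API, and this is only an illustration of the federation idea, not an implementation):

```python
from abc import ABC, abstractmethod


class ServiceAdapter(ABC):
    """Wraps one legacy service (Yelp, Craigslist, ...) behind a common interface."""

    @abstractmethod
    def search(self, query: str) -> list:
        """Return a list of result dicts for the query."""


class Switchboard:
    """Routes a single query across every registered adapter and merges results."""

    def __init__(self):
        self._adapters = {}

    def register(self, name: str, adapter: ServiceAdapter):
        self._adapters[name] = adapter

    def search(self, query: str) -> list:
        results = []
        for name, adapter in self._adapters.items():
            for item in adapter.search(query):
                item["source"] = name  # preserve provenance of each result
                results.append(item)
        return results
```

Any custom service or client app would then talk to the switchboard rather than to each service individually.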

To put some meat on the conceptual skeleton, consider the following exercise that I recently did:

A decomposition of Craigslist and TripAdvisor yields deep profiles that are accessorized and interconnected via context traversal flows, such as categorization routines, places, events, airfares, posts, pages, ratings, discussion threads, offers, jobs, businesses, products and personal listings.

Craigslist offers up 36 different sub-types of items For Sale; Services represent another 19 sub-types; Jobs 41 more; Discussions, another 72. And so it goes (including Housing, Personals and Community) across 175+ geo-locales.

TripAdvisor is an instance of this model that overlays a set of time-tested workflows specific to the relatively complex task of planning a vacation.

These workflows make it easy to match a travel plan to specific tastes, requirements and budget – regardless of the information traversal path you pursued before getting pricing on desired travel dates.

Could these same workflows be re-purposed for researching and then purchasing other similarly complex products or services?

I will come back to that thought, in a moment.

The Rise of the Infodex

What is decomposed can be reassembled, and thus begins the Infodex.

The Infodex is a kind of next-generation Rolodex, with aspirations to grow into a real-time marketplace.

What exactly is the Infodex? It comprises three parts.

Part one is a listing tool for linking to content, creating a metadata wrapper around media items and encapsulating the above-referenced services (i.e., Yelp, YouTube, Wikipedia) into listing containers that define and expose the methods through which one can interface with the media item (framework-integrity stuff).
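A minimal sketch of such a listing container, assuming a hypothetical schema (none of these field names come from an actual spec; they just make the wrapper idea concrete):

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """A metadata wrapper around one media item from a federated service."""

    url: str
    service: str          # e.g. "youtube", "yelp", "wikipedia"
    media_type: str       # e.g. "video", "review", "article"
    title: str = ""
    tags: list = field(default_factory=list)
    methods: list = field(default_factory=list)  # interfaces the container exposes

    def expose(self) -> dict:
        """Publish the methods a client may invoke against this media item."""
        return {"url": self.url, "service": self.service, "methods": self.methods}
```

The point is that every item, whatever its origin, presents the same container shape to the rest of the system.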

Part two is an indexing engine so that, once simple rules are defined, your media libraries and the information in the listings themselves become ‘self-organizing.’

Named picture types (globes, animals, historic or famous images), for example, could be a federation of multiple picture services (Flickr, Photobucket, Getty Images) and ‘discovered’ pictures from past queries.
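One way to sketch those ‘simple rules’ (the rule names and tags below are illustrative only; a real engine would federate live queries against Flickr, Photobucket, etc.):

```python
# Each rule maps a predicate over a listing's tags to a named picture type.
RULES = {
    "globes":  lambda tags: "globe" in tags or "earth" in tags,
    "animals": lambda tags: any(t in tags for t in ("cat", "dog", "bird")),
}


def auto_index(listings):
    """Group listings from any source (Flickr, Photobucket, past queries)
    into named picture types according to the rules above."""
    index = {name: [] for name in RULES}
    for listing in listings:
        for name, rule in RULES.items():
            if rule(listing.get("tags", [])):
                index[name].append(listing)
    return index
```

New rules added once would reorganize every federated library the same way.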

Looked at from this perspective, the goal, in part, is to establish a cloud-based, crowd-sourced Dewey Decimal System built around the outcome of facilitating better searching, compositing, cross-indexing, sharing, archiving, and analytics functions for specific media and information ‘types.’

Part three of the Infodex is a unified runtime player that is congruent with the information flows of the mobile broadband age; namely, iPhone, Twitter, Facebook and Web (JavaScript/Flash embeds/Adobe AIR) based viewing/playback environments.

One simple example of a basic type of function that might be propagated across all of these environments is the Three Item Topical List (e.g., Top Three Favorites or Three Most Related Items). Define once, propagate everywhere.
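A sketch of ‘define once, propagate everywhere’ for that list type (the two renderers are hypothetical stand-ins for real web-embed and Twitter targets):

```python
def three_item_list(title, items):
    """Canonical definition: a titled list capped at three items."""
    return {"title": title, "items": items[:3]}


def render_html(lst):
    """Render the same definition for a web embed."""
    lis = "".join(f"<li>{i}</li>" for i in lst["items"])
    return f"<h3>{lst['title']}</h3><ol>{lis}</ol>"


def render_tweet(lst):
    """Render the same definition as a one-line tweet."""
    return f"{lst['title']}: " + ", ".join(lst["items"])


# One definition, rendered per environment (web embed, Twitter, ...).
favorites = three_item_list("Top Three Favorites",
                            ["Yelp", "Flickr", "Scribd", "extra"])
```

Each playback environment gets its own renderer, but the list itself is authored exactly once.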

A core assumption of the model is that both the media player and the service integration layers are open-sourced. This ensures that the user experience is uniformly good across all of these services, and pushes proprietary-ness higher up the stack, thus raising the floor for all comers.

A final thought. Google became Google by indexing the web. Couldn’t the next generation extend this approach by being federated, crowd-sourced and context-specific (i.e., media, information and service aware)?

Are there obvious best practices for The Commons? Obvious gotchas? What about the Infodex?

Related Posts:

  1. Pattern Recognition: Makers, Marketplaces and the Library of the Commons
  2. Envisioning the Social Map-lication
  3. The Mobile Broadband Era: It’s About Messages, Mobility and The Cloud

  • jwhiteinfo

    Mark, truly excellent article and fresh thinking – just recommended this via Twitter.

    In terms of ‘gotchas’, the obvious ones would probably be in the areas of rights management and probably monetizing the thing, but there are intriguing possibilities here as well. Keep up the good work! Looking forward to hearing more of your ideas and thoughts.

  • Mark,

    Just as you decompose those other services and then reassemble the parts (conceptually, at this stage) to build the infodex, we can decompose the infodex and thereby learn how to build it.

    The infodex is in some sense a collection of collaboratively maintained, semi-structured documents. The “atomic unit” is something like a fragment of a document – a single piece of content and/or meta-data either authored by a contributor or extracted from one of the legacy services. Those atomic units are then assembled multiple ways into complete documents. Contributions from other services and from human authors are combined into complete documents in multiple ways depending on the context (e.g., who is looking at it and what information are they interested in and allowed to see).
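    A toy sketch of that assembly step (the function and the audience field are hypothetical illustrations, not anything from Wave or the infodex proper):

```python
def assemble(fragments, viewer):
    """Combine document fragments into one view for a given viewer,
    keeping only fragments that viewer is allowed to see
    (a toy stand-in for real access control)."""
    # A fragment with no "audience" key is visible to everyone.
    visible = [f for f in fragments if viewer in f.get("audience", {viewer})]
    visible.sort(key=lambda f: f["position"])
    return "".join(f["content"] for f in visible)
```

    Different viewers thus reconstruct different complete documents from the same pool of atomic units.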

    You’ll want a federated communications backbone that propagates document fragments on a kind of P2P network using a semantics of version-controlled, distributed, decentralized collaborative editing.

    Of course, some nodes (in the middle and at the terminals) in that network will offer general purpose computational capabilities and so “document” in this case includes not only static content but applications – dynamic content.

    One example of an active node is an index builder that affords efficient search and discovery within a corpus of documents.

    It turns out that such a system can be built using known techniques. Roughly speaking, you will need a few Australians, a couple of years, and a pretty good chunk of change.

    In other words: Google Wave.

    Now, to be clear, Google Wave is not the infodex. However, the infodex is an application that can (and probably should) be developed atop Wave. Wave gives you a document ontology and edit/update protocol suitable for infodex. It gives you the federated, P2P protocol for spreading the infodex. In that context, the infodex comes out as a particular set of user interfaces, particular definitions of meta-data extensions, and particular ways of indexing a corpus of infodex documents.


  • bowerbird

    um, is this supposed to be a blueprint?
    or is it just supposed to stir up thought?
    or what?

    because it all sounds very grand, but
    it’s tremendously mushy, and therefore
    not very good at serving as a model…

    a dozen different people could take this
    a dozen different ways, depending upon
    what their specific interpretations were.

    but only if they had a specific interpretation…

    without one, they’d just sit there and squirm,
    and go nowhere, as far as i can tell…

    i’m also wondering how you can imagine
    a “library of the commons”, defined as
    “a global repository of user-generated and
    crowd-sourced media and information”,
    and _not_ include project gutenberg in it.

    but maybe that’s just my specific interpretation.


  • Bowerbird,

    I think “mushy” is a correct description but it’s not without value. By analogy, you could sketch a simple transistor circuit and describe a wide class of amplifiers in a mushy but useful way. It’s a closer analogy than you might think, at first, since it’s handy to model a lot of “web stuff” as “signals and systems” circuits, albeit for essentially discrete signals. Yes, 10 people can go 20 different directions with just the abstract schematic but that 10 people might *want* to go 20 different directions with it makes it a pretty interesting abstract.

    C.f. “gradual stiffening”


  • @jwhiteinfo, The rights management and monetization are certainly potential issues, and as boxee has discovered (with hulu), just because a vendor allows their content to be streamed or embedded doesn’t necessarily mean they are okay with a third party embedding the content into proprietary apps. That said, a service like infodex neither disrupts a macro industry (e.g., boxee’s threat to hulu’s partners, the big media cos, is that it kills cable, which is how those media cos make their money) nor hides the source of content, etc. In other words, these are services that do all they can to maximize distribution/reach of the underlying content, making them similar to a good search return page.

    @Thomas Lord, check out my Radar post ‘Messages, Mobility and The Cloud’ as I wholeheartedly agree that messaging is the core of such a model, and that Wave is a logical candidate for it. Also, I agree with the premise that the Infodex model itself lends itself to derivatives, so why not create open APIs and facilitate a bigger ecosystem around it, as Twitter has. Think how much Wikipedia might have evolved if they had created a Twitter-like ecosystem around the content. Here is a link to the Messages post:

    @Bowerbird, I have presented the underpinnings of a model that federates the biggest libraries of content on the web, anticipates a workflow and decomposition model for embracing the best practices of the services hosting that content, and straw-manned a set of services that would logically be built around same. Perhaps to you these are all-or-none types of things (i.e., upload the PRD/MRD) but to me, the whole point of the exercise is to build up, tear down, iterate and build back up. As to the reference to Project Gutenberg, I have a much longer list of “commons” sources but figured 2-3 examples illustrate the point.

  • bowerbird

    it’s still mush to me. but if someone else digs it, that’s fine. :+)


  • Mark,

    I had a look at the earlier post and I agree that Wave is (likely to be) very important. If not this Wave then the next similar one behind it – but why not this one.

    A criticism or just complementary explication…

    Wave is not just about real-time communication and messages. That’s half of it. It is kind of a “dualistic” thing – a yin yang of two aspects of a single thing – between messaging and collaborative documents.

    What is a document as contrasted with a message? In the Wave view, a document is the aggregated result of a bunch of messages, each of which describes an edit operation on the document. Very roughly (but still usefully) speaking, messages take on a role rather like “patches” in the GNU Arch revision control system or Darcs. A “wave” is a revision-controlled document that can be replicated in multiple places. Anywhere there is a copy of the page, messages can be received to update that copy. So you have (multi-media, perhaps active) documents shared on a P2P network, with messaging being the exchange of “deltas” – changes to those documents.
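    A minimal sketch of that patch/delta semantics (a toy model for intuition, not Wave’s actual operational-transform protocol):

```python
class WaveDoc:
    """Toy model of a wave: a document reconstructed by replaying edit messages."""

    def __init__(self):
        self.text = ""
        self.log = []  # ordered messages, each a (position, delete_count, insert) delta

    def apply(self, pos, delete, insert):
        """Apply one edit message and record it in the log."""
        self.text = self.text[:pos] + insert + self.text[pos + delete:]
        self.log.append((pos, delete, insert))

    def replicate(self):
        """Any peer holding the message log can rebuild an identical copy."""
        copy = WaveDoc()
        for pos, delete, insert in self.log:
            copy.apply(pos, delete, insert)
        return copy
```

    The document is nothing but the fold of its messages, which is exactly what makes P2P replication natural.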

    Some of us had been talking about how to build Wave-like things for years but struggling to find support for doing so. Google went and punted that ball down the field by quite a bit.

    I don’t think it is so far-fetched that in a decade or so – Waves will replace web sites as “destinations”, as commercial services, and so forth. It intersects very nicely and appropriately with commodity computing (“cloud”). A publisher offers up a wave to define content and/or a service. It can be replicated throughout a commodity computing network, with copies kept near wherever it is needed. There is still plenty of work to do on UIs, on the richness of the data model, on the implementation of the comms network, on naming of things divorced from DNS — but Wave lays down the basic foundation from a credible source.

    Another way to say it is that it is underappreciated that Wave is a big step towards laying down an overlay network based on P2P principles. It has the promise to be ‘the next big thing’ after, say, HTTP.

    (BTW, if you are serious about wanting to actually build the infodex? Please be in touch.)


  • @Thomas, thanks a bundle for the technical fill-in. I am definitely clear on the collaborative element of Wave (I am in the beta program) but a core thesis of mine is the role of messaging, message payloads and payload-handling mechanisms in the evolution of web apps and services, so that’s kind of where I focused.

    What will be interesting to see with Wave is whether the fact that it solves the right problem is enough to drive its growth, or whether you end up with a fractured ecosystem whereby each developer base focuses on the feature they are interested in and no core messaging service layer ties all of these things together (a la Twitter).

    Here is where simplicity is easier to scale than complexity, but the counter is that if Google can eat their own dog food and integrate their own services with Wave (and expose same via APIs, open source, dev tools), then they won’t fall into the CORBA trap – a 3.0 solution without a compelling 1.0 call to action.

    As to infodex, I am about a week away from having my very simple 1.0 goal, UI, workflow documented. Will reach out and get your thoughts.