Getting closer to the Web 2.0 address book

The answer to a long-running problem lies in data, not an application.

Tim O’Reilly has often asked a good question: seeing as my phone and email
know who I communicate with, my company directory knows who I work with,
and Amazon knows who has written books for my company, why can’t these
various things somehow combine to automatically deal with friend requests
coming from social networks?

This specific problem is just one example of a wider issue. Given that so
much diverse and overlapping information about each of us is spread between
many applications, why are seemingly simple things based on combinations of
information — like automatically reacting to known friend requests — still
not possible?

Tim summarizes this by asking “Where is the web 2.0 address book?”

The traditional reaction to Tim’s need would be to begin work on the design
of a new application to provide the suggested features. This might give
rise to an online social address book incorporating his specific
suggestions, along with anything else the application designers might dream
up to provide additional functionality.

While such an application might be cool and work well in the short term, I
believe that approach is doomed to failure. The main reason is that we
can’t anticipate the future. Baking decisions about what to support or not
into applications is almost certain to leave us wanting
and needing more. At that point we’re beholden to application designers’
willingness, priorities, and ability to further adapt.

What happens when Tim comes up with another idea, or when you or I do? While an application might be designed to be open — for example, by providing an API or a plugin
system — at the end of the day it’s just another application sitting in
front of its collection of data. This problem is particularly severe in our
specific case: Tim would ideally like the incorporation not just of his
phone’s data and his email, but specific information from the O’Reilly org
chart and from Amazon. That level of specificity — or personalization, if
you prefer — is well beyond the interests of anyone writing a general phone book application.

For these reasons, among others, I believe the traditional “let’s build a
cool new app” reaction to Tim’s dilemma is the wrong approach.

The other answer

An alternate answer begins with a sharp departure from this thinking. I
suggest that the Web 2.0 address book need not take the familiar shape of
an application. Instead, like a physical address book, it might be mainly
about the data. That is, centered in a common data store through which a
variety of loosely coupled applications interact with users and with one
another.

In such a world, a friend request to Tim would be added to a shared online
data store. There it would be noticed by an application running on a mobile
phone, a periodic script that scans Tim’s inbox, or an internal O’Reilly
script with access to the company’s staff list. Any application recognizing
the requesting user’s details could add information to the request object
to indicate recognition. The initiating application could detect this and
act on it.
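The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FluidDB's API: the in-memory `SharedStore`, the tag names (`socialnet/friend-request-to`, `tim-phone/known-contact`, `oreilly/on-staff-list`), and the function names are all hypothetical. The point is that the applications never call each other; they only read and write tags on shared objects.

```python
from dataclasses import dataclass, field


@dataclass
class SharedObject:
    """An object in the shared store: an 'about' value plus open-ended tags."""
    about: str
    tags: dict = field(default_factory=dict)


class SharedStore:
    """Hypothetical in-memory stand-in for a shared, writable data store."""

    def __init__(self):
        self._objects = {}

    def get(self, about):
        # Objects are created on first mention; no application owns them.
        return self._objects.setdefault(about, SharedObject(about))

    def tag(self, about, name, value):
        self.get(about).tags[name] = value

    def with_tag(self, name):
        # Return a list so callers can safely add tags while looping.
        return [o for o in self._objects.values() if name in o.tags]


REQUEST_TAG = "socialnet/friend-request-to"  # assumed convention


def social_network_post_request(store, requester_email):
    """The initiating application records a friend request as data."""
    store.tag(requester_email, REQUEST_TAG, "tim")


def phone_app_scan(store, dialed_contacts):
    """A phone application marks requesters it recognizes."""
    for obj in store.with_tag(REQUEST_TAG):
        if obj.about in dialed_contacts:
            store.tag(obj.about, "tim-phone/known-contact", True)


def social_network_resolve(store, recognition_tags):
    """The initiating application acts on any recognition it finds."""
    accepted = []
    for obj in store.with_tag(REQUEST_TAG):
        if any(obj.tags.get(t) for t in recognition_tags):
            store.tag(obj.about, "socialnet/accepted", True)
            accepted.append(obj.about)
    return accepted


store = SharedStore()
social_network_post_request(store, "alice@example.com")
social_network_post_request(store, "stranger@example.com")
phone_app_scan(store, {"alice@example.com"})
accepted = social_network_resolve(
    store, ["tim-phone/known-contact", "oreilly/on-staff-list"]
)
print(accepted)  # ['alice@example.com']
```

Note that the resolver already looks for an `oreilly/on-staff-list` tag even though no staff-list script exists in this sketch. A script written later could start adding that tag, and the resolver would pick it up without either program being changed.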

Several aspects of this idealized scenario are worth close consideration:

  1. Such a system would require underlying storage (a “data fabric,” as Tim has called it) that is shared not just for reading, but also for writing. The shared writable data fabric would be open to future applications. They could contribute or consume information as they saw fit, without requiring the permission of existing applications, and possibly without knowing or caring about each other’s existence. Future applications would interact seamlessly with existing ones simply by following established data conventions.
  2. While any application should be able to join the fun and contribute, a permissions system would be needed to protect existing information and to share it selectively among authorized applications. For example, Tim might authorize an application on his phone to add information about numbers he has dialed or email addresses he has been in contact with. He could authorize another application (in our case, an automatic friend request resolver) to read that information and to create more information on his behalf to indicate existing friendships.
  3. This data-centric answer to the “Where is the web 2.0 address book?” question points to a wide class of future applications that cooperate and interact not through synchronous, pre-ordained API calls, but via asynchronous data protocols that follow open-ended conventions. This is in strong contrast to applications that hold information in databases with relatively inflexible structure, behind APIs that strictly delimit possible interactions. Applications using shared writable data storage can adopt conventions and data protocols by which they cooperate to achieve joint actions. Applications that operate in this way leave the door open for new conventions, for additional unanticipated data, and for future applications that adopt the existing conventions or introduce new ones.
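The permissions idea in point 2 can be sketched as follows. This is a minimal illustration that assumes grants are recorded per application, per action, and per tag; the class names, application names, and tag names (`tim/has-dialed`, `tim/is-friend`) are hypothetical and are not taken from any real system.

```python
class PermissionDenied(Exception):
    """Raised when an application lacks a grant for a tag."""


class TagPermissions:
    """Hypothetical per-application, per-tag grants."""

    def __init__(self):
        self._grants = set()  # entries are (app, action, tag)

    def grant(self, app, action, tag):
        self._grants.add((app, action, tag))

    def revoke(self, app, action, tag):
        self._grants.discard((app, action, tag))

    def check(self, app, action, tag):
        if (app, action, tag) not in self._grants:
            raise PermissionDenied(f"{app} may not {action} {tag}")


class GuardedStore:
    """Shared store that checks the tag-level grant on every access."""

    def __init__(self, permissions):
        self._perms = permissions
        self._objects = {}  # about -> {tag: value}

    def write(self, app, about, tag, value):
        self._perms.check(app, "write", tag)
        self._objects.setdefault(about, {})[tag] = value

    def read(self, app, about, tag):
        self._perms.check(app, "read", tag)
        return self._objects.get(about, {}).get(tag)


perms = TagPermissions()
# Tim lets his phone application record numbers he has dialed...
perms.grant("phone-app", "write", "tim/has-dialed")
# ...and lets the resolver read that tag and record friendships.
perms.grant("friend-resolver", "read", "tim/has-dialed")
perms.grant("friend-resolver", "write", "tim/is-friend")

store = GuardedStore(perms)
store.write("phone-app", "+1-555-0100", "tim/has-dialed", True)
if store.read("friend-resolver", "+1-555-0100", "tim/has-dialed"):
    store.write("friend-resolver", "+1-555-0100", "tim/is-friend", True)

# An application with no grant cannot read Tim's data.
try:
    store.read("unknown-app", "+1-555-0100", "tim/has-dialed")
    denied = False
except PermissionDenied:
    denied = True
```

Because grants attach to tags rather than to whole objects, any application can still add its own tags to the same object, while Tim's tags stay protected.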

At Fluidinfo we’re building a shared online “cloud” data system, called FluidDB, with the characteristics outlined above. FluidDB aims to usher in a new class of applications in which data can be thought of as being social. The data allows diverse applications to interoperate and communicate asynchronously using shared conventions. [Disclosure: Tim O’Reilly is an investor in Fluidinfo.]

In FluidDB, all objects may have information added to them by anyone or any application at any time — with no questions asked. Whereas a more traditional approach locks down entire database objects, FluidDB has a simple and flexible permissions system that operates at the level of the tags that comprise objects. Objects are always writable, including by future applications and by idiosyncratic applications that know, for
example, how to access the O’Reilly staff list.

To put things in a more general light, Tim’s “How ridiculous is this?” complaint (see image, below) is symptomatic of a wider problem. It highlights the awkwardness of a computational world in which applications keep their data behind UIs and APIs that are designed for specific purposes. These prevent augmentation, sharing and search, and they ultimately prove inflexible. It is ridiculous that even simple operations combining information in obvious ways are still beyond our reach.

[Image: “How ridiculous is this?” slide]
Slide from Tim O’Reilly’s “What is Web 2.0?” presentation at Web 2.0 Expo Berlin 2007.

Relief does not lie in the direction of more applications holding more data behind more APIs. It lies instead in allowing related data to coexist in the same place. Both Google and Wikipedia have demonstrated, in very different ways, the massive value that accrues from co-locating related data. There’s a real need for a Wikipedia for data: writable and with an object for everything, like a wiki, but also with permissions, typed data, and a query language to make it more suitable for applications. Such a store can hold the data and the metadata, support search across both, and provide a locus for applications to communicate.

The Web 2.0 address book then becomes a ball of data, not an application. Something we access as we please, using a collection of tools that individually suit us, and without any application having the final word on what’s possible.

  • http://www.twitter.com/rurikbradbury Rurik Bradbury

    But who owns and runs the central datastore?
    Why should they be trusted?
    Who foots the bill, and how?

  • http://www.activewords.com/ Buzz Bruggeman

    You should look at what my friends at http://www.gist.com are doing.

    I think the quality of their product/work/thinking is exceptional, and I would invite you to look at what they are doing. I’d be more than happy to introduce you to T.A. McCann, their CEO.

    Regards,

    Buzz Bruggeman

  • http://picup.com PICUP

    Terry, you should really check out http://picup.com, The Personal Internet Communications Unification Project. It’s a really well thought out stab at the problem you describe in this post. Be sure to check out the whitepaper at https://picup.com/downloads/Whitepaper.pdf

  • matt silver

    It seems to me that any system that will manage this data still needs to have a central repository for all of the data, and then needs to be able to sync its contents to a variety of clients and devices.

    If I had to compose a wish list for a simple* contact management server/client setup it’d be:

    * sync to gmail reliably
    * sync to Outlook and Address Book in Mac OS X reliably
    * sync to Android and iOS reliably (either via gmail above for Android, and maybe “exchange” or CardDAV on iOS)
    * Allow for delegation
    * better de-duplication tools allowing for keying off of any of the different fields in each entry.
    * maybe even keeping track of the date the contact was first added, so that old data can be archived and deleted.

    By simple contact management, I mean that there seems to be a need for something that will handle thousands, if not tens of thousands of contacts, but doesn’t need to have all the features of existing CRM systems like Salesforce. Something that has name, email address(es), mailing addresses, phone number, chat names, maybe a notes field etc. No need for lead generation tools or meeting scheduling within the address book device.

  • http://rdenaux.com Ronald Denaux

    Congratulations to Terry Jones for two very clear and insightful blog posts.

    @Rurik, those are the right questions to ask, and I can see two possibilities for setting up Terry’s vision of shared data. The two possibilities don’t exclude each other:

    1. A central (or federated) service such as FluidDB, which specifies the conventions and services for writing and reading data, as well as the permission system.
    2. A set of technologies that can be used to publish, read and augment data. More specifically: semantic web technologies such as RDF, SPARQL, OWL, etc.

    The advantage of the central service is that it is easier to have a central repository. For example, at the moment it is arguably easier to get started publishing data in FluidDB than it is to publish RDF data. This is changing, though: for example, Drupal 7 automatically publishes an RDF version of your data.

    The advantage of semantic web technologies is that anyone can publish their own data wherever they want on the web. Then, anyone can access your data and extend your data on their own server. However, because there is no central service, you need to find and install your own semantic database and you need semantic web search engines to find who is extending your data and how. I am not aware of any open semantic database service where you can publish your own data.

    As I said, the two approaches are compatible: for example, FluidDB uses URIs to refer to objects, just as RDF does.

    Some other differences between FluidDB and semantic web technologies are:

    • SPARQL, the query language for RDF, is much more advanced than the query language for FluidDB.
    • Also, RDF databases provide ‘reasoning’ services: they are capable of inferring new, implicit data from the existing data. Again, this is not possible yet with FluidDB.

  • http://picup.com Binyamin Bauman

    Terry, great post. PICUP (Personal Internet Communications Unification Project) is a really well thought out stab at addressing the problem you are describing from the perspective of an underlying data infrastructure. Check out the white paper at https://picup.com/downloads/Whitepaper.pdf

  • http://blogs.fluidinfo.com/terry Terry Jones

    @Rurik

    Those are all good & hard questions.

    Owns & runs: I’m not sure. I would have loved to have been building FluidDB in
    a non-commercial way, and I tried for some years to take that path. For now
    Fluidinfo will necessarily be seen as a commercial effort and
    regarded with some suspicion / doubt. That seems appropriate & right. From that
    (cautious) perspective one might regard FluidDB as something that points in the
    “right” direction.

    Why trust: I think, at least in the case of FluidDB that it should become open
    source and (hopefully) federated. The first is relatively easy to do but imposes
    a high overhead on a tiny company. The second is technically very hard. Another
    component is not having a horse in the app race. I.e., being agnostic about data
    unlike (for example) Twitter, who can be regarded as both a platform and an app.
    At some point Twitter as an app begins to eat the ecosystem, as Chris Dixon
    predicted. If you’re providing an information storage architecture, you want to
    provide a level playing field – I think that’s very important. If not, there is
    good reason not to trust the provider.

    Who foots the bill: I think the answer to this is that the end user should (and
    will) foot the bill. That’s an answer that’s still 5(?) years away. I.e., if
    a developer writes a successful app (e.g., flickr), should they necessarily
    foot the bill for all the data storage? Why can’t users pay for their own data
    storage? FluidDB gives a way for this to happen – the user can continue to own
    their own data. That’s great for the user, and can come with responsibilities,
    such as paying for the storage. I believe that’s the future of data (paying
    for your own storage may of course be subsidized): apps don’t have the last
    word on our data (which, after all, they are only storing on our behalf), and
    we retain control over who can access it, etc.

    Thanks for the questions – none of them have easy answers. I can’t even pretend
    to have answers, but it seems very clear to me that shared storage has huge
    advantages over storage of data in silos. If we can at least get that right
    I’ll be happy.

    What do you think?

    Terry

  • http://blogs.fluidinfo.com/terry Terry Jones

    @Rurik

    I meant to add that SimpleGeo are pointing the way on paying
    for your own data storage. They let you provide your AWS
    credentials and then store data into S3 for you – which you
    pay for.

  • http://blogs.fluidinfo.com/terry Terry Jones

    @Buzz

    Thanks for the pointer to Gist. I’d love to be introduced.

    Terry (at fluidinfo com)

  • http://blogs.fluidinfo.com/terry Terry Jones

    @PICUP @benyamin

    Thanks for the pointer – I’ll check it out. picup.com is not
    working for me right now though. I’ll give it a
    try in the morning.

    Terry

  • Cameron

    Terry, thanks for the interesting articles.

    The description of FluidDB made me think of a web-based Tuple Space – would that be an accurate characterization?

    In a previous project some years ago, we found tuples to be a very powerful and surprisingly flexible approach for many tasks – albeit on a local network.

    A web-scale implementation that addressed the single-point of access issue would be very interesting indeed I think. Good luck!

  • http://blogs.fluidinfo.com/terry Terry Jones

    @Cameron

    Hi. I think there are some key differences from what I understand to be a tuple space. The tags (i.e., tuple elements) have values, there
    is a permissions system on tags (i.e., you have within-tuple perms),
    and there’s a query language. Tag values in FluidDB can be anything,
    including images or PDFs or ints, etc.

    I guess a high-level summary is to say that you get the kinds of
    flexibility that things like tuples, RDF, and even XML give you, but
    with a native permissions system within the object, and no object
    owners. I wonder if that makes sense (and BTW I don’t know enough
    about the details of the various tuple space projects to be entirely
    sure I’m being accurate).

    Thanks for commenting – I’d be happy to discuss more, either here or
    in email (terry fluidinfo com).

  • David Semeria

    Hi Terry,

    It’s great to see you making good progress. Your vision of an openly writable world is still slightly beyond most people’s horizons – and that’s a very good thing.

    I’ll close with my usual leitmotif: don’t put too much faith in self-organization. FluidDB will live or die by its ability to promote continuous and elegant normalization.

  • http://www.bootstrappingindependence.com Boots

    I am sure people would use this kind of service, but what about the data privacy? With nearly daily articles in the Journal detailing how this disparate information is being gathered already, the last thing I want is giving some application the ability to scan my phone, email, and social networks to find more connections…

  • http://blogs.fluidinfo.com/fluidDB Terry Jones

    Hi @Boots

    I agree that privacy is a big deal (of course). The way we handle it in FluidDB is that permissions apply to the tags on objects, and can be granted on a per-tag per-application level. So you might install and run (on your phone) an application you trust and grant it WRITE permission in FluidDB for a tag like boots/i-know. You could also grant READ permission to a social-networking application that wants to be able to read that tag on objects in order to be able to finalize friend requests. With fine-grained permissions granting (and revoking) at the level of the tag and application, you can be quite particular about restricting information access, using a “need to know” model. Tools like OAuth and the specific listing of required permissions by Android apps are making that sort of approach more commonplace.

    I don’t mean to claim that FluidDB has the entire solution – I doubt there will ever be one. But having objects that have no owners is a good start because it moves us away from the more traditional all-or-nothing access rights over digital objects.

    Another reason for hope (I think) is that along with improving methods for being particular about permissions granting (and revoking), is an increased awareness of the importance of reputation and trust in the world of applications. Applications that misbehave can be detected and disabled. That may not be much solace in any given situation where things have already gone bad, but as an overall trend I find it encouraging.

    Thanks for the comment / question.

  • http://blogs.fluidinfo.com/fluidDB Terry Jones

    Hi @David

    Thanks :-) We’re now getting to the point where some of our attention (mine, principally) is looking outward more and at engineering less. So I hope we’ll start to drive data and apps, and will also push things in useful directions (re normalization etc.).

    Terry

  • http://networkhippo.com Scott Annan

    Great post, you should still check out http://networkhippo.com – maybe an opportunity for collaboration? I think we’re the middleware for pulling content together without the “lock-in”, but you make a very compelling argument!