Social Graph API: One small step for Google, one giant step for the Internet Operating System

Google’s announcement today of the Social Graph API is a major step in the development of what I’ve called “the Internet Operating System.” In a nutshell, what the Social Graph API does is to lower the barrier to re-use of information that people publish about themselves on the web. It’s the next step towards the vision that Brad Fitzpatrick and David Recordon outlined in Thoughts on the Social Graph.

How many of us feel frustration when we’re asked to recreate the same information for each social network? The Social Graph API allows sites to query public data that Google’s crawl has exposed about individuals, so that it can be re-used rather than recreated. More importantly, it is based on a vision consistent with what Dick Hardt calls user-centric identity. It uses XFN and FOAF, so that you can mark up information that is about you. When other sites contain information that claims to be you, a link from one of your sites that contains an appropriate XFN or FOAF assertion will help the Google crawler to aggregate that information about you.

Google is also providing a simple demo application that can best be thought of as a kind of Social Graph debugger. It’s not live yet as I write, but yesterday Brad Fitzpatrick sent me a copy of a page using me as the root of a Social Graph search.

As you can see here, Brad put in two sites that he was pretty sure were mine, and the API then returns a list of sites that it infers are also mine, either because I’ve linked to them in a way that indicates they are, or because someone else has done so. So, for example, it concludes that is also my site because, a “me” site, links to it.

But other sites shown in the second block below are ones that represent only one-way assertions. They may link to me in a way that suggests that they are mine, but since it’s not reciprocal, they aren’t yet aggregated as “me.” (That’s why I called this demo a kind of social graph debugger. It allows you to see what Google knows about your social graph, and then to correct it.)

To include my alternate repositories of identity in my Google-crawled social graph, all I need to do is link to them from a site that is known to be “me.” So, for example, if I want to claim my Technorati profile, or my Berlin Web Expo Crowdvine profile as me, all I need to do is to link to them with an XFN “rel=me” tag. Meanwhile, I can also see that Google notes that Sarah Milstein has also linked to, but because she isn’t me, I won’t confirm that as a “me” link. And of course, I can see that Google doesn’t seem to know about my public Flickr photo pool, or my public LinkedIn or Facebook profiles, or various private profiles like the ones I have on Dopplr or Goodreads.
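To make the “rel=me” idea concrete, here is a minimal sketch of how a crawler might pick out XFN “me” links from a page. It uses only Python’s standard-library HTML parser; the function name, the sample page, and the example.* URLs are made up for illustration, and this is not a description of how Google’s crawler actually works.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collects the href of every <a> or <link> whose rel attribute includes "me"."""

    def __init__(self):
        super().__init__()
        self.me_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated list of values, e.g. rel="me nofollow"
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if href and "me" in rel_values:
            self.me_links.append(href)


def find_me_links(html):
    """Return the targets of all rel="me" links found in an HTML string."""
    parser = RelMeParser()
    parser.feed(html)
    return parser.me_links


# Hypothetical page: one link claims another profile as "me", one names a friend.
page = """
<a href="https://profiles.example/tim" rel="me">My profile</a>
<a href="https://friend.example/" rel="friend met">A friend</a>
"""
print(find_me_links(page))  # ['https://profiles.example/tim']
```

Only the first link is reported, because only it carries the “me” value; the “friend met” link describes a relationship to someone else.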

The Social Graph API cleverly bootstraps around the problem that not many sites use XFN or FOAF markup by allowing the API to assume that the root supplied to the API query is a “me” link.
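The distinction between confirmed “me” sites and one-way claims can be sketched as a small graph walk. This is only my reading of the behavior described above, not Google’s algorithm: the roots are assumed to be “me,” anything reachable by following outgoing “me” links from them is treated as confirmed, and pages that merely point at a confirmed site are listed as unconfirmed claims. The function name and URLs are hypothetical.

```python
from collections import deque


def classify_me_links(roots, me_links):
    """Split URLs into confirmed "me" sites and one-way claims.

    roots:    URLs assumed to belong to the person (the query roots).
    me_links: (source, target) pairs, one per rel="me" link found by a crawl.
    """
    me_links = list(me_links)
    outgoing = {}
    for source, target in me_links:
        outgoing.setdefault(source, set()).add(target)

    # Breadth-first walk along outgoing "me" links, starting from the roots.
    confirmed = set(roots)
    queue = deque(roots)
    while queue:
        url = queue.popleft()
        for target in outgoing.get(url, ()):
            if target not in confirmed:
                confirmed.add(target)
                queue.append(target)

    # Pages that claim a confirmed site as "me" but are not linked back to.
    claims = {src for src, dst in me_links if dst in confirmed and src not in confirmed}
    return confirmed, claims


roots = ["https://blog.example/"]
links = [
    ("https://blog.example/", "https://photos.example/me"),     # I link out: confirmed
    ("https://profile.example/someone", "https://blog.example/"),  # links to me only: a claim
]
confirmed, claims = classify_me_links(roots, links)
print(sorted(confirmed))
print(sorted(claims))
```

In this toy data, linking from the blog to the profile page with “rel=me” would move it from the claims set into the confirmed set, which is the correction step the debugger is meant to prompt.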

Now, this demo application is just the beginning. The real goal is to have a site that is asking me for profile information be able to query the social graph to find out where that information already exists.

It isn’t yet comprehensive enough to give those sites fine-grained access to individual profile elements, but you can imagine that that’s where it’s going… (It would also be great for it to have mechanisms for querying truly private data. The Unix permissions mechanism of “user,” “group,” and “world” is a good model. Right now, this service traverses only the world-readable social graph. It would be cool to have mechanisms to support “group” and “user” level access as well.)

[Image from Google showing social graph flow]
A bit more detail from Google’s site and information they provided:

The Social Graph API looks for two types of publicly declared connections:

  1. It looks for all public URLs that belong to you and are interconnected. This could be your blog (a1), your LiveJournal page (a2), and your Twitter account (a3).

  2. It looks for publicly declared connections between people. For example, a1 may link to b’s blog while a1 and c link to each other.

This index of connections enables developers to build many applications including the ability to help users connect to their public friends more easily. For example, in the image below, Brad just joined Twitter but has no friends on it. Using the Social Graph API, Twitter could provide Brad a way to find out that his friend Jane is also on Twitter. Here’s how: Brad has linked to his homepage (b3) from his Twitter profile (b1) and also from his homepage (b3) to his LiveJournal blog, Bradfitz (b2). On LiveJournal, Brad is friends with Jane274 (j2), but Brad doesn’t know that Jane274 (j2) also has a Twitter profile (j1). Since the Social Graph API has indexed that Brad and Jane already have declared a public friendship on LiveJournal, it can let Brad know that he might want to add Jane (j1) on Twitter as well.
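The Brad and Jane scenario can be sketched in a few lines. The code below is a toy model under stated assumptions, not the Social Graph API itself: the link tables are hand-written, the example.* URLs are invented stand-ins for b1, b2, b3, j1 and j2, and the “me” link from Jane’s LiveJournal page to her Twitter profile is my assumption, since the description doesn’t say how her two profiles are connected.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical crawl results: "me" links join one person's pages,
# "friend" links join different people.
ME_LINKS = {
    "https://twitter.example/brad": ["https://brad.example/"],            # b1 -> b3
    "https://brad.example/": ["https://journal.example/bradfitz"],        # b3 -> b2
    "https://journal.example/jane274": ["https://twitter.example/jane"],  # j2 -> j1 (assumed)
}
FRIEND_LINKS = {
    "https://journal.example/bradfitz": ["https://journal.example/jane274"],  # b2 -> j2
}


def identity_urls(start, me_links):
    """All URLs reachable from start by following "me" links."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in me_links.get(url, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def suggest_friends(profile_url, me_links, friend_links):
    """Profiles on the same host as profile_url that belong to declared friends."""
    host = urlparse(profile_url).netloc
    mine = identity_urls(profile_url, me_links)
    suggestions = set()
    for url in mine:
        for friend in friend_links.get(url, ()):
            for friend_url in identity_urls(friend, me_links):
                if urlparse(friend_url).netloc == host and friend_url not in mine:
                    suggestions.add(friend_url)
    return sorted(suggestions)


print(suggest_friends("https://twitter.example/brad", ME_LINKS, FRIEND_LINKS))
# ['https://twitter.example/jane']
```

A real site would also filter out people the user already follows; that step is left out here because Brad has no Twitter friends yet in the example.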

This is REALLY cool, even though it’s just a beginning. As I said in the title, a small step for Google, but a huge step towards the internet operating system. Most importantly, it’s a step that doesn’t put the social graph under the control of any one company, but instead provides mechanisms for the user to control what information about him is available to applications.

Until now, I thought that Amazon was the only company that understood the difference between application level platform APIs and true internet platform services. (To understand the difference, compare Amazon web services like the Amazon Associates Web Services to much broader services like S3 and EC2.)

Facebook’s F8 platform, like other first generation Web platform APIs that have taken the world by storm, such as Google Maps, is too tightly integrated with the originating company’s own application to enable true distributed innovation. But a real platform service makes it possible for developers to do things entirely outside the realm of the originating provider’s application. Unlike OpenSocial, which I found disappointing, the Google Social Graph API is a game-changing play in the social networking space. It’s a huge step towards open standards and a level playing field in smart social apps, and exposes Google’s data and infrastructure in a subtle and powerful way. I can’t wait to see what comes next!

  • Alexander Sicular

    As the largest broker of publicly available data, Google is uniquely positioned to leverage that data in this manner. I applaud them for taking a major leadership role in this arena.

    Other organizations would do well to follow suit. All that would be required would be for brokers of public information and owners of private information to conform to Google’s logically crafted request methods and return responses.

    One api to rule them all… hehe.

  • Tim,

    I see this as a pure VRM application. VRM as in Doc Searls …:)



  • This will be exciting news once Google commits to fixing their data leak/privacy violations in turning algorithmic contact creation into “friends” inside Google Reader. This exposing of shared feed data is a violation of the contract with users, and contacts of this type “shared” through open standards or any other mechanism pollute the pool. Google needs to restore their track record of caring about user privacy before they can be taken seriously in this new, potentially important effort.

  • A few problems with this:

    1. The original proposal Brad made is very specific that this data is not to be held by a single corporate entity; it’s supposed to be a non-profit foundation or something people can have at least a little bit more faith in.
    2. It doesn’t address a lot of privacy concerns that were brought up on the Social Network Portability list.
    3. As many people pointed out on the Social Network Portability list, people choose different people to connect to on various social networks. You might not want your boss from your professional network to see you as a possible ‘friend’ on the local Nudism social network site. There’s no user control in this API; it’s the web service that gets to dig around on you.

    I argued quite a bit on the social network portability list about the need to respect privacy, and was largely ignored with the belief being, “It will all be assembled someday, let’s go ahead and do it ourselves!”, or alternatively, “Some crawlers can get a bunch of this data already, so it’s already public”.

    In the end, it appeared that at least David Recordon was a bit more interested in privacy, and the solution he seemed to propose was to place the social linkups in the hands of the users, and you could then decide what to share with a new website when you joined up. Notice that not only does this put the data in your hands, but it also changes who gets to determine what is shared with a new service. I consider this a superior approach from a privacy perspective, that still lets you have the same cool linkup features you detailed when people sign-up. Why on Earth would you choose substantially less privacy and personal data control for the same features?

    With the original social graph, as with this one (so far as you’ve mentioned), services can decide to look people up and get data at will, with no control of your own privacy. I find this a very disturbing thought, as I consider it a gross violation of my personal privacy, and as a previous poster has mentioned, Google doesn’t have the greatest record on personal privacy.

    Of course, all of this is just one trend I see where geeks are fascinated with doing ‘cool things’ with little thought about the practical implications of their efforts in other ways on society. Scientists learned from the Manhattan Project that the practical consequences of some branches of scientific research could be devastating on levels they hadn’t imagined, which brought a lot more thought to the field on Ethics in Science. A Social Network API definitely isn’t going to cause mass destruction, but any venture with such wide ranging privacy implications definitely needs a little more thought put into it, preferably from some privacy experts in this case.

    Did Brad Fitzpatrick or any of the geeks working on this social network stuff ever consider privacy implications or talk to some privacy policy experts? Did they think about how they could possibly attain similar levels of functionality without putting so much private data into the public realm? It’s still Big Brother, even if it isn’t the government in the role this time.

    It’s time to stop thinking merely about ways to structure data for computers, and consider ways to structure data for computers that retains personal control over personal data.

  • It’s also worth taking a look at the types of relations Google is hoping to collect. Think about what an amazing amount of previously private data you would be opening up, all for some possible convenience on another social network site.

    Do your friends know every single contact you have, and their relations? Would you like them to? Surely someone isn’t going to provide the same sorry answer the government gives you when they explain why you should sacrifice privacy for security, “If you don’t do anything wrong, this shouldn’t concern you.” Let’s remember that it takes the police a decent amount of time to try and find every contact you have, and their relation…. and now we’re expected to put it all out there? Just to save a few minutes finding friends on a new social network??

    Bruce Schneier has an excellent post about security vs privacy that I’m sure is plenty relevant in the context of “privacy vs programmer convenience”.

  • Ben —

    I’m well aware of the concept of tradeoffs between privacy and security (and convenience), but I fail to understand how what Google is doing is any worse than putting out all this information on Facebook. Yes, Google will have access to this information, but so will anyone else using the API. And Google isn’t crawling or exposing private data, only data that’s already public on the web. Meanwhile, Facebook and other hub-based social networks have access to lots of data that you consider private.

    While there are lots of details to work out, it seems to me that there is more potential for user control and empowerment here than in other models.

  • Why should user control be the last thing on people’s minds? Why isn’t it the first? OpenID is about empowering individuals: they get to control what provider they use, and when registering with OpenID they get to decide what data is shared. Why couldn’t a social graph system like that be developed? Strange how Google needs to have all this data….

    The fact that they jumped to the least private form first, is definitely a bad thing. If the government decided to collect this data, and refine it with all these relations, people would be up in arms. Why do you cheerlead this effort to accumulate, expose, and encourage the full public exposure of so much private data?

    On Facebook, this data isn’t all public. I can’t go to someone I don’t know, and see all their friends, and how they know all those people. I can’t even see detailed relation data for friends of my own friends on Facebook. There’s definitely a level of privacy there. But just pointing a finger at some company that may not care about user privacy, and saying that because they exposed it, it should be exposed, is definitely not the right answer.

    Again, the fact is, this could be designed with user privacy and control, but it wasn’t. It wasn’t even based on the social graph paper Brad wrote, as he clearly states in point 1a: “Establish a non-profit and open source software (with copyrights held by the non-profit) which collects, merges, and redistributes the graphs from all other social network sites into one global aggregated graph.”

    OpenID is catching on, so why can’t we have a distributed system where users keep their data private and share it when they want to? Oh yeah…. Google would sure like all that data. Notice that they also get to control how you access it, how much you access it, etc. All of this clearly violates Brad’s first tenet in his Goals for the Open Social Graph as well. Why won’t you call him on it, and why won’t you pitch for a private version of this that does respect privacy?

  • I agree that the Social Graph API has more potential for user control and empowerment than Facebook-like closed models, but here I think Google uses brute force.

    We know that all that public FOAF and XFN data is collectable and servable through an API (I guess MyBlogLog has been doing it), but it is a huge cost to index and serve all that data. Should we applaud this informational brute force achievement, or should we focus on open solutions for the same problem?

  • I’d say this is an advancement of the field, but keep in mind, there are alternatives coming. I think decentralized discovery has a large role to play here, and as some above sentiments echo, is crucial to a decentralized social graph based internet.

    Personal data indexes that protect, share, and link our data stores are another approach, and that’s something the data portability workgroups are working on. I appreciate what Google is doing here, but at the same time, I think (as Doc Searls has said) we gotta protect the data and control who can see what. How do we do that? Well, FOAF alone ain’t gonna get it. A protocol for querying your personal index (where you have data stored, service endpoints – XRDS) is needed, and ways to allow third parties to discover, query, and view this index securely on your behalf are also needed. Allowing only certain parties certain levels of access, completely opt-in, is critical.

    There’s other work in this space, namely Messina’s DiSo, some initiatives (WRFS and wNode), and how that works with linking existing data, but in a more decentralized manner.

  • @Josh: That’s exactly the kind of stuff I’m in favor of, and that’s how the graph should be handled: with people’s data in their control, and users deciding what to share with services.

  • @Tim: Kaliya also has some thoughts about social network portability and why control and privacy are so important. These must be the two most important tenets of social network portability, not afterthoughts and things that may or may not make it in.

    The worst case scenario would be if lots of sites started using the Google Social Network API, because that would make it the big first-mover that others have to eclipse. Plus, it would continue a process of dumping information that should be private, into the public domain, when there is no need to. A little patience, and as Josh noted, there are private methods to do the same thing coming along well, where you retain control, not Google.

  • Ben is right on the money here. The problem with this system is all about user expectations. No one expects that all their social connections on the Web are suddenly available to anyone who wants it by using Google’s Social Graph API. What if a person doesn’t want to be found in a certain place? The existing, user-controlled technologies mentioned in the comments above make that possible, but the Social Graph API is a blunt, indiscriminate instrument.

    Just like users shouldn’t have to manually enter in their social connections at every new site they join, they also shouldn’t have to worry about slipping up somewhere and letting the world know about a connection they didn’t want exposed.

    I predict the same backlash to this as was experienced by Facebook’s Beacon program.

  • How is this better than me just uploading my address book into a new social app that I am using?

  • Would we trust Microsoft if they did this?
    Why should we trust Google to be the company that has our information on their servers?

    This should be a non-profit effort, and everyone knows this.

    Maybe they should ask us if we want our data on their server…. How come this part is not opt-in?

    Can I opt out of having my info included in the API?

    Is it right to assume that because I have a FOAF file somewhere that I want it to be crawled and added to Google’s database?

    Has anyone asked any user what they want in the way of their data being portable?

    There is no way that this can be good for anyone but Google.

    Since this is supposed to be an open initiative… did Google ask Yahoo, MySpace, or Facebook for their input? No, they did not. This is a unilateral corporate move to try to corner the market for individuals’ private information without their approval…

    Also… Google… nice timing on the PR. A Friday, when most will not be paying attention to this Evil idea….

    Can someone let me know when the next standards meeting for the Google open social graph will be?

  • I see this somewhat two-fold. Of course Google does not take privacy into account here, and the arguments that a) that data is publicly available anyway or b) people do the same on Facebook do not count for me.

    a) because linked data is always more than just the data itself (and searchable on top)

    b) because people choose to do so on their own. Sometimes maybe without knowing what they do, as the Facebook/Scoble incident showed, but still.

    I also see this as a good thing though as it pushes the idea forward (both data portability and microformats/FOAF). Of course having this located at Google might be some sort of a problem.

    I personally had the feeling that microformats are only really useful in a search engine context anyway, as they seem all about indexing content, which is maybe harder to do for my personal little project.

    If you look at the use cases for portable social networks, it might not be so much a scenario in which you need to scrape things. You know the endpoint and the endpoint should know all you need to know.

    BTW, these are also on the wiki (extended by others) and I hope for more people to contribute use cases as I think we need to define first what we are actually trying to solve.

    Also clear from these use cases should be that user control is somewhat important to me.

    Thus it’s also good that there is the Data Portability Policy Group which hopefully comes up with sensible policies for privacy and other things.

  • I reckon this is a really nice step forward (with one or two caveats on certain aspects of their approach).

    While “Internet Operating System” is a good metaphor up to a point, I’m not sure about its legs. Traditional operating systems are all about interfacing between different components, people, disk drives etc – and by definition an API is an interface.

    But what we are starting to see, with a gradual shift into a Web of data, is a *reduction* in the significance of interfaces, and lower impedance between the components. The Giant Global Graph is a projection of an integrated system – my online social graph increasingly reflects my real-world social network (ok, I should get out more).

    With all due respect to the previous commentators, privacy is an orthogonal issue – reuse of *public* data seems a completely positive thing.

    Hopefully the fact that Google are looking at FOAF will prompt people to notice how far developments have progressed around other Semantic Web technologies. My cats have profiles too.

  • Danny,

    regarding privacy people might see this differently.

    If they add followers on Twitter they do not really know that there will be some XFN markup in the page which then can be indexed by Google and used by whoever.

    I still think that these things need to be discussed more.

    (I also see these problems in other places as in Second Life where lots of information about avatars can be collected and analysed without people knowing about it. And you don’t even have cookies there which you could disable.)

  • Christian —

    Agreed. This was the whole point of my post. Make stuff visible, and people either accept it or reject it. Leave it invisible, and it creeps in and corrupts.

    Do you guys really think that people aren’t already using this data?

    I’d rather have access and control to it myself than just have it visible to crackers and spammers.

  • Of course people do use this data already and this is another reason to start discussion and maybe start it with new tech early before such patterns emerge.

    But I guess we agree :-)

    Now looking forward to what the policy group comes up with (actually I should be more active there) and then the next question is if companies will follow… might mean some change in business plans, too.

  • Alexander Sicular

    What a number of naysayers here fail to realize is that the data being exposed here is already publicly available. If you take a moment to peruse the documentation, Google clearly indicates the XFN and FOAF standards as their mechanism for culling publicly available data off the web. If you do have a problem I suggest you address it at the doorstep of XFN and FOAF.

    The api exists to make using that data easier for everyone. Anybody with the knowhow could duplicate this feat with their own time and resources. Privacy controls exist beyond the realm of this api as, again, the api pulls publicly available information.

    For those concerned with their various social circles converging in undesirable ways I suggest multiple online personas.

  • I am not criticizing Google here. It’s more that there are policies which give a hint on what a company should tell its users (and even let them decide upon) about how their data is used or exposed.

    Twitter users might not be aware that their friends links are marked with XFN. They think these are a local thing.

    It’s of course also about perception.

    I also do not see it as that dramatic; I just think a discussion is what is needed.

  • Christian, I agree that perceptions are important, but maintain that enabling reuse of public data is orthogonal from the question of what data is made public in the first place.

    Unless we’re clear about the difference, there’s still likely to be a tendency to shoot the messenger – or worse still, the implication that sharing (public) data is somehow wrong.

  • Just taking the Twitter example one step further, perhaps we need to develop opt out (or opt in) systems, where e.g. a Twitter user can opt out of XFN or FOAF or whatever mechanism is being used. The part I don’t get is this. If you don’t want your public data and relationships shared, don’t use Twitter, just stick to IM. It’s a very simple argument.

    One possibility is using something like OpenID as a central hub for declaring relationships and whether they are public or private. I wonder what people think about that scenario.

  • This fairy tale is about a girl called Little Red Surfing Hood. The girl one day is surfing through the Internet and through some of her social network profiles. She actually plans to visit her grandmother later that day. All of a sudden, a message pops up on her screen from a user with the profile name “wolf”. He plans to discover as much about the girl and her grandmother as possible and pretends to be a very cute boy going to the same school as Little Red Surfing Hood. He asks all kinds of questions, sends her a link request and asks for a picture of her. She naïvely links him to her profile and also sends him a picture. Little Red Surfing Hood is really happy to have made a new friend. In the meantime, wolf assembles various credentials about Little Red Surfing Hood, making use of details revealed on her profile, pictures, and personal details such as her address, email, a mobile phone number and the route she always takes to go to her grandmother. With those credentials, he goes to the girl’s grandmother’s house and gains entry by pretending to be the girl. The rest of the fairy tale is known. The wolf eats the grandmother and later on the girl as well. There are different story endings about how the girl and her grandmother were rescued by a hunter. But that is not the point of the story here. We don’t even know who the hunter was, how he found out about the wolf in the grandmother’s house and for what purpose he was into this.

    Information privacy in social network applications means more than allowing certain people to see your profile. First, it means to have a way to really know the true identity of the person talking or connecting to you. Secondly, it means to have full transparency over what you and what others are doing. What happens if you upload a certain piece of data to your social network profile? Who can see which part of your profile data, who is connected to it and who uses your data for which purpose? This, of course, includes the service providers and any third parties (including the hunter). And finally, it means to determine for yourself, how your personal data is linked, exported, assembled and analyzed in which context and by whom.