RSS never blocks you or goes down: why social networks need to be decentralized

Recurring outages on major networking sites such as Twitter and
LinkedIn, along with incidents where Twitter members were
mysteriously dropped for days at a time, have led many people to
challenge the centralized control exerted by
companies running social networks. Whether you’re a street
demonstrator or a business analyst, you may well have come to depend
on Twitter. We may have been willing to build our virtual houses on
shaky foundations when they were temporary beach huts; but now
we need to examine the ground on which many are proposing to build our
virtual shopping malls and even our virtual federal offices.

Instead of the constant churning among the commercial sites du jour
(Friendster, MySpace, Facebook, Twitter), the next
generation of social networking increasingly appears to require a
decentralized, peer-to-peer infrastructure. This article looks at
available efforts in that space and suggests some principles to guide
its development.

Update: a few days ago, OpenID expert Chris Messina and
microblog developer Jyri Engeström published an article with
conclusions similar to mine; clearly this is a felt need that’s
spreading across the Net.
Interestingly, they approach the questions from a list of what
information needs to be shared and how it needs to be transmitted; I
come from the angle of what people want from each other and how their
needs can be met. The two approaches converge, though. See the
comments for other interesting related blogs.

The peer-to-peer concept

The Internet was originally a parliament convened among peers. Every
host was a server, almost always providing file downloads and usually
email as well. To this day, ISPs “peer” when they accept data from one
ISP’s customer and deliver it to the other ISP’s customer.

To peer doesn’t mean simply to be of equal status–in fact, that
notion could be misleading, because two systems with vastly different
roles and resources can peer. More importantly, to peer means to have
no intermediary.

When the architecture requires an intermediary, it should play as
unobtrusive and minimal a role as possible. For instance, Napster and
Skype have central servers, but they are used just to sign up
participants and set up connections among them.
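
This thin-intermediary pattern can be sketched in a few lines. The following is a toy illustration, not Napster's or Skype's actual protocol: the central piece holds only a name-to-address table, and all real traffic flows directly between peers.

```python
class Rendezvous:
    """Toy central directory: introduces peers but never relays content."""

    def __init__(self):
        self._peers = {}

    def register(self, name, host, port):
        # A peer announces where it can be reached.
        self._peers[name] = (host, port)

    def lookup(self, name):
        # Return the peer's address, or None if unknown. After this
        # introduction the two peers talk directly; the server drops out.
        return self._peers.get(name)

hub = Rendezvous()
hub.register("alice", "203.0.113.5", 9000)
print(hub.lookup("alice"))  # ('203.0.113.5', 9000)
```

Because the directory does nothing but hand out addresses, its failure stops new introductions but not the conversations already in progress.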

Napster’s and Skype’s partial decentralization won them a key benefit
of peer-to-peer networking that Twitter could well take note of: they
offload most traffic from their central servers to the users and the
ISPs that connect them.

But being partially centralized means the service can still be
disrupted as a whole. Napster was shut down by a court ruling; Skype
shut itself down once through a programming error that it never
clearly explained to the public.

The Internet itself quickly developed into this hybrid model as well.
Modems and terminals created a new layer of second-class citizens,
vastly expanded by the PC revolution. These Internet users were tucked
away behind firewalls and blocked from using any services not approved
by system administrators.

By the year 2000, new companies springing up in the dot-com boom found
themselves frustrated by these restrictions, and designed their
innovative protocols to deliver data over port 80 because everybody
kept that open for Web traffic. When the practice started, traditional
Internet developers derided it as “port 80 pollution.” Now it’s called
Web Services.

As happens so often, the way forward proved to be the way
backward–that is, to restore the democracy of the early Internet–and,
also predictably, it was pioneered by outlier movements with dubious
legality, ethics, and financial viability. Napster made the first
impact on public consciousness, followed by services that rigorously
avoided any hint of central servers (see my 2000 article, Gnutella
and Freenet Represent True Technological Innovation).

By the end of 2000, the term peer-to-peer had become a
household word. But the movement quickly went into retreat, facing
difficult design problems that were already under discussion in the
O’Reilly book Peer to Peer, published in February 2001. I summarized
the problems, which remain
ongoing, in the articles From P2P to Web Services: Addressing and
Coordination and From P2P to Web Services: Trust.

The issue of addressing would arise right away for a social network
developed in a pure peer-to-peer fashion. How would you check whether
your old college buddy was on the network, if you couldn’t query a
central server? And how could you choose a unique name, without a
single place to register? Names would have to be qualified by domain
names or some other identifiers–which is actually a step forward
right there. It seems to me ridiculous that a company would plan to
provide a service to the whole world using a flat namespace. And while
we’re at it, you ought to be able to change your name and bring along
all your prior activity.
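
A qualified, email-style identifier is straightforward to handle in code. The sketch below assumes a hypothetical user@domain handle format; the names are invented for illustration.

```python
def parse_handle(handle):
    """Split a qualified 'user@domain' handle, rejecting flat names."""
    name, sep, domain = handle.partition("@")
    if not sep or not name or not domain:
        raise ValueError("handle must be qualified, e.g. alice@example.net")
    return name, domain

print(parse_handle("alice@example.net"))  # ('alice', 'example.net')
```

Under such a scheme, changing your name could amount to publishing a redirect from the old qualified handle to the new one, so prior activity follows you.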

Trust would also become an issue in decentralized social networks. You
could ban a correspondent from your personal list, but you couldn’t
inform a central authority about abuse. And the problem Twitter has
recently started to tackle–preventing random users from impersonating
well-known people–would be a challenge.

But decentralization brings many benefits. A failure at one person’s
site, or even on a whole segment of the network, would have no effect
on the rest of the world. A misconfigured router in Pakistan could not
keep everyone from accessing the most popular video content on the
Internet. And because each peer would have to obey common, understood
protocols, a decentralized social network would be transparent and
support the use of free software; nobody would have to puzzle over
what algorithms were in use.

Visiting many different sites instead of a central server to pull
together information on friends would increase network traffic, but
modern networks have enough bandwidth to stand up to the load. Even in
places with limited bandwidth, service would degrade gracefully
because messages would be small.
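
The client-side work is modest: pull each friend's small feed and merge the items locally. A minimal sketch using the standard library, with made-up RSS 2.0 fragments standing in for feeds that would really be fetched from each peer's site:

```python
import xml.etree.ElementTree as ET

def titles(feed_xml):
    """Yield item titles from a minimal RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        yield item.findtext("title")

# Stand-ins for feeds fetched over the network from each peer.
feeds = [
    "<rss><channel><item><title>alice: lunch</title></item></channel></rss>",
    "<rss><channel><item><title>bob: new post</title></item></channel></rss>",
]

timeline = [t for feed in feeds for t in titles(feed)]
print(timeline)  # ['alice: lunch', 'bob: new post']
```

Each fetch is small, so a slow link delays the timeline rather than breaking it.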

The StatusNet project, which underlies identi.ca, represents a
half-way step toward the kind of full decentralization illustrated by
RSS. StatusNet can power a variety of microblogging services, each
signing up any number of members. The services can interchange data to
tie the members together.

The rest of this article looks at two possible models for a
distributed social network (RSS and XMPP), followed by an examination
of the recurring problems of peer-to-peer in the social networking
context.

Possible models

Many examples can be found of filesystems, version control systems,
and other projects that lack central servers. But I’m just going to look
at two protocols that other people are considering for decentralized
social networking.

When thinking of decentralized systems for sending short messages, RSS
and Atom come to mind first. They’re universal and work well on a
large scale. And Dave Winer, a driving force behind RSS, has revived
an enhanced version called rssCloud, recently incorporated into
WordPress.

Given the first question I asked about decentralization–how do you
find the people you’re looking for?–the RSS answer is “by
serendipity.” Like everything else on the Internet, you could come
across new treasures in many ways: surfing, searching, friends, or
media outlets. Lots of bloggers provide links from their sites to
their own faves. And RSS has developed its own ecosystem, sprouting
plenty of aggregators that offer you views into new fields of
information.

rssCloud is meant to carry more frequent traffic and more content than
the original RSS and Atom. It maintains an XML format (making it
relatively verbose for SMS, although Winer tries to separate out the
rich, enhanced data). Perhaps because of the increased traffic it
would cause, it’s less decentralized than RSS, storing updates in
Amazon S3.

XMPP was invented about the same time as RSS by a programmer named
Jeremie Miller, who wanted a standard instant messaging protocol with
tags that could support semantics, and therefore powerful new
applications. Most important, his creation, Jabber, made it possible
for individual users to run their own servers instead of depending on
America Online or Yahoo!. Jabber had the potential to complement Tim
Berners-Lee’s idea of a Semantic Web.

Because Jabber used XML, it was seen as a bit heavyweight, and the
servers were reportedly hard to configure. But the possibilities were
too promising to pass up. So the IETF formalized it, gave it a clumsy
name suitable for a standard, and released a set of RFCs about it.
Unfortunately, XMPP languished until Google adopted it for their Talk
and Wave services. These high-profile applications suggest that it has
the scalability, flexibility, and robustness for social networking.
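
XMPP's XML framing is easy to see in a single stanza. The sketch below builds a minimal chat message with the standard library; the addresses are invented, and a real deployment would send this over an authenticated stream rather than print it.

```python
import xml.etree.ElementTree as ET

# A minimal XMPP-style <message> stanza.
msg = ET.Element("message", {"from": "alice@example.net",
                             "to": "bob@example.org",
                             "type": "chat"})
ET.SubElement(msg, "body").text = "Hello from my own server"

# Verbose compared to a bare string, but arbitrary namespaced child
# elements can ride alongside the body -- the extensibility that made
# Jabber promising for semantic applications.
print(ET.tostring(msg, encoding="unicode"))
```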

The P2P problems, in today’s context

Even if decentralized protocols and clients were invented, there would
still be a long road to democratizing social networks. The messages are
expected to be lightweight, so photos and other large batches of
content would have to be stored somewhere outside the messages. Most
users wouldn’t trust their laptops (much less their mobile devices) to
store content and serve it up 24 hours a day, so they would need a
cloud service, which might or might not be distributed.
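
Keeping messages lightweight means carrying large media by reference. A sketch, with invented field names and URL: the message itself stays a few hundred bytes, while the photo lives at whatever storage service the author chose.

```python
import json

message = {
    "author": "alice@example.net",
    "text": "Sunset from the pier",
    "attachments": [
        # The photo is referenced, not embedded; the cloud host serves it.
        {"type": "image/jpeg",
         "href": "https://media.example.net/alice/sunset.jpg"},
    ],
}

wire = json.dumps(message)  # this is all that travels peer to peer
print(len(wire) < 1024)  # True
```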

A backup service is also necessary in order to recover from a local
disk failure or other error that wipes out several years of your
accumulated identity.

Problems such as impersonation and unsolicited communications (spam)
are hard to solve in decentralized systems because trust is always a
hierarchical quality. This is true everywhere in life, beyond the
level of a family or neighborhood. We expect our professors to be good
because they were hired by the college, and expect the college to be
good because it was accredited by a centralized facility, whose
managers were in turn appointed by elected officials. This system can
and does break down regularly, so mechanisms for repair are always
built in.

Nobody can be banned from a decentralized social network because
there’s nothing to ban them from. But there are ways to re-introduce
enough centralization to validate credentials. For instance, the
American Bar Association could register lawyers in good standing, and
you could check whether someone claiming to be a lawyer in the US was
registered. But we wouldn’t want to take this process too far and
create a web of accreditations, because that would devalue people
whose skills and viewpoints lie outside the mainstream.

You could still check whether someone shares friends with you, because
one person’s claims of friendship could be traced back to the sites he
claims to be friends with. Someone could game the system by setting up
fake sites claiming to be people you know and linking back to them,
but this is a huge amount of work and leaves the perpetrator open to
arrest for fraud. Free software developer Thomas Lord suggests that
identity could also be verified through “a fairly shallow and
decentralized hierarchy of authentication like the system of notary
publics in physical life.”
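
Cross-checking friendship claims can be sketched directly: a claim counts only when both sides publish it. The dict below stands in for the lists a client would fetch from each peer's own site; the names are invented.

```python
# What each person publishes on their own site.
claims = {
    "alice": {"bob", "carol"},
    "bob":   {"alice"},
    "carol": set(),  # carol never claimed alice back
}

def confirmed_friends(person):
    """Friends whose own published list confirms the relationship."""
    return {other for other in claims.get(person, set())
            if person in claims.get(other, set())}

print(confirmed_friends("alice"))  # {'bob'}
```

A forger would have to stand up real sites for every fake friend, which is the "huge amount of work" mentioned above.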

All in all, the problems of finding people and trusting people
suggest that there’s a role for aggregators, just as in the case of
RSS. And these aggregators could also offer the kind of tracking
services (who talked about me today?) and statistical services (is
Michael Jackson’s death still a major topic of conversation?) that get
business people so excited about Twitter. A decentralized social
network could still be business-friendly, because traffic could be
analyzed in order to target ads more accurately–but hopefully,
because peering clients are under the users’ control, people who
didn’t want the ads could configure their systems to screen them out.

When you set up an account, you could register with aggregators of
your choice. And whenever you connected to someone, you could
automatically register his account with a list of your favorite
aggregators, in case he hadn’t registered himself. If people wanted
control over where they’re aggregated, I suppose something equivalent
to a robots.txt file could be invented. But it’s not sporting
to refuse to be counted. And there’s no point in invoking privacy
concerns–face it, if the NSA wants to read your tweets, they’ll find
a way.
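
Such a robots.txt equivalent might look like the sketch below. The directive syntax and aggregator names are all invented; the point is only that a one-line policy file, fetched from the peer's site before indexing, is enough to express an opt-out.

```python
def may_aggregate(policy_text, aggregator):
    """Honor lines like 'disallow: statbot' in a peer's policy file."""
    for line in policy_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "disallow" and value.strip() in ("*", aggregator):
            return False
    return True

policy = "disallow: statbot\n"
print(may_aggregate(policy, "statbot"))    # False
print(may_aggregate(policy, "friendbot"))  # True
```

As with robots.txt, compliance would be voluntary; the file declares a preference rather than enforcing one.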

So those are some of the areas where the problems of P2P and social
networking intersect. Let’s remember that current social networks are
far from solving problems of findability, trust, and persistence as
well. I don’t check how many followers I have on Twitter; I figure
most of them are spam bots. (Apologies to any of my followers who
actually are sufficiently embodied to be reading this.)

Could OpenSocial be used to implement a P2P social network? It’s
based on a single
object that is expected to query and update a single server. But the
interface could probably be implemented to run on a single user’s
system, registering the users or aggregators with whom she
communicates and querying all those users and aggregators as
necessary.

Industry analysts have been questioning for years whether Twitter is
financially viable. Well, maybe it isn’t–maybe this particular kind
of Internet platform is not destined to be a business. Responsibility
for the platform can be distributed among millions of sites and
developers, while business opportunities can be built on top of the
platform as services in analytics, publicity, and so forth.

Like Google, Twitter and the other leading commercial Internet sites
have made tremendous contributions to the functionality of the
Internet and have earned both their popularity and (where it exists)
their revenue. But the end-to-end principle and the reliability of
distributed processing must have their day again, whenever some use of
the Internet becomes too important to leave up to any single entity.

  • d-r

    I’m surprised that identi.ca / laconi.ca isn’t mentioned here. It’s basically an open source AGPL Twitter clone, that allows communication between instances running on different servers – basically, the same model as email.

  • http://jon.spriggs.org.uk/blog/ Jon "The Nice Guy" Spriggs

    @d-r it is – about 1/3rd of the way in: “The StatusNet project, which underlies identi.ca, represents a half-way step toward the kind of full decentralization illustrated by RSS. StatusNet can power a variety of microbloggin services, each signing up any number of members. The services can interchange data to tie the members together.”

  • http://praxagora.com/andyo/ Andy Oram

    You missed it d-r — but it’s a short paragraph about one-half way through a very long article, so that’s not surprising. I use identi.ca myself, but I admit I’m not sure where it fits in the schema I laid out. I’d be interested in what you all think of the way I called it a half-way step.

  • http://ianrosenwach.com/ Ian Rosenwach

    Very interesting, I have a similar take on it in my post entitled “RSS has the potential to be the worlds biggest micro-content social network” here: http://ianrosenwach.com/index.php/2009/09/rss-messaging-the-worlds-biggest-micro-content-social-network/

  • Chris

    “Nobody can be banned from a decentralized social network because there’s nothing to ban them from. But there are ways to re-introduce enough centralization to validate credentials.”

    True. But that needn’t be the only approach to trust in a decentralized network. Look at what Advogato does, in an entirely decentralized manner, to solve the distributed Trust problem.

  • http://workbench.cadenhead.org/ Rogers Cadenhead

    There are two errors in this article. Dave Winer did not invent RSS. The first version of the format was written by Ramanathan V. Guha and Dan Libby at Netscape.

    Also, rssCloud is not something that was just created. , the inventor of RSS, has created an enhanced version called rssCloud, recently incorporated into WordPress. It has been around since 1999 but fell into disuse — the only company that supported it, UserLand, dropped it because of scaling and support issues.

  • http://workbench.cadenhead.org/ Rogers Cadenhead

    My comment was hosed. It should read:

    There are two errors in this article. Dave Winer did not invent RSS. The first version of the format was written by Ramanathan V. Guha and Dan Libby at Netscape.

    Also, rssCloud is not something that was just created. It has been around since 1999 but fell into disuse — the only company that supported it, UserLand, dropped it because of scaling and support issues

  • http://praxagora.com/andyo/ Andy Oram

    Thanks to Ian for the pointer (a thoughtful blog) and to Rogers for the corrections. Chris: I think that the kind of aggregated reputation Advogato uses (and PGP key signing, as another example) works in fairly small groups, but wouldn’t scale to thousands of people or more.

  • http://twitter.com/steve_e Steve

Interesting, I’ve been thinking for a while about how you could decentralize some of this stuff. As others have noted it would be difficult to fully decentralize, as you would really need a system of identity validation, which by its nature can’t really be totally decentralized.

However, data could be. I see no reason why we couldn’t own, store and carry our social network data in the future and have services access it on a permission basis.

    I’ve blogged about it here previously: http://23musings.com/2009/08/18/could-we-store-our-personal-application-data-in-the-cloud/

    With the prevalence of broadband, cloud storage a protocols of transfer (such as xml, rss, xmpp) surely we will begin to see services that allow data to be abstracted from the service and left with the users.

    The flip side would be everyone carrying a copy of the app themselves too and pure data transfer applying the context when users log in or request something from it.

  • Barrett

    I think a distributed social network could work if you allowed each user to have a file that contained their information which they could then host with any one of many providers. Much like the way blogs work, you could switch providers at any time by downloading your file and uploading it to another one.

    Much like RSS, the file could then be read by a dashboard-type application like Google Reader. Here’s an example of something like that: http://www.slideboom.com/presentations/58080/Google-Dashboard—Own-Your-Social-Profile-%28updated%29

  • http://example.com None

rssCloud ties subscribers to IP addresses, and Dave Winer has said he won’t change his mind about that. Do some Googling to find the problems with that (e.g., it won’t work for 99% of hosts).

    That’s the problem with “standards” that are defined by a single person.

Pubsubhubbub is much better in that regard.

  • http://danny.ayers.name Danny

    I agree with your sentiments, but with the rssCloud stuff I believe you’re looking at marginal technology. The decentralization of the Web works through links over HTTP, not through any specific sub-technology. The challenge now is not to get news stories quicker, but to integrate the information that is available over the world.

    Sure, variations on PubSub will be useful, but they aren’t worth very much if they don’t connect well. The traditional Web protocol of “follow your nose” is where it’s at.

    The same strategies of the traditional Web can be expanded to encompass descriptions of people, places, things, concepts. Check out :

    http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/

    RSS is soooo 1999…

  • http://www.travisswicegood.com Travis Swicegood

    Andy: Interesting article. I asked the exact same questions back in 2007 when I first started using Twitter (http://j.mp/3Z6bU3). Couple of developments since that article:

    There is an open standard for decentralized microblogging aptly called the OpenMicroBlogging spec — http://openmicroblogging.org/ It lays out in spec form the exact style system I’ve blogged about.

    I really think that XMPP would be a better system. A system built on pubsub would scale much better and helps with your issue of identity as XMPP has strong identity built-in (i.e., spoofing a request from one XMPP user to another is not a trivial task).

  • mcdtracy

    FYI: “rssCloud ties subscribers to IP addresses”

    Dave changed the RSS Implementers Guide (spec) today in this regard:

    “6. domain — an optional parameter that specifies the machine that will receive notifications. If it is not specified, notifications are sent to the IP address the request came from. “

    “…don’t depend on it being the final method for implementing this functionality. It’s possible we will find deal-stoppers before nailing this down.”

I think he’s monitoring feedback on the rssCloud features and reacting in “realtime”.

  • http://zgp.org/~dmarti/ Don Marti

    Remember AOL, TheGlobe.com, Friendster…the history of online socializing is the story of a place getting cool, then attracting the creepy people who drive the cool users away. The key to avoiding social network lock-in is to hasten that process on the most popular networks. Join Creepy Freaks Against Proprietary Social Networks on identi.ca and get started today.

  • http://friendfeed.com/jaykul Joel Bennett

    Of course, that won’t be true, if we all start using it. Blocking will come along for the same reason it does elsewhere …

  • http://friendfeed.com/jaykul Joel Bennett

    How do we @reply or comment in a decentralized social *network*? Without that, this is purely theoretical.

  • http://friendfeed.com/jaykul Joel Bennett

    In fact, without @reply, a “social network” is really very feudalistic: the rich (in followers) get richer.

  • http://basiscraft.com Thomas Lord

    Andy, I hope we get a chance to talk more about this topic in coming weeks. I am wrapping up an “exploratory R&D” type project for the FSF to begin to stake out a web operating system for distributed, decentralized applications. We believe that just as users should be in control of their personal computers, they should also be in control of their own servers. (No, I don’t mean that grandpa has to learn to administer Apache :-). (And, I’m slightly behind schedule on that project so please don’t tell RMS I’m taking time out to make a blog comment, especially on Radar ;-)

    One of the themes of the work I’m doing for that project (working title: “W3OS”) is to create a web platform based on traditional OS concepts rather than a web “platform” based on a bunch of ad hoc APIs. For example, if your application needs something roughly like a file system, we want you to use (some parts of) WebDAV. And, if your application needs something roughly like possibly lossy but roughly real-time inter-process communication, we want you to give some serious consideration to XMPP.

    It’s with that perspective that Dave Winer’s work is, for me at least, a bit of a mixed bag of goodness and disappointment. On the one hand, It’s fantastic goodness that he’s working on building high-level apps that begin to really emphasize the P2P-ness of RSS and mimic a centralized service (Twitter) on top of that. On the other hand, so far at least, he’s just kind of “making stuff up” in his design of interprocess communication – reinventing XMPP, only poorly. (Aside to Dave, if he’s listening: nudge nudge.)

    Critiques of Dave’s work aside, it’s a rich field and I think it is the “next big thing” that converges with other trends in the industry. Yes, people are becoming increasingly skeptical about the abuse potential (and actuality) of centralized services, and about the lack of robustness (what if everyone’s email account went down at exactly the same time… that kind of thing).

    Beyond that there is a convergence with the trend towards commodity computing: the notion that (like EC2 as a schematic example) we “manufacture” server-cycles in a big factory (data center) and lease out multi-purpose slices of them — a bit like the way an electric grid or telephone network works.

    In the W3OS project we’re trying to lay the foundation of a user-respecting set of systems software for a utility computing environment. I think we have some pretty good ideas about how to do this well. I hope that in coming weeks I’ll have a chance to share some more of our state with you.

    Regards,

    -t

  • http://friendfeed.com/srw Sebastian Wain

    Every “semi-smart” early adopter thought about it many years before your post… late, too late.

But the reality is more complex: OpenID was there, but Facebook gained more adoption. Also, neither Facebook nor LinkedIn wants to open its network, for obvious reasons.

  • http://www.socialfocus.com/social-networking-software-developers.php social networking software development

    As the increase in popularity of social networking is on a constant rise, new uses for the technology are constantly being observed.

  • MikePearsonNZ

    Formal social networks can be established via moderated second level domains, with moderation enforcing rules e.g. lawyers, doctors.

    New Zealand has a new .health.nz domain for health professionals.
    http://mikepearsonnz.amplify.com/2009/09/05/new-health-domain-name-launched-healthnz/

  • http://www.p2p.tu-darmstadt.de Thorsten Strufe

Andy, thanks for the good analysis and for raising awareness of the issue. Even though I don’t agree with the statement that creating fake IDs and even some links to the respective friends would be difficult (we had a paper at WWW ’09 on the ease of doing this, even in an automated fashion, on current OSNs [1], and extending this type of social engineering to a decentralized OSN would not cause that much of a headache), I strongly back the idea of using alternative (decentralized!) ways of identifying others (ask the person a question only they can answer, check their email address (affiliation), etc.).
The text only misses one detail: there already are quite a few approaches to implementing an OSN (as easy to use as any commercial OSN) based on concepts we know from the P2P world (Safebook [2] and PeerSon [3] immediately come to mind). They are nowhere near being a fully fledged alternative to LinkedIn, xing, or you-name-them yet, but there is a movement with quite some momentum pushing in that direction (and they solve the problems of backup and availability along the way!). It seems worth having a look and staying tuned to what’s about to happen! ;-)

    Best,

    Thorsten

    [1] Bilge, Strufe, Balzarotti, Kirda: “All Your Contacts Are Belong to Us!” http://data.semanticweb.org/conference/www/2009/paper/56/html
    [2] http://www.safebook.us (disclaimer: the author is part of the project…)
    [3] http://www.peerson.net

  • eni

    The killer argument for p2p social networks is that you own your social graph not facebook or myspace. You don’t need to lock yourself in with one or two centralized social networks. With your own server space which can be integrated in your own device there is no dependence on central or commercial entities anymore and you could theoretically be part of hundreds of social networks.

It is about time to start thinking about how to get real projects into the wild. Until now I haven’t heard of many. The Safebook project, as far as I know, currently consists of just a couple of academic publications, not much more. The Opera browser has recently moved to offer server capabilities to clients, and that can surely be used for social networking.

I actually can’t think of a better idea than using smartphones (for keeping friend lists) as well as protocols such as RSS, rssCloud/PubSub or XMPP for developing a true decentralized social network with most of the features that centralized social networks offer. But something has to be delivered soon, before the Facebook lock-in deprives most Internet users of the choice to switch to decentralized alternatives.

  • http://bagofspoons.net Steve

    I’ve been thinking about decentralised social nets for a while, but lack the web skills to implement it. I’ve been hoping that FOAF or something similar would take off as a basis for it. There’s something to play with at

    http://www.luke.maurits.id.au/blog/entry/generating_activity_stream_feeds_using_foaflib_and_feedformatter

    This could easily be extended to find friends’ FOAF files and generate a feed from their updates. That could give me what Friendfeed does, except that few of my friends even know what FOAF is. There are a few tools to generate files, so maybe I should encourage them to try it.

  • IDWMaster

I am developing a P2P social networking service at my website on CodePlex. It uses P2P and is entirely decentralized. You need Vista or Windows 7 to use it, although it will work on XP Service Pack 3 with a few modifications to the operating system.

  • Joey Guerra

    It seems that everything I read about the real time web is skirting around the obvious.

    The web IS the network; it’s already decentralized.

    It becomes a social network when people start adding context to the data and conversing about it.

As such, there is already a defined protocol that can be harnessed to implement real-time messaging: it’s called HTTP. And businesses have been using it to send each other real-time messages for a while now (i.e., web services). I’m not gonna even try to put a timeframe on it. An example of something that uses the HTTP protocol to send real-time messages is WordPress’ Trackback functionality (I realize there’s no authentication). As a matter of fact, a web page is essentially a web service in its simplest form.

    I guess now people are just trying to define what field names to use when sending data back and forth.

I just think people are making it more complicated than it needs to be. REST already defines a lightweight standard for things like authentication, built on the HTTP protocol. And how many ways can you call someone’s “name”? First name, Last name, Middle name?

So peer-to-peer REST calls would do the trick. Until, of course, you needed to scale to the one-to-many (thousands) scenario, and at that point it really is a different problem.