RSS never blocks you or goes down: why social networks need to be decentralized

Recurring outages on major networking sites such as Twitter and LinkedIn, along with incidents where Twitter members were mysteriously dropped for days at a time, have led many people to challenge the centralized control exerted by companies running social networks. Whether you’re a street demonstrator or a business analyst, you may well have come to depend on Twitter. We might have been willing to build our virtual houses on shaky foundations when they were temporary beach huts; but now we need to examine the ground on which many are proposing to build our virtual shopping malls and even our virtual federal offices.

Instead of the constant churning among the commercial sites du jour (Friendster, MySpace, Facebook, Twitter), the next generation of social networking increasingly appears to require a decentralized, peer-to-peer infrastructure. This article looks at available efforts in that space and suggests some principles to guide its development.

Update: a few days ago, OpenID expert Chris Messina and microblog developer Jyri Engeström published an article with conclusions similar to mine; clearly this is a felt need that’s spreading across the Net. Interestingly, they approach the questions from a list of what information needs to be shared and how it needs to be transmitted; I come from the angle of what people want from each other and how their needs can be met. The two approaches converge, though. See the comments for other interesting related blogs.

The peer-to-peer concept

The Internet was originally a parliament convened among peers. Every
host was a server, almost always providing file downloads and usually
email as well. To this day, ISPs “peer” when they accept data from one
ISP’s customer and deliver it to the other ISP’s customer.

To peer doesn’t mean simply to be of equal status–in fact, that
notion could be misleading, because two systems with vastly different
roles and resources can peer. More importantly, to peer means to have
no intermediary.

When the architecture requires an intermediary, it should play as
unobtrusive and minimal a role as possible. For instance, Napster and
Skype have central servers, but they are used just to sign up
participants and set up connections among them.

Napster’s and Skype’s partial decentralization won them a key benefit
of peer-to-peer networking that Twitter could well take note of: they
offload most traffic from their central servers to the users and the
ISPs that connect them.
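To make that division of labor concrete, here is a minimal sketch of a Napster-style rendezvous service, with every name and address invented for illustration: the center only matches usernames to addresses, and the peers exchange content directly.

```python
# A toy rendezvous registry in the Napster/Skype mold: the central
# piece signs peers up and hands out addresses, nothing more.

registry: dict[str, str] = {}          # username -> "host:port"

def sign_up(username: str, address: str) -> None:
    registry[username] = address

def look_up(username: str) -> str | None:
    # The server's whole job: introduce one peer to another.
    return registry.get(username)

sign_up("alice", "203.0.113.5:9000")   # illustrative address
peer_address = look_up("alice")
# From here on, the caller connects to peer_address directly;
# the central server never carries the traffic itself.
```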

But being partially centralized means the service can still be
disrupted as a whole. Napster was shut down by a court ruling; Skype
shut itself down once through a programming error that it never
clearly explained to the public.

The Internet itself quickly developed into this hybrid model as well.
Modems and terminals created a new layer of second-class citizens,
vastly expanded by the PC revolution. These Internet users were tucked
away behind firewalls and blocked from using any services not approved
by system administrators.

By the year 2000, new companies springing up in the dot-com boom found
themselves frustrated by these restrictions, and designed their
innovative protocols to deliver data over port 80 because everybody
kept that open for Web traffic. When the practice started, traditional
Internet developers derided it as “port 80 pollution.” Now it’s called
Web Services.

As happens so often, the way forward proved to be the way backward–that is, to restore the democracy of the early Internet–and, also predictably, it was pioneered by outlier movements with dubious legality, ethics, and financial viability. Napster made the first impact on public consciousness, followed by services that rigorously avoided any hint of central servers (see my 2000 article, Gnutella and Freenet Represent True Technological Innovation).

By the end of 2000, the term peer-to-peer had become a
household word. But the movement quickly went into retreat, facing
difficult design problems that were already under discussion in the
O’Reilly book Peer to Peer, published in February 2001. I summarized the problems, which remain ongoing, in the articles From P2P to Web Services: Addressing and Coordination and From P2P to Web Services: Trust.

The issue of addressing would arise right away for a social network
developed in a pure peer-to-peer fashion. How would you check whether
your old college buddy was on the network, if you couldn’t query a
central server? And how could you choose a unique name, without a
single place to register? Names would have to be qualified by domain
names or some other identifiers–which is actually a step forward
right there. It seems to me ridiculous that a company would plan to
provide a service to the whole world using a flat namespace. And while
we’re at it, you ought to be able to change your name and bring along
all your prior activity.
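As a sketch of what qualified names buy you, consider handles shaped like email addresses; the parsing below illustrates the idea and is not an existing standard.

```python
# "alice@example.com" splits into a local part and a domain that
# scopes it, so two networks can each have their own "alice" without
# anyone policing a flat worldwide namespace.
def parse_handle(handle: str) -> tuple[str, str]:
    local, _, domain = handle.partition("@")
    if not local or not domain:
        raise ValueError(f"not a qualified handle: {handle!r}")
    return local, domain

# Each domain keeps its own local parts unique.
print(parse_handle("alice@example.com"))   # ('alice', 'example.com')
```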

Trust would also become an issue in decentralized social networks. You
could ban a correspondent from your personal list, but you couldn’t
inform a central authority about abuse. And the problem Twitter has
recently started to tackle–preventing random users from impersonating
well-known people–would be a challenge.

But decentralization brings many benefits. A failure at one person’s
site, or even on a whole segment of the network, would have no effect
on the rest of the world. A misconfigured router in Pakistan could not
keep everyone from accessing the most popular video content on the
Internet. And because each peer would have to obey common, understood
protocols, a decentralized social network would be transparent and
support the use of free software; nobody would have to puzzle over
what algorithms were in use.

Visiting many different sites instead of a central server to pull
together information on friends would increase network traffic, but
modern networks have enough bandwidth to stand up to the load. Even in
places with limited bandwidth, service would degrade gracefully
because messages would be small.
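A rough sketch of that pull model, assuming each friend publishes a feed at a known URL (the URLs and the timeout policy here are invented): fetch friend by friend, so an unreachable peer costs you only that one entry.

```python
# Pull updates peer by peer; a dead or slow site degrades only its
# own slot instead of taking down the whole view.
from urllib.request import urlopen

FRIEND_FEEDS = [                       # illustrative peer URLs
    "https://alice.example.com/feed",
    "https://bob.example.org/feed",
]

def fetch_updates() -> dict[str, bytes]:
    updates = {}
    for url in FRIEND_FEEDS:
        try:
            with urlopen(url, timeout=5) as resp:
                updates[url] = resp.read()
        except OSError:                # covers URLError and timeouts
            pass   # skip unreachable peers; everyone else still loads
    return updates
```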

The StatusNet project, which underlies identi.ca, represents a half-way step toward the kind of full decentralization illustrated by RSS. StatusNet can power a variety of microblogging services, each signing up any number of members. The services can interchange data to tie the members together.

The rest of this article looks at two possible models for a
distributed social network (RSS and XMPP), followed by an examination
of the recurring problems of peer-to-peer in the social networking
context.

Possible models

Many examples can be found of filesystems, version control systems,
and other projects that lack central servers. But I’m just going to look
at two protocols that other people are considering for decentralized
social networking.

When thinking of decentralized systems for sending short messages, RSS and Atom come to mind first. They’re universal and work well on a large scale. And Dave Winer, the inventor of RSS, has created an enhanced version called rssCloud, recently incorporated into WordPress.

Given the first question I asked about decentralization–how do you
find the people you’re looking for?–the RSS answer is “by
serendipity.” Like everything else on the Internet, you could come
across new treasures in many ways: surfing, searching, friends, or
media outlets. Lots of bloggers provide links from their sites to
their own faves. And RSS has developed its own ecosystem, sprouting
plenty of aggregators that offer you views into new fields of
information.

rssCloud is meant to carry more frequent traffic and more content than
the original RSS and Atom. It maintains an XML format (making it
relatively verbose for SMS, although Winer tries to separate out the
rich, enhanced data). Perhaps because of the increased traffic it
would cause, it’s less decentralized than RSS, storing updates in
Amazon S3.
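For reference, rssCloud builds on the cloud element that RSS 2.0 already defines; here is a sketch that generates one, with the endpoint and procedure name invented for illustration.

```python
# Build an RSS channel carrying a <cloud> element, which tells
# subscribers where to register for instant update notifications.
import xml.etree.ElementTree as ET

channel = ET.Element("channel")
ET.SubElement(channel, "title").text = "Example feed"
ET.SubElement(channel, "cloud", {
    "domain": "rpc.example.com",           # notification host (illustrative)
    "port": "80",
    "path": "/RPC2",
    "registerProcedure": "cloud.notify",   # illustrative procedure name
    "protocol": "xml-rpc",
})
print(ET.tostring(channel, encoding="unicode"))
```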

XMPP was invented about the same time as RSS by a programmer named
Jeremie Miller, who wanted a standard instant messaging protocol with
tags that could support semantics, and therefore powerful new
applications. Most important, his creation, Jabber, made it possible
for individual users to run their own servers instead of depending on
America Online or Yahoo!. Jabber had the potential to complement Tim
Berners-Lee’s idea of a Semantic Web.

Because Jabber used XML, it was seen as a bit heavyweight, and the
servers were reportedly hard to configure. But the possibilities were
too promising to pass up. So the IETF formalized it, gave it a clumsy
name suitable for a standard, and released a set of RFCs about it.
Unfortunately, XMPP languished until Google adopted it for their Talk
and Wave services. These high-profile applications suggest that it has
the scalability, flexibility, and robustness for social networking.
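To give a feel for the protocol, here is a sketch of the kind of stanza XMPP exchanges; a real client sends this over a persistent, authenticated stream rather than printing it, and the Jabber IDs are invented.

```python
# Construct a basic XMPP message stanza: addressing lives in the
# attributes, the payload in child elements such as <body>.
import xml.etree.ElementTree as ET

msg = ET.Element("message", {
    "from": "alice@example.com",   # illustrative Jabber IDs
    "to": "bob@example.org",
    "type": "chat",
})
ET.SubElement(msg, "body").text = "Decentralized hello!"
print(ET.tostring(msg, encoding="unicode"))
```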

The P2P problems, in today’s context

Even if decentralized protocols and clients were invented, there would
still be a long road to democratizing social networks. The messages are
expected to be lightweight, so photos and other large batches of
content would have to be stored somewhere outside the messages. Most
users wouldn’t trust their laptops (much less their mobile devices) to
store content and serve it up 24 hours a day, so they would need a
cloud service, which might or might not be distributed.
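A sketch of what such a lightweight message might look like, with the field names and storage host invented for illustration: bulky media stays at a URL, and the message itself carries only a reference.

```python
# The message stays small enough to relay peer to peer; the photo
# lives in whatever cloud or backup service the user chose.
import json

message = {
    "author": "alice@example.com",
    "text": "Photos from the demonstration",
    "attachments": [
        {"type": "image/jpeg",
         "url": "https://blobs.example.net/alice/demo-001.jpg"},
    ],
}
print(json.dumps(message, indent=2))
```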

A backup service is also necessary in order to recover from a local
disk failure or other error that wipes out several years of your
accumulated identity.

Problems such as impersonation and unsolicited communications (spam)
are hard to solve in decentralized systems because trust is always a
hierarchical quality. This is true everywhere in life, beyond the
level of a family or neighborhood. We expect our professors to be good
because they were hired by the college, and expect the college to be
good because it was accredited by a centralized facility, whose
managers were in turn appointed by elected officials. This system can
and does break down regularly, so mechanisms for repair are always
built in.

Nobody can be banned from a decentralized social network because
there’s nothing to ban them from. But there are ways to re-introduce
enough centralization to validate credentials. For instance, the
American Bar Association could register lawyers in good standing, and
you could check whether someone claiming to be a lawyer in the US was
registered. But we wouldn’t want to take this process too far and
create a web of accreditations, because that would devalue people
whose skills and viewpoints lie outside the mainstream.

You could still check whether someone shares friends with you, because
one person’s claims of friendship could be traced back to the sites he
claims to be friends with. Someone could game the system by setting up
fake sites claiming to be people you know and linking back to them,
but this is a huge amount of work and leaves the perpetrator open to
arrest for fraud. Free software developer Thomas Lord suggests that
identity could also be verified through “a fairly shallow and
decentralized hierarchy of authentication like the system of notary
publics in physical life.”
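Here is a minimal sketch of that kind of check, assuming each peer publishes a machine-readable friends list at a well-known URL; both the file name and the reciprocity rule are assumptions of the sketch, not an existing standard.

```python
# Verify a friendship claim by asking both sites, so a one-sided
# (possibly faked) claim doesn't count.
import json
from urllib.request import urlopen

def claims_friendship(site: str, friend: str) -> bool:
    """True if `site` lists `friend` in its published friends file."""
    with urlopen(f"{site}/friends.json", timeout=5) as resp:
        return friend in json.load(resp)

def mutual_friends(site_a: str, site_b: str) -> bool:
    return (claims_friendship(site_a, site_b)
            and claims_friendship(site_b, site_a))
```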

All in all, the problems of finding people and trusting people
suggest that there’s a role for aggregators, just as in the case of
RSS. And these aggregators could also offer the kind of tracking
services (who talked about me today?) and statistical services (is
Michael Jackson’s death still a major topic of conversation?) that get
business people so excited about Twitter. A decentralized social
network could still be business-friendly, because traffic could be
analyzed in order to target ads more accurately–but hopefully,
because peering clients are under the users’ control, people who
didn’t want the ads could configure their systems to screen them out.

When you set up an account, you could register with aggregators of
your choice. And whenever you connected to someone, you could
automatically register his account with a list of your favorite
aggregators, in case he hadn’t registered himself. If people wanted
control over where they’re aggregated, I suppose something equivalent
to a robots.txt file could be invented (a sketch follows below). But it’s not sporting
to refuse to be counted. And there’s no point in invoking privacy
concerns–face it, if the NSA wants to read your tweets, they’ll find
a way.
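If such a mechanism existed, it might look something like this; the file name aggregators.txt and its one-name-per-line format are hypothetical, modeled loosely on robots.txt.

```python
# Check a site's (hypothetical) opt-out file before aggregating it.
from urllib.request import urlopen

def may_aggregate(site: str, aggregator: str) -> bool:
    """Aggregate unless the site's opt-out file names this aggregator."""
    try:
        with urlopen(f"{site}/aggregators.txt", timeout=5) as resp:
            blocked = {line.strip() for line in
                       resp.read().decode().splitlines()}
    except OSError:
        return True   # no opt-out file published: aggregation assumed OK
    return aggregator not in blocked and "*" not in blocked
```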

So those are some of the areas where the problems of P2P and social
networking intersect. Let’s remember that current social networks are
far from solving problems of findability, trust, and persistence
themselves. I don’t check how many followers I have on Twitter; I figure
most of them are spam bots. (Apologies to any of my followers who
actually are sufficiently embodied to be reading this.)

Could
OpenSocial
be used to implement a P2P social network? It’s based on a single
object that is expected to query and update a single server. But the
interface could probably be implemented to run on a single user’s
system, registering the users or aggregators with whom she
communicates and querying all those users and aggregators as
necessary.

Industry analysts have been questioning for years whether Twitter is
financially viable. Well, maybe it isn’t–maybe this particular kind
of Internet platform is not destined to be a business. Responsibility
for the platform can be distributed among millions of sites and
developers, while business opportunities can be built on top of the
platform as services in analytics, publicity, and so forth.

Like Google, Twitter and the other leading commercial Internet sites
have made tremendous contributions to the functionality of the
Internet and have earned both their popularity and (where it exists)
their revenue. But the end-to-end principle and the reliability of
distributed processing must have their day again, whenever some use of
the Internet becomes too important to leave up to any single entity.
