Reputation: where the personal and the participatory meet up (installment 2 of 4)

(Please read installment 1 before this installment. Several of the
comments on the first installment are directly relevant to upcoming
material in later installments.)

Accessibility: the problems

Information equity is certainly a major problem today. (One audience
member responded to Hoffman by suggesting, “Nobody should know more
about you than you know about them.”) To digress for a moment, this is
one of the outrageous aspects of a recent court ruling that email
users have no reasonable expectation of privacy. This apparently
overrules an opinion issued four months earlier by the court.

In addition to the damage done to civil rights by this ruling, it is
supremely cynical because it doesn’t apply to you or me. According to
the court, you have no right to hide your email from me. But I can’t
act on that. It’s a doctrine that, conveniently, only governments (and
ISPs) can benefit from.

At the conference, by and large, everybody agreed that your data
should be available to you and that the heuristics used to generate
reputation should be open. But participants pointed out that search
engines are the only really robust reputation systems available, and
proposed that they work only because they keep their heuristics
secret.

Can we ever design a transparent system that resists fraud and gaming?
Ashish Goel, who does Operations Research at Stanford University, says
no: “It’s an intractable problem to detect collusion that inflates
reputation.” Yet he still supports transparent reputation systems.

Darko Kirovski of Microsoft Research further pointed out that a
reputation system can’t predict fraud because fraud is a sudden shift
in behavior: fraudsters behave honorably up to the moment when they
strike their victim.

Vipul Ved Prakash described
Vipul’s Razor,
a distributed spam-blocking system that has proven to be popular,
effective, and resistant to attack. It works because everybody online
can identify unsolicited bulk email, and because they mostly agree on
what’s spam and what’s not. People simply mark mail as spam when they
receive it, and when a critical mass builds up identifying a
particular email as spam, other participating systems delete it.
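The mechanics are easy to picture. Here is a rough Python sketch of
the general idea (not Razor’s actual protocol; the signature scheme
and the threshold are assumptions made for the sake of illustration):

    from collections import defaultdict
    from hashlib import sha256

    # Illustration only: Razor's real protocol, signatures, and thresholds differ.
    SPAM_THRESHOLD = 25          # assumed "critical mass" of reports

    reports = defaultdict(set)   # message signature -> set of reporters

    def signature(message_body: str) -> str:
        """Collapse identical bulk mailings to a single key."""
        return sha256(message_body.encode("utf-8")).hexdigest()

    def report_spam(reporter_id: str, message_body: str) -> None:
        """A participating user marks a message as spam."""
        reports[signature(message_body)].add(reporter_id)

    def is_spam(message_body: str) -> bool:
        """Once enough distinct users have reported it, other systems can delete it."""
        return len(reports[signature(message_body)]) >= SPAM_THRESHOLD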

Prakash created a reputation community using a classic technique of
seeding it with trusted people. It’s very hard to bootstrap a stable
and trustworthy reputation online without such seeding.

New people who consistently rate email like the trusted community get
added to that community. On the other hand, anyone who rates an email
message differently from the trusted community gets downgraded
severely. Over time, an extremely reliable set of trusted people who
act very quickly to flag spam builds up. Spammers who try to break
into the trusted group have a high barrier to entry (it requires many
accurate ratings) and are dumped quickly when they stop rating spam
correctly.
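A loose sketch of that seeding-and-adjustment logic might look like
this, with agreement rewarded slowly and disagreement punished steeply
(the constants and formulas are illustrative assumptions, not Razor’s
actual trust metrics):

    # Illustration only: constants and formulas are assumptions, not Razor's metrics.
    TRUSTED_CUTOFF = 0.9   # standing needed to count toward the trusted consensus
    GAIN = 0.02            # slow reward for agreeing with the trusted community
    PENALTY = 0.25         # steep penalty for disagreeing with it

    trust = {}             # rater_id -> standing between 0 and 1

    def update_trust(rater_id: str, rater_says_spam: bool, consensus_says_spam: bool) -> None:
        standing = trust.get(rater_id, 0.1)        # newcomers start with little weight
        if rater_says_spam == consensus_says_spam:
            standing = min(1.0, standing + GAIN)   # many accurate ratings needed to climb
        else:
            standing *= PENALTY                    # dumped quickly after bad ratings
        trust[rater_id] = standing

    def is_trusted(rater_id: str) -> bool:
        return trust.get(rater_id, 0.0) >= TRUSTED_CUTOFF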

In general, panelists argued that computational systems are unlikely
to create better ratings than human beings, and human beings are
notoriously inconsistent in their ratings. But as Goel says,
computational systems can aggregate human ratings to facilitate their
perusal and application.
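One common form such aggregation takes (offered purely as a generic
illustration, not anything the panelists specified) is a Bayesian
average, which keeps an item with two glowing ratings from outranking
one with fifty solid ones:

    # Generic Bayesian average of human ratings (illustrative; no panelist prescribed it).
    def bayesian_average(ratings, prior_mean=3.0, prior_weight=10):
        """Pull items with few ratings toward the prior so lists can be ranked fairly."""
        return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

    print(bayesian_average([5.0, 5.0]))    # ~3.33: two raves aren't decisive
    print(bayesian_average([4.0] * 50))    # ~3.83: fifty solid ratings count for more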

Changeability: the problems

It’s obvious that people, hotels, web sites, etc. change over time and
need to be re-examined. And those viewing the information also change,
so the value of information degrades over time even if a rating is
still correct. But even Hoffman’s Rapleaf doesn’t let you change a
comment after you post it. (You can, however, add new comments to
adjust your rating.)

Changing information can be hard. For example, public-key certificate
systems include revocation protocols, but they’re rarely used. Like
any distributed information, certificates are resistant to attack by
antibodies once they enter the Internet’s bloodstream.

There is also a social dimension to changing information. Who says
what’s right and wrong? Just because a professor doesn’t like your
assessment on RateMyProfessors.com doesn’t mean she has a right to
remove it. Jonathan Zittrain (of Oxford University and Harvard Law
School’s Berkman Center) pointed out that the Berkman Center’s
StopBadware site
(used by Google to warn people away from sites infected by spyware) is
a reputation engine of a sort. It’s obviously one that many people
would like to eliminate–not only sites being accused of infection,
but the spammers and others who broke into those sites in the first
place.

Debates that sprang up in the 1980s (or even earlier) about
privacy–privacy versus free speech, opt-in versus opt-out–have
returned as overgrown brambles when reputation becomes an issue.

Nobody at the symposium offered a great solution to the balance
between privacy and free speech, which has to be rejudged repeatedly
in different contexts. Rebecca Tushnet of Georgetown University Law
Center pointed out the disparity between provisions for copyright
holders and provisions for others who claim unfair behavior on the
part of online sites. The safe-harbor provision of the DMCA requires
ISPs to take down content immediately when someone claims copyright
over it (and the person who put up the content rarely succeeds in
getting it restored). But a well-known provision upheld as part of the
Communications Decency Act (USC Title 47, Section 230) exempts ISPs
from being liable for content posted by users.

So you’re much better off claiming copyright on something than trying
to get an ISP to take down a defamatory or threatening post. Tushnet
would modify both laws to move them somewhere in between these
extremes.

On the other hand, we don’t always have to assume opposing and
irreconcilable interests. Zittrain suggested that a lot of Internet
users would respect a request to refrain from propagating material. He
envisions a protocol by which someone says, “I am posting a picture of
myself in drunken abandon to amuse my friends on Facebook, but please
don’t publish it in a news article.” More generally, a lot of people
enamored of the mash-up culture grab anything amusing or intriguing to
incorporate into their work, but would be willing to leave something
alone if they could tell the originator wanted them to.

Zittrain pointed to robots files and the Creative Commons as examples
of voluntary respect for the rights of authors. He also said that the
private ownership of social networking and blogging sites–and the
consequent ability to enforce terms of service–can be used for good
or ill, and that in this case some protocols for marking content and
policies for enforcing them could be beneficial to user privacy.
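No such protocol exists today, but as a thought experiment it could be
as lightweight as a machine-readable marker that republishing tools
check before reusing a page, much as well-behaved crawlers check
robots files. The tag name and its values below are invented for the
example:

    # Thought experiment only: the "reuse-policy" tag and its values are invented here.
    from html.parser import HTMLParser

    class ReusePolicyParser(HTMLParser):
        """Look for a hypothetical <meta name="reuse-policy" content="..."> tag."""
        def __init__(self):
            super().__init__()
            self.policy = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name") == "reuse-policy":
                self.policy = attrs.get("content")

    def may_republish(page_html: str) -> bool:
        """Honor a request such as 'friends-only' before reusing someone's content."""
        parser = ReusePolicyParser()
        parser.feed(page_html)
        return parser.policy not in ("friends-only", "no-republish")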

Hoffman pointed out that privacy advocates lobby for opt-in systems,
because few users care enough about privacy to opt out of data
collection. (“If consumers are responsible for protecting their
privacy, there is no privacy.”)

The latter point was underlined by a fascinating research study
presented by Alessandro Acquisti of Carnegie Mellon (Information
Technology and Public Policy). When survey takers were presented with
a detailed privacy policy assuring their confidentiality, they were
far less likely to volunteer sensitive personal information
than when they were given the survey with weak confidentiality
guarantees or no guarantees at all. In other words, people didn’t
think about the safety of providing personal information until
Acquisti’s researchers forced them to confront it.

Several panelists, including Mozelle Thompson, a former commissioner
on the Federal Trade Commission and an advisor to Facebook, confirmed
that consumers need to be protected by privacy laws, just as they need
seat-belt laws. When Thompson was on the FTC, it asked Congress to
pass comprehensive privacy legislation, but of course they
didn’t. Even the European countries, known for their strong privacy
directives and laws, “put themselves in a box” according to Thompson,
because they focused on individuals’ self-determination.

So an opt-in world is necessary to protect privacy, but Hoffman
pointed out that opt-out is required to develop most useful databases
of personal information. If search engines depended on opt-in, we
wouldn’t be able to search for much of value.

Nevertheless, our current opt-out regime is leading to such heights of
data collection–and eventual abuse–that Hoffman believes a reaction
is imminent. Either government regulation or a strong consumer
movement will challenge opt-out, and we need to offer a
well-thought-out combination of regulation and corporate good behavior
in order to avoid a flip to a poorer opt-in world.
