Chris Hoofnagle on Privacy and Search Engines

If any of the readers who have complained about Spock are in the Bay Area, you may want to check out Chris Hoofnagle’s talk tomorrow (Monday October 15, 10am-noon, 155 Kroeber Hall) on the subject of privacy and search engines in Marti Hearst’s search class at the Berkeley I-School. Marti sent out the following description:

Many people complain that personal information about themselves
appears in Google search results. However, Google is not the
internet. It and other search engines simply scan the internet for
documents that are responsive to individuals’ search. These
intermediaries are digging deeper into the web, finding more of these
documents, and in so doing, making it easier to access and aggregate
personal information. This discussion will focus on defining
information privacy and describing the traditional legal framework for
addressing privacy problems. We will also discuss value-conscious
design, the idea that information services can be tailored to promote
privacy; the competing incentives of search providers, which drive
companies to log and store information; and the vulnerabilities that
search creates for individuals.

Brief Bio:

Chris Jay Hoofnagle is senior staff attorney to the Samuelson Law,
Technology & Public Policy Clinic and senior fellow with the Berkeley
Center for Law & Technology. His focus is consumer privacy law.

From 2000 to 2006, he was senior counsel to the Electronic Privacy
Information Center (EPIC) and director of the organization’s West
Coast office. At EPIC, he concentrated on financial services privacy,
telemarketing regulation and consumer profiling. He was also a
non-residential fellow with Stanford University’s Center for Internet
and Society for the 2005 academic year.

Hoofnagle is a nationally recognized expert in information privacy
law. He has testified before the U.S. Congress and the California
Senate and Assembly numerous times on social security number privacy
and credit transactions. The text of his written testimony is online.

I don’t know what Chris might have to say about Spock, but it seems to me that it, like Google, is just the tip of the iceberg. As we see more and more focus on online social networking and people search, you’re not going to be able to rely on security by obscurity any more. If something is on the net, it will be found. This creates new responsibilities for companies to provide recourse for individuals who feel that their privacy has been compromised or that inaccurate information is being returned, but it also creates new responsibilities for individuals, to be aware of what’s known about them online, and whether or not it’s correct. (It’s a lot like staying aware of your credit score, and I’m hopeful that search engines can do a better job than credit scoring agencies do of making sure that the user has access to this information without having to pay for it, and the ability to correct it.)

Services for managing this kind of information will definitely need to be part of the coming social network operating system (or, more precisely, of the identity subsystem of the coming internet operating system).

  • Telling people what can be inferred from what they’ve posted on the web is a service.

    Soliciting amplifications, corrections, and disambiguations is an attempt to promote a new normative behavior on the net (mainly by imposing penalties for non-compliance).

    Web 2.0 is evil.


  • Charlie Stross has an interesting recruiting agency scenario as the prologue to his new SF novel Halting State that extrapolates how this kind of data mining and inference could be put to use.

    Warning: I almost ruined a keyboard.

  • It is less about privacy and more about reputation control.

    The bittersweet conflict about online privacy is this: it is only a problem when YOU are on the receiving end and the info is something you do not want known.

    How often do these same people search for info about others or use the Web to promote themselves? Many use search to locate bios of other professionals or satisfy curiosities about neighborhoods.

    Professionals voluntarily put their resumes on job sites and property owners place ads online to attract prospects.

    Even more interesting is the number of people who have been reunited with lost family members and friends after searching via the major search engines or joining niche social sites.

  • Of course, the privacy implications are rather nasty for the public social network graph you talked about in a previous post as well.

    The drive behind these Web 2.0 apps to quickly harness what end users know and expose it to computers moves rather significant amounts of previously private knowledge into the public. Is that always a good thing? Did anyone consider the negative effects of this?

    For search engines, I wouldn’t initially consider disambiguating people to be harmful, but I’d be even more interested in what privacy advocates and experts think. Did anyone ask any of them to comment on Spock?

  • I’ve attended several workshops such as the Data Sharing Summit to get a better perspective on how others in Web 2.0 are thinking about this very problem.

    When we crawl the web, we often find personally identifiable information about people (their email, phone number, home address, and in some cases even their social security number). We decided from the beginning never to display this information on Spock, no matter how public it may be on the web.

    Users who sign up for Spock have the option to subscribe to alerts whenever Spock finds new information about them on the web, whether on public sites, social networks, or even in public records. Our philosophy is that the data really belongs to the user, and we want users to have the ability to remove information or correct it before we add it to the index on Spock. We are also trying to better educate users on how to manage their profiles on social networks, blogs, and other content sites.

    We get a lot of emails from people who use Spock saying how surprised they were about all the information online about them. While they can easily correct and control what information is displayed on Spock, many users are dismayed that they then have to go to Google, MSN, Yahoo, Ask, and a whole host of other search engines in order to remove that very same information. Many times, people cannot remove information from search engines until the source document itself is removed from the web.

    Tim is right. Users have to be aware and diligent in managing their identity on the web. Hopefully, a service like Spock can help people know every place they appear on the web, and give them the tools or direction they need to control that content (for free).

    This is a very new space and we had our bumps in the road. But in the end, our goal is not only to be the best people search engine, but also the most responsible. I am confident we will get there given the time and energy we spend at Spock working on this challenge.

    Jay Bhatti