Is intimate personal information a toxic asset in client-cloud datacenters?

Guest blogger Carl Hewitt, Emeritus at MIT in the Electrical Engineering and Computer Science department, is known for his research on Direct Logic™, privacy-friendly client cloud computing, norms and commitments for organizational computing, and concurrent programming languages, models, and theories.

Client-cloud aggregators (Google, Yahoo, Microsoft, Facebook, etc.) tend to believe that personal information is a valuable asset for several reasons. It is valuable to advertisers because it enables greater relevance for their ads. It is valuable to users because it can be used to enrich their lives. And it is valuable to aggregators because they can use personal information to make more money by selling (anonymous?) versions and by using it to bring together advertisers and customers. Recency and intimacy can add value to information. Current and recent information tends to be more relevant than older information. Intimate psychological, physiological, sociological, geographical, medical, etc. information can be used to personalize interactions.

Intimate current personal information is also valuable for government security because it can be critical to taking security countermeasures. Already in the UK, the previous two years of everyone’s email, web browsing, and telephone calls are becoming available to government officials at varying levels of detail. For example, detectives will be required to consider accessing telephone and internet records during every investigation under new plans to increase police use of communications data.

But that’s only the beginning. As Jim Gray noted in “Distributed Computing Economics” (MSR-TR-2003-24), there is a growing imbalance between the computational power of the billions of cores in aggregator datacenters and the relatively feeble fiber-optic communications coming out of them. This problem has now become so severe that Amazon has been forced to introduce a commercial service that lets users of its cloud import and export data through the post: put it on storage devices and ship it by land, sea, or air. Soon even this stopgap will become impractical for government security agencies, because whole shipping containers would have to be transferred, the functional equivalent of shipping large pieces of an aggregator datacenter. Consequently, to be effective, future government security software will have to be tightly integrated with aggregator datacenters. The most effective security measures will require aggregator datacenters to be heavily regulated, analogous to nuclear power plants.

Semantic Integration, an emerging technological capability to bring together all kinds of information in a semantic engine, will greatly intensify all of the above issues (see “A historical perspective on developing foundations for privacy-friendly client cloud computing: The Paradigm Shift from ‘Inconsistency Denial’ to ‘Practical Semantic Integration™’”, arXiv:0901.4934). The following kinds of information can be semantically integrated: calendars and to-do lists, email, SMS and Twitter archives, presence information (including physical, psychological and social), maps (including firms, points of interest, traffic, parking, and weather), events (including alerts and status), documents (including presentations, spreadsheets, proposals, job applications, health records, photos, videos, gift lists, memos, purchasing, contracts, articles), contacts (including social graphs and reputation) and search results (including rankings and ratings).

Two critical technologies are the foundation of Practical Semantic Integration: The first is Lightly Structured Natural Language™ interfaces that allow information to be easily found and organized. The second is many-core semantic engines (see “ActorScript™: Industrial strength integration of local and nonlocal concurrency for Client-cloud Computing”, arXiv:0907.3330) that rapidly process information in ways that are tolerant of inconsistency (see “Common sense for concurrency and inconsistency tolerance using Direct Logic™”, arXiv:0812.4852).

To be effective, government security Semantic Integration systems will need to be joined with those of aggregators. Thus Semantic Integration of personal information on aggregator datacenters will require additional government regulation of aggregators. Will government regulation prove toxic to the ability of aggregators to innovate?

This is a future that we expect most readers would find distasteful. There is an alternative: a client cloud is a local cloud controlled by a client. For example, a family cloud might consist of the family’s cell phones, computers, security cameras, home entertainment centers, Wi-Fi access points, etc. Semantic Integration could be performed in clients’ clouds so that, by default, clients store their information in cloud datacenters in a form that can be decrypted only with the client’s secret key.
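The store-encrypted default described above can be sketched in a few lines. Everything below is an illustrative assumption rather than anything specified in the post: the function names are invented, and the SHA-256 counter-mode stream cipher is a teaching device; a real client cloud would use an authenticated scheme such as AES-GCM.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand the client's secret key into a pseudorandom keystream
    (SHA-256 in counter mode)."""
    blocks = []
    counter = 0
    while sum(len(b) for b in blocks) < length:
        blocks.append(
            hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        )
        counter += 1
    return b"".join(blocks)[:length]


def encrypt_in_client_cloud(secret_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt locally before upload; the datacenter stores only ciphertext."""
    nonce = os.urandom(16)  # fresh per message, stored alongside the ciphertext
    stream = keystream(secret_key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, stream))


def decrypt_in_client_cloud(secret_key: bytes, blob: bytes) -> bytes:
    """Only a holder of secret_key can recover the plaintext."""
    nonce, ciphertext = blob[:16], blob[16:]
    stream = keystream(secret_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))
```

Under this arrangement an aggregator datacenter holding only such blobs can still provide storage and replication, while Semantic Integration over the plaintext happens inside the client cloud.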

Semantic Integration using clients’ clouds has some important advantages. Client responsiveness can be faster because no round trip to a datacenter is required. Aggregator capital, operating, and communication costs can be lower because Semantic Integration is performed in clients’ clouds instead of aggregator datacenters.

By performing Semantic Integration in clients’ clouds, aggregators can make far more money than they do now by doing an even better job of matching up customers with merchants in a way that is more pleasing to both. Aggregators can provide software that runs in clients’ clouds (although it may have to be audited by third parties). The aggregator’s software can volunteer high-level information to the aggregator’s datacenters about the kind of merchant information that might be relevant. Within clients’ clouds, the merchant information can then be tailored to the specific requirements of clients.

For the reasons above, an aggregator can do better by performing clients’ Semantic Integration in their clouds rather than relying entirely on the aggregator’s own cloud. And using clients’ clouds could lessen the degree of government regulation, because the government would have to subpoena clients to obtain their most intimate personal information. If the information in an aggregator’s datacenters is sufficiently anonymous, then it would not be necessary for government security agencies to regulate them so heavily.

The question is: “What are the aggregators going to do about intimate personal information?” If one of them initiates a project to develop a Semantic Integration product that operates in clients’ clouds, then the others will rapidly follow suit.


  • The actionability, accuracy, and governmental relevance of highly intimate personal data is intrinsically vulnerable to low-cost, low-effort, legal disinformation injections into the datastream. I take no credit for recognizing the utterly inevitable nature of this asymmetric response to increasing aggregation of personally sensitive data; Vernor Vinge, in “Rainbows End,” sets it forth as an intrinsic – and intrinsically unavoidable – part of online networked life via his Friends of Privacy concept.

    Rather than trying desperately – and futilely – to “control” intimate personal details, a smart online participant will realize that “salting” one’s online data with even a small portion of intentional disinformation makes the entire aggregate of data all but useless for use in highly-tuned targeting efforts. It’s far easier to put one drop of purple dye in a swimming pool than it is to take that one drop back out: the Friends of Privacy exists, as a nonprofit group, in Vinge’s novel to add those drops to the online data pool and thereby obviate most concerns about hyper-surveillance.

    Where I diverge from Vinge is in seeing this as a nonprofit activity; rather, I see this as a valuable service to be commercially available for those who pay a small fee. A stalker is combing online to track down his target of obsession: does one try to “erase” one’s online life, or hire Friends of Privacy to salt it with disinformation and leave the stalker running in circles chasing endless dead-ends? Obviously, the latter option is more cost-effective, practical, reliable, and legally viable. In that sense, the service is inevitable – some folks will be happy to salt their own data, but a commercial service will be able to do it well, consistently, quickly, and on a maintenance basis.

    Don’t hide through invisibility; become invisible by being ubiquitous. Personally, my home address is easily available online… mixed in with quite a few “home addresses” that aren’t really me. Which is which? The cloud doesn’t know, and if it figures it out I just salt info into the cloud claiming the others are “really” the correct one, etc.

    Fausty

  • Jeremy

    Right on, isn’t passing all your info to the cloud a bad idea? Great to see these ideas.

    This is separating control and processing. The benefit is that sensitive data never leaves the processing site – good for security and good for bandwidth bottlenecks – and also that complex control is passed off to someone else.

    All the benefits accrue to the customer that way: handing off management of the systems, but retaining all the benefits of in-house systems. It amounts to putting a few mirrors on the traditional setup and rearranging the components in terms of location and ownership to favor the user, and benefit network capacity.

  • Bill

    Google is putting itself at great risk of future heavy government regulation by security agencies because of the way it stores peoples’ email, contacts, calendars, and documents in Google data centers.

  • Carl,

    In short, you want to replace personal desktops in the homes with personal clouds that are backed up in the public cloud in the vendor’s datacenter. If my understanding is correct, doesn’t this put some of the responsibility on users, much as the desktop does now (security, malware, backup, etc.)? Isn’t one of the biggest motivating factors for consumers to move to the cloud the abstraction of these very same tasks by shifting the responsibility to vendors? Nevertheless, I like this idea as it fits in my love for an open federated model of the cloud.

  • bowerbird

    the government will find a way to get the information it wants.


  • Chris

    @Fausty Is the solution really to poison your public information stream? You’re perfectly correct that it is easy to salt the info stream with incorrect data, but how does that help those who don’t know you? Why would you put personal information into a public space if you didn’t want strangers to know it? Surely the point is to work out how best to make use of the opportunities these new technologies offer us, not to frustrate them and make them useless?

    And if you’re worried about private information becoming public, why not tackle that problem? If you can’t trust the entity holding the information, don’t entrust it with your data. And if you’re worried about an organisation changing its trustworthiness, then you can’t trust it, can you?

  • Thank you all for your excellent comments.

    Our industry faces a threat of government regulation like it’s never seen before. Governments have legitimate security concerns. The challenge is to meet these concerns while protecting our civil liberties and our ability to innovate.

  • >> Our industry faces a threat of government regulation like it’s never seen before. Governments have legitimate security concerns. The challenge is to meet these concerns while protecting our civil liberties and our ability to innovate.

    Be careful wishing for a Rand-ian social Darwinism. The mess the world’s economy is in was caused, explicitly, by techies “innovating” by bending, evading, and breaking the markets; particularly the financial markets. In case you hadn’t noticed.

    Laissez-faire economics always ends badly for 99.44% of the population. Only those who delude themselves into believing they belong to the .56% buy that.

  • Want to see one such piece of software for real?

    Check the web site. This software stores your passwords for websites and other information such as banking details (!). But it is encrypted locally before uploading.

    I am in no way connected to the above website. Just a user who likes the concept.

  • So, I’d really like to see web apps pushing more data into the cloud. It would be so much better if I could take my data with me and allow web apps access as and when they need so as not to change the experience. With the spread of broadband etc I don’t see why this wouldn’t be possible.

    Your mention of ‘client clouds’ spurred me on to write about it; they’re an interesting concept as they allow for data portability while still having an ‘always on’ factor. The privacy and regulation issues you mention are the stumbling blocks. We need a system of identification and API keys to allow any of this to happen.

    Ps. Your trackbacks don’t work; my post here:


  • Gerry

    Ed – I think the concept of “client cloud” is an inevitable answer to the problems Carl outlined in his problem statement above. The growth of the aggregators is likely to be unstoppable, and the pervasiveness of government regulation/intervention is equally inevitable, unless there is some way to provide a new option for citizen/consumer privacy protection in the context of “the Cloud”. I’m not at all sure the aggregators will be the entities that successfully offer new solutions to the general public for “client cloud security”. I’m not certain who will come up with an answer that is effective and easy to use.

  • Peter

    To Krish and Gerry’s comments — I’m pretty sure what the author is thinking is letting the commercial cloud providers (Google, Amazon, etc.) do the “heavy lifting” while keeping the security keys in that sort of local cloud. A sort of win-win for both users and providers.

    If I’m not mistaken, Hewitt probably has more up his sleeve with respect to cloud computing? Twitter, for example, was apparently built based on principles traceable to Hewitt’s “Actor” programming work. Anyhow, it would be interesting for Carl Hewitt to add a few details. How are we going to mine this data? And how are we going to keep personal control?

  • Mike Gale

    My observation of machine-inferred information to date is that it’s mostly plain wrong. I see no need to poison the information pool. It is currently self-poisoning. I’ve worked with computer systems that infer things (in industrial systems) from partial and missing information. My experience is that the input data has got to be good or you’re wasting your time. What gets into the cloud, and the assumptions on which it is processed and stored, are currently a foundation of sand.

    I have no problem with that. I really don’t want some anonymous vendor contacting me, typically with an attitude that one success in 100 is really good. No thank you. Annoying 99 people to get one sale is bad.

    Instead I would really like another application of these ideas. I want information on vendors. Solid reliable information. Where appropriate judged by people who share my views and attitudes. Then I’ll evaluate these guys, when I want to, in my own way.

    Now that would be more like a precision technology than World War II style area bombing!!
