Is intimate personal information a toxic asset in client-cloud datacenters?

Guest blogger Carl Hewitt, Emeritus in the Electrical Engineering and Computer Science department at MIT, is known for his research on Direct Logic™, privacy-friendly client cloud computing, norms and commitments for organizational computing, and concurrent programming languages, models, and theories.

Client-cloud aggregators (Google, Yahoo, Microsoft, Facebook, etc.) tend to believe that personal information is a valuable asset for several reasons. It is valuable to advertisers because it makes their ads more relevant. It is valuable to users because it can be used to enrich their lives. And it is valuable to aggregators, who can use it to make more money both by selling (anonymized?) versions of it and by bringing together advertisers and customers. Recency and intimacy add to its value: current and recent information tends to be more relevant than older information, and intimate psychological, physiological, sociological, geographical, medical, and other information can be used to personalize interactions.

Intimate current personal information is also valuable for government security because it can be critical to taking security countermeasures. In the UK, the previous two years of everyone's email, web browsing, and telephone calls are already becoming available to government officials at varying levels of detail. For example, under new plans to increase police use of communications data, detectives will be required to consider accessing telephone and internet records during every investigation.

But that's only the beginning. As Jim Gray noted in “Distributed Computing Economics” (MSR-TR-2003-24), there is a growing imbalance between the computational power of the billions of cores in aggregator datacenters and the comparatively feeble fiber-optic bandwidth coming out of them. The problem has become so severe that Amazon has been forced to introduce a commercial service that lets users of its cloud import and export data through the post: users put the data on storage devices and ship them by land, sea, or air. Soon even this stopgap will become impractical for government security agencies, because whole shipping containers of storage would have to be transferred, the functional equivalent of shipping large pieces of an aggregator datacenter. Consequently, to be effective, future government security software will have to be tightly integrated with aggregator datacenters, and the most effective security measures will require those datacenters to be heavily regulated, much as nuclear power plants are.
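To make the bandwidth imbalance concrete, here is a back-of-the-envelope calculation comparing a network transfer with physically shipping storage. It is a minimal sketch: the dataset size (1 PB), link speed (10 Gbit/s, fully utilized), and two-day transit time are illustrative assumptions, not figures taken from Gray's report or from Amazon's service, and the function names are hypothetical.

```python
# Back-of-the-envelope comparison of network transfer vs. physically shipping storage.
# Every number used below is an illustrative assumption, not a measurement.

SECONDS_PER_DAY = 86_400


def network_transfer_days(data_bytes: float, link_bits_per_sec: float,
                          utilization: float = 1.0) -> float:
    """Days needed to send data_bytes over a link running at the given fraction of line rate."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_sec * utilization)
    return seconds / SECONDS_PER_DAY


def shipping_bits_per_sec(data_bytes: float, transit_days: float) -> float:
    """Effective bandwidth of storage devices that arrive after transit_days."""
    return data_bytes * 8 / (transit_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    petabyte = 1e15   # assumed dataset: 1 PB (decimal units)
    link = 10e9       # assumed link: 10 Gbit/s
    print(f"network: {network_transfer_days(petabyte, link):.1f} days")
    print(f"shipping (2 days): {shipping_bits_per_sec(petabyte, 2) / 1e9:.1f} Gbit/s effective")
```

With these assumed figures, pushing 1 PB through the link takes about 9.3 days, while a shipment that arrives in two days delivers an effective bandwidth of about 46.3 Gbit/s.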

Semantic Integration, an emerging technological capability to bring together all kinds of information in a semantic engine, will greatly intensify all of the above issues (see “A historical perspective on developing foundations for privacy-friendly client cloud computing: The Paradigm Shift from ‘Inconsistency Denial’ to ‘Practical Semantic Integration™’,” arXiv 0901.4934). The following kinds of information can be semantically integrated: calendars and to-do lists, email, SMS and Twitter archives, presence information (including physical, psychological, and social), maps (including firms, points of interest, traffic, parking, and weather), events (including alerts and status), documents (including presentations, spreadsheets, proposals, job applications, health records, photos, videos, gift lists, memos, purchasing, contracts, and articles), contacts (including social graphs and reputation), and search results (including rankings and ratings).

Two critical technologies are the foundation of Practical Semantic Integration. The first is Lightly Structured Natural Language™ interfaces, which allow information to be easily found and organized. The second is many-core semantic engines (see “ActorScript™: Industrial strength integration of local and nonlocal concurrency for Client-cloud Computing,” arXiv 0907.3330) that rapidly process information in ways that are tolerant of inconsistency (see “Common sense for concurrency and inconsistency tolerance using Direct Logic™,” arXiv 0812.4852).

To be effective, government security Semantic Integration systems will need to be joined with those of aggregators. Thus Semantic Integration of personal information on aggregator datacenters will require additional government regulation of aggregators. Will government regulation prove toxic to the ability of aggregators to innovate?

This is a future that we expect most readers would find distasteful. There is an alternative. A client cloud is a local cloud controlled by a client; for example, a family cloud might consist of the family's cell phones, computers, security cameras, home entertainment centers, Wi-Fi access points, etc. Semantic Integration could be performed in clients' clouds, so that by default clients store their information in cloud datacenters in a form that can be decrypted only with the client's secret key.
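The storage arrangement described above, in which data is encrypted inside the client's cloud before it ever reaches a datacenter, can be sketched as follows. This is a toy illustration built only from Python's standard library (an HMAC-SHA256 keystream in counter mode plus an HMAC tag, i.e. encrypt-then-MAC); the names `seal` and `open_sealed` are hypothetical, and a real deployment should use a vetted authenticated cipher such as AES-GCM from an audited library rather than this construction.

```python
import hashlib
import hmac
import secrets

# TOY ILLUSTRATION ONLY: not a vetted cipher. Use a real AEAD (e.g. AES-GCM) in practice.
NONCE_LEN = 16
TAG_LEN = 32  # SHA-256 digest size


def _subkey(client_key: bytes, label: bytes) -> bytes:
    # Derive separate keys for encryption and authentication from the client's secret.
    return hmac.new(client_key, label, hashlib.sha256).digest()


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 over (nonce || counter) produces pseudorandom blocks.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(client_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt on the client; the datacenter only ever stores the returned blob."""
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh random nonce for every message
    stream = _keystream(_subkey(client_key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(client_key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(client_key: bytes, blob: bytes) -> bytes:
    """Decrypt on the client; raises ValueError if the blob was altered or the key is wrong."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(client_key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(_subkey(client_key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # generated and kept on the client's own devices
    blob = seal(key, b"calendar: clinic appointment Tuesday 9am")
    print(open_sealed(key, blob))
```

The design point is that the key never leaves the client's cloud: the datacenter holds only sealed blobs, so it can neither read them nor alter them without detection.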

Semantic Integration using clients' clouds has some important advantages. Clients can get faster responses because they need not communicate with datacenters. Aggregators' capital, operating, and communication costs can be lower because Semantic Integration is performed in clients' clouds instead of in aggregator datacenters.

By performing Semantic Integration in clients' clouds, aggregators could make much more money than they do now by doing an even better job of matching customers with merchants in a way that is more pleasing to both. Aggregators can provide software that runs in clients' clouds (although it may have to be audited by third parties). This software can volunteer high-level information to the aggregator's datacenters about the kinds of merchant information that might be relevant. Within clients' clouds, the merchant information can then be tailored to each client's specific requirements.
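The division of labor described above can be sketched in a few lines. This is a hypothetical illustration, not any aggregator's actual software: `LocalProfile`, `CATEGORY_OF`, `SENSITIVE`, `volunteer_categories`, `Offer`, and `tailor_locally` are invented names, and the aggregator's datacenter is simulated by a plain list of offers. Only coarse, non-sensitive categories leave the client's cloud; the fine-grained interests and constraints are used locally to filter and rank what comes back.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from fine-grained (intimate) interests to coarse categories.
CATEGORY_OF = {
    "running shoes": "sporting goods",
    "trail maps": "outdoors",
    "insulin supplies": "health",
}
# Categories this client has chosen never to volunteer, even in coarse form.
SENSITIVE = {"health"}


@dataclass
class LocalProfile:
    """Private data that stays inside the client's cloud."""
    interests: dict[str, float]                                # item -> strength of interest
    constraints: dict[str, str] = field(default_factory=dict)  # e.g. {"size": "10"}


@dataclass
class Offer:
    merchant: str
    category: str
    item: str
    attributes: dict[str, str] = field(default_factory=dict)


def volunteer_categories(profile: LocalProfile, threshold: float = 0.5) -> set[str]:
    """The only information sent to the aggregator: coarse, non-sensitive categories."""
    return {
        CATEGORY_OF[item]
        for item, weight in profile.interests.items()
        if weight >= threshold
        and item in CATEGORY_OF
        and CATEGORY_OF[item] not in SENSITIVE
    }


def tailor_locally(profile: LocalProfile, offers: list[Offer]) -> list[Offer]:
    """Filter and rank merchant offers inside the client's cloud using private data."""

    def compatible(offer: Offer) -> bool:
        # Drop an offer only if it specifies an attribute that contradicts
        # one of the client's private constraints.
        return all(offer.attributes.get(k, v) == v for k, v in profile.constraints.items())

    kept = [o for o in offers if compatible(o)]
    # Rank by the client's private interest weights; unknown items sort last.
    return sorted(kept, key=lambda o: profile.interests.get(o.item, 0.0), reverse=True)


if __name__ == "__main__":
    profile = LocalProfile(
        interests={"running shoes": 0.9, "trail maps": 0.6, "insulin supplies": 1.0},
        constraints={"size": "10"},
    )
    print(sorted(volunteer_categories(profile)))  # ['outdoors', 'sporting goods']

    # Simulated response from the aggregator's datacenter for those categories.
    offers = [
        Offer("merchant-a", "sporting goods", "running shoes", {"size": "9"}),
        Offer("merchant-b", "sporting goods", "running shoes", {"size": "10"}),
        Offer("merchant-c", "outdoors", "trail maps"),
    ]
    print([o.merchant for o in tailor_locally(profile, offers)])  # ['merchant-b', 'merchant-c']
```

Note that the strongest interest in this example ("insulin supplies") is never volunteered at all, because its category is marked sensitive on the client side.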

For the reasons above, an aggregator can do better by performing clients' Semantic Integration in the clients' own clouds rather than relying entirely on the aggregator's cloud. Using clients' clouds could also lessen government regulation, because the government would have to subpoena clients to obtain their most intimate personal information. If the information in an aggregator's datacenters is sufficiently anonymous, government security agencies would not need to regulate those datacenters so heavily.

The question is: “What are the aggregators going to do about intimate personal information?” If one of them initiates a project to develop a Semantic Integration product that operates in clients’ clouds, then the others will rapidly follow suit.

Is intimate personal information a toxic asset in client-cloud datacenters?
