"privacy" entries

Four short links: 12 May 2015

Data Center Numbers, Utility Computing, NSA Art, and RIP CAP

  1. We Used to Build Steel Mills Near Cheap Sources of Power, but Now That’s Where We Build Datacenters: Hennessy & Patterson estimate that of the $90M cost of an example datacenter (just the facilities – not the servers), 82% is associated with power and cooling. The servers in the datacenter are estimated to only cost $70M. It’s not fair to compare those numbers directly since servers need to get replaced more often than datacenters; once you take into account the cost over the entire lifetime of the datacenter, the amortized cost of power and cooling comes out to be 33% of the total cost, when servers have a three-year lifetime and infrastructure has a 10-15 year lifetime. Going back to the Barroso and Holzle book, processors are responsible for about a third of the compute-related power draw in a datacenter (including networking), which means that just powering processors and their associated cooling and power distribution is about 11% of the total cost of operating a datacenter. By comparison, the cost of all networking equipment is 8%, and the cost of the employees that run the datacenter is 2%. (A quick check of that arithmetic follows this list.)
  2. Microsoft Invests in 3 Undersea Cable Projects — utility computing is an odd concept, given how quickly hardware cycles refresh. In the past, you could ask whether investors wanted to be in a high-growth, high-risk technology business or a stable blue-chip utility.
  3. Secret Power — Simon Denny’s NSA-logo-and-Snowden-inspired art makes me wish I could get to Venice. See also The Guardian piece on him.
  4. Please Stop Calling Databases CP or AP (Martin Kleppmann) — The fact that we haven’t been able to classify even one datastore as unambiguously “AP” or “CP” should be telling us something: those are simply not the right labels to describe systems. I believe that we should stop putting datastores into the “AP” or “CP” buckets. So readable!
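
A quick check of the arithmetic in the first link, as a minimal sketch: the inputs are the shares quoted above (33% amortized power and cooling, processors at roughly one-third of compute-related power draw), not independently measured figures.

```python
# Back-of-the-envelope check of the datacenter cost shares quoted in the first link.
# Assumed inputs, taken from the quoted figures rather than measured here:
power_and_cooling_share = 0.33       # amortized share of total cost (3-yr servers, 10-15 yr facility)
processor_fraction_of_power = 1 / 3  # processors' share of compute-related power draw

# Powering processors, plus their share of cooling and power distribution:
processor_related_share = power_and_cooling_share * processor_fraction_of_power
print(f"Processors (power + cooling share): ~{processor_related_share:.0%} of total cost")  # ~11%

# The other shares quoted in the link, for comparison:
print("Networking equipment: 8%, datacenter staff: 2%")
```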

Signals from Strata + Hadoop World 2015 in London

Key insights from Strata + Hadoop World 2015 in London.

People from across the data world came together this week for Strata + Hadoop World 2015 in London. Below we’ve assembled notable keynotes, interviews, and insights from the event.

Shazam already knows the next big hit

“With relative accuracy, we can predict 33 days out what song will go to No. 1 on the Billboard charts in the U.S.,” says Cait O’Riordan, VP of product for music and platforms at Shazam. O’Riordan walks through the data points and trendlines — including the “shape of a pop song” — that give Shazam hints about hits.

Four short links: 15 April 2015

Facebook as Biometrics, Time Series Sequences, Programming Languages, and Oceanic Robots

  1. Facebook Biometrics Cache (Business Insider) — Facebook has been accused of violating the privacy of its users by collecting their facial data, according to a class-action lawsuit filed last week. This data-collection program led to its well-known automatic face-tagging service. But it also helped Facebook create “the largest privately held stash of biometric face-recognition data in the world,” the Courthouse News Service reports.
  2. The Clustering of Time Series Subsequences is Meaningless (PDF) — Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any data set, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. From 2003, warning against sliding window techniques. (A small illustration of the sliding-window construction follows this list.)
  3. Toolkits for the Mind (MIT TR) — Programming-language designer Guido van Rossum, who spent seven years at Google and now works at Dropbox, says that once a software company gets to be a certain size, the only way to stave off chaos is to use a language that requires more from the programmer up front. “It feels like it’s slowing you down because you have to say everything three times,” van Rossum says. Amen!
  4. Robots Roam Earth’s Imperiled Oceans (Wired) — It’s six feet long and shaped like an airliner, with two wings and a tail fin, and bears the message, “OCEANOGRAPHIC INSTRUMENT PLEASE DO NOT DISTURB.” All caps considered, though, it’s a more innocuous epigram than the one on a drone I saw back at the dock: “Not a weapon — Science Instrument.”
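
To make the sliding-window warning in the second link concrete, here is a small sketch (my own, not the paper's code): it extracts overlapping, mean-centered subsequences from a random walk and clusters them with a tiny stdlib k-means. Per the paper's argument, the cluster centers come out as smooth, wave-like shapes that reflect the overlapping-window construction rather than any structure in the data.

```python
import random

def sliding_windows(series, w):
    """All overlapping subsequences of length w: the construction the paper warns about."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def mean_center(seq):
    m = sum(seq) / len(seq)
    return [x - m for x in seq]

def kmeans(points, k, iters=10):
    """Minimal k-means over equal-length vectors, stdlib only."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        centers = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers

# A random walk: there are no repeated patterns here to discover.
random.seed(0)
walk = [0.0]
for _ in range(1200):
    walk.append(walk[-1] + random.gauss(0, 1))

windows = [mean_center(s) for s in sliding_windows(walk, 64)]
for center in kmeans(windows, k=4):
    # The printed centers are smooth, wave-like curves: an artifact of the
    # overlapping windows, not structure in the random walk.
    print(" ".join(f"{v:6.2f}" for v in center[::8]))
```
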
Four short links: 14 April 2015

Technical Debt, A/A Testing, NSA's Latest, and John von Neumann

  1. PyCon 2015: Technical Debt, The Monster in Your Closet (YouTube) — excellent talk from PyCon. See also slides.
  2. A/A Testing: In an A/A test, you run a test using the exact same options for both “variants” in your test. That’s right, there’s no difference between “A” and “B” in an A/A test. It sounds stupid, until you see the “results.” (via Nelson Minar) (A short simulation follows this list.)
  3. NSA Declares War on General-Purpose Computing (BoingBoing) — NSA director Michael S Rogers says his agency wants “front doors” to all cryptography used in the USA, so that no one can have secrets it can’t spy on — but what he really means is that he wants to be in charge of which software can run on any general purpose computer.
  4. John von Neumann Documentary (YouTube) — 1966 documentary from the Mathematical Association of America on the father of digital computing, who also is hailed as the father of game theory and much, much more. (via Paul Walker)
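
The A/A idea in the second link is easy to simulate. The sketch below is my own illustration, not code from the linked post: it repeatedly “tests” two identical variants drawn from the same conversion rate and counts how often a two-proportion z-test declares a “significant” difference. With a 5% threshold, roughly one run in twenty does, purely by chance.

```python
import math
import random

def two_proportion_p_value(conv_a, conv_b, n):
    """Two-sided two-proportion z-test (normal approximation), equal sample sizes."""
    p_a, p_b = conv_a / n, conv_b / n
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= z) for a standard normal

random.seed(42)
TRUE_RATE = 0.05          # both "variants" share the same underlying conversion rate
VISITORS_PER_ARM = 5_000
RUNS = 500

false_positives = 0
for _ in range(RUNS):
    conv_a = sum(random.random() < TRUE_RATE for _ in range(VISITORS_PER_ARM))
    conv_b = sum(random.random() < TRUE_RATE for _ in range(VISITORS_PER_ARM))
    if two_proportion_p_value(conv_a, conv_b, VISITORS_PER_ARM) < 0.05:
        false_positives += 1

print(f"'Significant' A/A results: {false_positives}/{RUNS} (about 5% expected by chance)")
```
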
Four short links: 25 March 2015

Selling Customers, Classier Parsing, License Plates, and GitHub's CSS

  1. RadioShack’s Customer Data For Sale (Ars Technica) — RadioShack is trying to sell customer data as part of its court-supervised bankruptcy.
  2. Classp: A Classier Way to Parse (Google Code) — The abstract syntax tree is what programmers typically want to work with. With class patterns, you only have two jobs: design the abstract syntax tree and write a formatter for it. (A formatter is the function that writes out the abstract syntax tree in the target language.) (A toy illustration of that split follows this list.)
  3. 4.6M License Plate Records From FOIA Request (Ars Technica) — from Oakland.
  4. Primer: the CSS toolkit and guidelines that power GitHub.
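
The Classp pitch in the second link splits the work into two jobs: design the abstract syntax tree, then write a formatter for it. Classp itself is a C++ tool, so the following is only a hypothetical Python analogue of that split: a toy expression AST and a formatter that writes it back out.

```python
from dataclasses import dataclass
from typing import Union

# Job 1: design the abstract syntax tree for a tiny expression language.
@dataclass
class Num:
    value: float

@dataclass
class BinOp:
    op: str            # "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

# Job 2: write a formatter that renders the AST in the target language.
def format_expr(node: Expr) -> str:
    if isinstance(node, Num):
        return str(node.value)
    return f"({format_expr(node.left)} {node.op} {format_expr(node.right)})"

tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(format_expr(tree))   # (1 + (2 * 3))
```
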
Four short links: 4 March 2015

Go Microservices, Watch Experience, Multithreading Bugs, and Spooks Ahoy

  1. Microservices in Go — tale of rewriting a Ruby monolith as Go microservices. Interesting, though being delivered at Gophercon India suggests the ending is probably not unhappy.
  2. Watch & Wear (John Cross Neumann) — Android watch as predictor of the value and experience of an Apple Watch. I believe this is the true sweet spot for meaningful wearable experiences. Information that matters to you in the moment, but requires no intervention. Wear actually does this extremely well through Google Now. Traffic, Time to Home, Reminders, Friend’s Birthdays, and Travel Information all work beautifully. […] After some real experience with Wear, I think what is more important is to consider what Apple Watch is missing: Google Services. Google Services are a big component of what can make wearing a tiny screen on your wrist meaningful and personal. I wouldn’t be surprised after the initial wave of apps through the app store if Google Now ends up being the killer app for Apple Watch.
  3. Solving 11 Likely Problems In Your Multithreaded Code (Joe Duffy) — a good breakdown of concurrency problems, including lower-level ones than high-level languages expose. But beware. If you try this [accessing variables without synchronisation] on a misaligned memory location, or a location that isn’t naturally sized, you can encounter a read or write tearing. Tearing occurs because reading or writing such locations actually involves multiple physical memory operations. Concurrent updates can happen in between these, potentially causing the resultant value to be some blend of the before and after values. (A minimal illustration of the related forgotten-synchronization problem follows this list.)
  4. Obama Sharply Criticizes China’s Plans for New Technology Rules (Reuters) — In an interview with Reuters, Obama said he was concerned about Beijing’s plans for a far-reaching counterterrorism law that would require technology firms to hand over encryption keys, the passcodes that help protect data, and install security “backdoors” in their systems to give Chinese authorities surveillance access. Goose sauce is NOT gander sauce! NOT! Mmm, delicious spook sauce.
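
Duffy’s examples are .NET-specific, and word tearing itself can’t be reproduced in Python, but the first problem on his list, forgotten synchronization around a read-modify-write, can be. The sketch below is my own illustration, not code from the article: the unsynchronized version usually loses updates, and a plain lock fixes it.

```python
import threading

THREADS, ITERATIONS = 4, 100_000
counter = 0
lock = threading.Lock()

def unsafe_increment():
    """Read-modify-write with no synchronization: other threads' updates can be lost."""
    global counter
    for _ in range(ITERATIONS):
        tmp = counter   # read
        tmp += 1        # modify
        counter = tmp   # write: may overwrite another thread's update

def safe_increment():
    """The same loop with ordinary synchronization around the read-modify-write."""
    global counter
    for _ in range(ITERATIONS):
        with lock:
            counter += 1

def run(worker):
    """Reset the counter, run the worker on several threads, and return the result."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

expected = THREADS * ITERATIONS
print(f"without a lock: {run(unsafe_increment)} (expected {expected}; usually lower)")
print(f"with a lock:    {run(safe_increment)} (expected {expected})")
```
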
Four short links: 27 February 2015

No Estimates, Brand Advertising, Artificial Intelligence, and GPG BeGone

  1. #NoEstimates: Allspaw also points out that the yearning to break the bonds of estimation is nothing new — he’s fond of quoting a passage from The Unwritten Laws of Engineering, a 1944 manual which says that engineers “habitually try to dodge the irksome responsibility for making commitments.” All of Allspaw’s segment is genius.
  2. Old Fashioned Snapchat: get a few drinks in any brand advertiser and they’ll admit that the number one reason they know that brand advertising works is that, if they stop, sales inevitably drop.
  3. Q&A With Bruce Sterling on Artificial Intelligence — in which Sterling sounds intelligent, and the questioner sounds Artificial.
  4. GPG and Me (Moxie Marlinspike) — Even though GPG has been around for almost 20 years, there are only ~50,000 keys in the “strong set,” and less than 4 million keys have ever been published to the SKS keyserver pool ever. By today’s standards, that’s a shockingly small user base for a month of activity, much less 20 years. This was a great talk at Webstock this year.
Four short links: 25 February 2015

Bricking Cars, Mapping Epigenome, Machine Learning from Encrypted Data, and Phone Privacy

  1. Remotely Bricking Cars (BoingBoing) — story from 2010 where an intruder illegally accessed Texas Auto Center’s Web-based remote vehicle immobilization system and one by one began turning off their customers’ cars throughout the city.
  2. Beginning to Map the Human Epigenome (MIT) — Kellis and his colleagues report 111 reference human epigenomes and study their regulatory circuitry, in a bid to understand their role in human traits and diseases. (The paper itself.)
  3. Machine Learning Classification over Encrypted Data (PDF) — It is worth mentioning that our work on privacy-preserving classification is complementary to work on differential privacy in the machine learning community. Our work aims to hide each user’s input data to the classification phase, whereas differential privacy seeks to construct classifiers/models from sensitive user training data that leak a bounded amount of information about each individual in the training data set. See also The Morning Paper’s unpacking of it.
  4. Privacy of Phone Audio (Reddit) — unconfirmed report from a Redditor: I started a new job today with Walk N’Talk Technologies. I get to listen to sound bites and rate how the text matches up with what is said in an audio clip and give feedback on what should be improved. At first, I thought these sound bites were completely random. Then I began to notice a pattern. Soon, I realized that I was hearing people’s commands given to their mobile devices. Guys, I’m telling you, if you’ve said it to your phone, it’s been recorded…and there’s a damn good chance a 3rd party is going to hear it.

An Internet of Things that do what they’re told

Our things are getting wired together, and you're not secure if you can't control the destiny of your private information.


The digital world has been colonized by a dangerous idea: that we can and should solve problems by preventing computer owners from deciding how their computers should behave. I’m not talking about a computer that’s designed to say, “Are you sure?” when you do something unexpected — not even one that asks, “Are you really, really sure?” when you click “OK.” I’m talking about a computer designed to say, “I CAN’T LET YOU DO THAT DAVE” when you tell it to give you root, to let you modify the OS or the filesystem.

Case in point: the cell-phone “kill switch” laws in California and Minnesota, which require manufacturers to design phones so that carriers or manufacturers can push an over-the-air update that bricks the phone without any user intervention, a measure intended to deter cell-phone thieves. Early data suggests that the law is effective in preventing this kind of crime, but at a high and largely needless (and ill-considered) price.

To understand this price, we need to talk about what “security” is, from the perspective of a mobile device user: it’s a whole basket of risks, including the physical threat of violence from muggers; the financial cost of replacing a lost device; the opportunity cost of setting up a new device; and the threats to your privacy, finances, employment, and physical safety from having your data compromised.

The Intimacy of Things

At what layer do we build privacy into the fabric of devices?

Sign up to attend Solid 2015 to explore the convergence of privacy, security, and the Internet of Things.


In 2011, Kashmir Hill, Gizmodo, and others alerted us to a privacy gaffe made by Fitbit, a company that makes small devices to help people keep track of their fitness activities. It turns out that Fitbit broadcast the sexual activity of quite a few of their users. Realizing this might not sit well with those users, Fitbit took swift action to remove the search hits, the data, and the identities of those affected. Fitbit, like many other companies, believed that all the data they gathered should be public by default. Oops.

Does anyone think this is the last time such a thing will happen?

Fitness data qualifies as “personal,” but sexual data is clearly in the realm of the “intimate.” It might seem like semantics, but the difference is likely to be felt by people in varying degrees. The theory of contextual integrity says that we feel violations of our privacy when informational contexts are unexpectedly or undesirably crossed. Publicizing my latest workout: good. Publicizing when I’m in flagrante delicto: bad. This episode neatly exemplifies how devices are entering spaces where they’ve not trod before, physically and informationally.