Strata Week: Overcharging algorithms

Here are a few of the data stories that caught my eye this week.

When algorithms overcharge on Amazon

A postdoc in Michael Eisen’s lab at UC Berkeley logged in to Amazon a couple of weeks ago in order to purchase a copy of Peter Lawrence’s “The Making of a Fly.” Although out of print, the book is a classic in the field of developmental biology, and there were several copies available, both new and used. The used copies were on sale for roughly $35. The two new copies were priced a bit higher: $1.7 million and $2.1 million. Eisen assumed at first that it was a mistake, but when he returned to the page the next day, he found the prices had gone up, with both books for sale at around $2.8 million. By the end of the day, the price of one was raised again, to more than $3.5 million.

Some folks got creative in response to the multi-million-dollar price tag attached to “The Making of a Fly.”

Eisen worked out that once a day, one of the sellers was setting its price to 0.9983 times the price of the copy offered by the other. The second seller, in turn, was setting its price to 1.270589 times the first seller’s. Both were using algorithmic pricing, a common practice among vendors on Amazon and for Amazon itself, to automatically adjust prices based on a competitor’s.
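To see how quickly two such rules can run away, here is a minimal Python sketch of the feedback loop. Only the two multipliers come from Eisen's analysis; the function name, the seller labels "A" and "B," the $40 starting price, and the assumption that each seller reprices exactly once per day in a fixed order are all made up for illustration, and this is not the sellers' actual software.

```python
# Illustrative sketch only: the two factors are the ones Eisen reported;
# every name and the starting price below are assumptions.
UNDERCUT_FACTOR = 0.9983    # seller A prices just below seller B
MARKUP_FACTOR = 1.270589    # seller B prices well above seller A


def simulate_spiral(start_price_b, days):
    """Return one (day, price_a, price_b) tuple per daily repricing cycle."""
    history = []
    price_b = start_price_b
    for day in range(1, days + 1):
        price_a = UNDERCUT_FACTOR * price_b   # A reacts to B's current price
        price_b = MARKUP_FACTOR * price_a     # B then reacts to A's new price
        history.append((day, price_a, price_b))
    return history


if __name__ == "__main__":
    # The $40 starting price is arbitrary, chosen only to show the growth.
    for day, price_a, price_b in simulate_spiral(40.00, 10):
        print(f"day {day:2d}: A = ${price_a:,.2f}  B = ${price_b:,.2f}")
    print(f"combined daily factor: {UNDERCUT_FACTOR * MARKUP_FACTOR:.4f}")
```

Because the product of the two factors is a little over 1.268, each full cycle in this sketch raises both prices by roughly 27 percent, which is why the numbers compound so fast once neither seller is watching.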

It’s obvious why one vendor would establish an algorithm to perpetually undercut the competition. It’s less clear why the other would choose to always price higher. It’s possible that the vendor was hoping that high ratings would compel customers to pay the higher price. But Eisen thinks it’s more likely that the vendor didn’t actually own a copy of the book, and set the algorithm to aim for a higher price so as to cover acquisition costs.

Eisen wrote:

What’s fascinating about all this is both the seemingly endless possibilities for both chaos and mischief. It seems impossible that we stumbled onto the only example of this kind of upward pricing spiral — all it took were two sellers adjusting their prices in response to each other by factors whose products were greater than 1. And while it might have been more difficult to deconstruct, one can easily see how even more bizarre things could happen when more than two sellers are in the game. And as soon as it was clear what was going on here, I and the people I talked to about this couldn’t help but start thinking about ways to exploit our ability to predict how others would price their books down to the 5th significant digit — especially when they were clearly not paying careful attention to what their algorithms were doing.

Eventually someone noticed, and the price dropped to around $150.

White hot Hadoop

Yahoo is considering spinning off its Hadoop engineering unit into a new company, according to a story this week in The Wall Street Journal. Yahoo didn’t comment for that story, but the piece cites Benchmark Capital partner Rob Bearden as saying that the venture capital firm has spoken to Yahoo about how it might form a separate Hadoop-oriented company.

The article posits that the Hadoop market is a multi-billion dollar one and that the opportunity is huge for Yahoo, something that GigaOm’s Derrick Harris examines with a more nuanced eye to the market. “For Hadoop users and startups building tools atop Hadoop, though,” Harris concludes, “more competition among distributions is only good news.”

U.S. Supreme Court weighs legality of data mining

The U.S. Supreme Court heard oral arguments this week in “Sorrell v. IMS Health,” a case that will determine the constitutionality of a Vermont law restricting the commercial distribution of a physician’s prescription records. The outcome could set important precedents in privacy and data issues.

In 2007, Vermont’s legislature passed the Prescription Confidentiality Law, giving doctors the ability to deny pharmacies the option of selling their prescription information to data-mining companies. IMS Health, along with two other data-collection companies and PhRMA, a pharmaceutical industry association, challenged the constitutionality of the law, arguing it would make it more difficult for drugmakers to identify doctors for potential sales.

SCOTUSblog’s Lyle Denniston reports that the justices grilled the Vermont Attorney General about the law, questioning whether it was written too narrowly — targeting only the pharmacies and not insurance companies, for example — or whether it served to protect doctors’ privacy.

Denniston wrote:

… it became very clear that the Justices — perhaps more than a simple majority — see this first test case as one about corporate free speech. That might not turn out to be true in every case of data-mining that comes along, but it would certainly seem so when a legislature blatantly sets out to curb the use of that technology to convey a commercial message, made up of truthful information.

The Supreme Court is expected to announce its decision this summer.

Got data news?

Feel free to email me.
