- The Dark Market for Personal Data (NYTimes) — can buy lists of victims of sexual assault, of impulse buyers, of people with sexually transmitted disease, etc. The cost of a false-positive when those lists are used for marketing is less than the cost of false-positive when banks use the lists to decide whether you’re a credit risk. The lists fall between the cracks in privacy legislation; essentially, the compilation and use of lists of people are unregulated territory.
- 7 Principles of Rich Web Applications — “rich web applications” sounds like 2007 wants its ideas back, but the content is modern and useful. Predict behaviour for negative latency.
- Collaborative Filtering at LinkedIn (PDF) — This paper presents LinkedIn’s horizontal collaborative filtering infrastructure, known as browsemaps. Great lessons learned, including context and presentation of browsemaps or any recommendation is paramount for a truly relevant user experience. That is, design and presentation represents the largest ROI, with data engineering being a second, and algorithms last. (via Greg Linden)
From unique data applications to factories of the future, here are key insights from Strata + Hadoop World New York 2014.
Experts from across the data world came together in New York City for Strata + Hadoop World New York 2014. Below we’ve assembled notable keynotes, interviews, and insights from the event.
Unusual data applications and the correct way to say “Hadoop”
Hadoop creator and Cloudera chief architect Doug Cutting discusses surprising data applications — from dating sites to premature babies — and he reveals the proper (but in no way required) pronunciation of “Hadoop.”
Liza Kindred on the evolving role of data in fashion and the growing relationship between tech and fashion companies.
In this podcast episode, I talk with Liza Kindred, founder of Third Wave Fashion and author of the new free report “Fashioning Data: How fashion industry leaders innovate with data and what you can learn from what they know.” Kindred addresses the evolving role data and analytics are playing in the fashion industry, and the emerging connections between technology and fashion companies. “One of the things that fashion is doing better than maybe any other industry,” Kindred says, “is facilitating conversations with users.”
Gathering and analyzing user data creates opportunities for the fashion and tech industries alike. One example of this is the trend toward customization. Read more…
High-performing memory throws many traditional decisions overboard
Over the past decade, SSD drives (popularly known as Flash) have radically changed computing at both the consumer level — where USB sticks have effectively replaced CDs for transporting files — and the server level, where it offers a price/performance ratio radically different from both RAM and disk drives. But databases have just started to catch up during the past few years. Most still depend on internal data structures and storage management fine-tuned for spinning disks.
Citing price and performance, one author advised a wide range of database vendors to move to Flash. Certainly, a database administrator can speed up old databases just by swapping out disk drives and inserting Flash, but doing so captures just a sliver of the potential performance improvement promised by Flash. For this article, I asked several database experts — including representatives of Aerospike, Cassandra, FoundationDB, RethinkDB, and Tokutek — how Flash changes the design of storage engines for databases. The various ways these companies have responded to its promise in their database designs are instructive to readers designing applications and looking for the best storage solutions.