- Commotion — open source mesh networks.
- WriteLaTeX — online collaborative LaTeX editor. No, really. This exists. In 2014.
- Distributed Systems — free book for download, goal is to bring together the ideas behind many of the more recent distributed systems – systems such as Amazon’s Dynamo, Google’s BigTable and MapReduce, Apache’s Hadoop etc.
- How Netflix Reverse-Engineered Hollywood (The Atlantic) — Using large teams of people specially trained to watch movies, Netflix deconstructed Hollywood. They paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies on their sexually suggestive content, goriness, romance levels, and even narrative elements like plot conclusiveness.
The O’Reilly Data Show podcast: Eric Colson on algorithms, human computation, and building data science teams.
Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.
In this episode of the O’Reilly Data Show, I spoke with Eric Colson, chief algorithms officer at Stitch Fix, and former VP of data science and engineering at Netflix. We talked about building and deploying mission-critical, human-in-the-loop systems for consumer Internet companies. Knowing that many companies are grappling with incorporating data science, I also asked Colson to share his experiences building, managing, and nurturing, large data science teams at both Netflix and Stitch Fix.
Augmented systems: “Active learning,” “human-in-the-loop,” and “human computation”
We use the term ‘human computation’ at Stitch Fix. We have a team dedicated to human computation. It’s a little bit coarse to say it that way because we do have more than 2,000 stylists, and these are very much human beings that are very passionate about fashion styling. What we can do is, we can abstract their talent into—you can think of it like an API; there’s certain tasks that only a human can do or we’re going to fail if we try this with machines, so we almost have programmatic access to human talent. We are allowed to route certain tasks to them, things that we could never get done with machines. … We have some of our own proprietary software that blends together two resources: machine learning and expert human judgment. The way I talk about it is, we have an algorithm that’s distributed across the resources. It’s a single algorithm, but it does some of the work through machine resources, and other parts of the work get done through humans.
Hortonworks' Data Platform for Windows, Intel's Hadoop distribution, invasive smartphone surveillance, and data-driven "House of Cards."
Windows gets Hadoop, Intel launches Hadoop distribution
Hortonworks released a beta version of its Hortonworks Data Platform for Windows this week. In the press release, the company highlights the mission is to “expand the reach of Apache Hadoop across the enterprise” and notes that the “100% open source Hortonworks Data Platform is the industry’s first and only Apache Hadoop distribution for both Windows and Linux.”
Barb Darrow notes at GigaOm that there’s likely no better way to bring big data to the masses than via Microsoft Excel. Darrow reports that Hortonworks’ VP of corporate strategy Shawn Connolly told her that “[t]he combination should make it easier to integrate data from SQL Server and Hadoop and to funnel all that into Excel for charting and pivoting and all the tasks Excel is good at,” stressing that the same Apache Hadoop distribution will run on both Windows and Linux. Connolly also noted to Darrow that “an analogous Hortonworks Data Platform for Windows Azure is still in the works.”
Malware Industrial Complex, Indies Needed, TV Analytics, and HTTP Benchmarking
- Welcome to the Malware-Industrial Complex (MIT) — brilliant phrase, sound analysis.
- Stupid Stupid xBox — The hardcore/soft-tv transition and any lead they feel they have is simply not defensible by licensing other industries’ generic video or music content because those industries will gladly sell and license the same content to all other players. A single custom studio of 150 employees also can not generate enough content to defensibly satisfy 76M+ customers. Only with quality primary software content from thousands of independent developers can you defend the brand and the product. Only by making the user experience simple, quick, and seamless can you defend the brand and the product. Never seen a better put statement of why an ecosystem of indies is essential.
- Data Feedback Loops for TV (Salon) — Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
- wrk — a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
Web Tooltips, Free Good Security Book, Netflix Economics, and Firewire Hackery
- toolbar — tooltips in jQuery, cf hint.css which is tooltips in CSS.
- Security Engineering — 2ed now available online for free. (via /r/netsec)
- Economics of Netflix’s $100M New Show (The Atlantic) — Up until now, Netflix’s strategy has involved paying content makers and distributors, like Disney and Epix, for streaming rights to their movies and TV shows. It turns out, however, the company is overpaying on a lot of those deals. […] [T]hese deals cost Netflix billions.
- Inception — a FireWire physical memory manipulation and hacking tool exploiting IEEE 1394 SBP-2 DMA. The tool can unlock (any password accepted) and escalate privileges to Administrator/root on almost* any powered on machine you have physical access to. The tool can attack over FireWire, Thunderbolt, ExpressCard, PC Card and any other PCI/PCIe interfaces. (via BoingBoing)
Wikidata's structure vs. diverse knowledge, and a look at the many factors behind Netflix's recommendations.
A critic says Wikidata could undermine Wikipedia's localized information. Also, Netflix explains why its recommendation engine is much more complicated than most people realize.
Attorney Dana Newman discusses a proposed update to the '80s-era Video Privacy Protection Act.
The '80s-era Video Privacy Protection Act had the unintended consequence of inhibiting consensual sharing of video viewing habits. Attorney Dana Newman weighs in on updated legislation.
A vote against frictionless sharing, a look at cloud security threats, and why the open sourcing of Data.gov matters.
This week on O'Reilly: Mike Loukides explained why there's little value in frictionless sharing, Jeffrey Carr examined the significant security threats attached to cloud services, and we learned why the open sourcing of Data.gov is an important milestone for open government.
Tim Carmody on the lessons from Netflix and the facade of inevitability.
In this interview, Wired.com writer Tim Carmody examines the recent missteps of Netflix and he takes a broad look at how technology shapes the reading experience.