- Fairness in Machine Learning — read this fabulous presentation. Most ML objective functions create models accurate for the majority class at the expense of the protected class. One way to encode “fairness” might be to require similar/equal error rates for protected classes as for the majority population.
- Safety Constraints and Ethical Principles in Collective Decision-Making Systems (PDF) — self-driving cars are an example of collective decision-making between intelligent agents and, possibly, humans. Damn it’s hard to find non-paywalled research in this area. This is horrifying for the list of things you can’t ensure in collective decision-making systems.
- State of Hardware Report (Nate Evans) — rundown of the stats related to hardware startups. (via Renee DiResta)
- A Recent Discussion about DRM (Joi Ito) — strong arguments against including Digital Rights Management in W3C’s web standards (I can’t believe we’re still debating this; it’s such a self-evidently terrible idea to bake disempowerment into web standards).
"Big Data" entries
The O’Reilly Data Show podcast: Fang Yu on data science in security, unsupervised learning, and Apache Spark.
In this episode of the O’Reilly Data Show, I spoke with Fang Yu, co-founder and CTO of DataVisor. We discussed her days as a researcher at Microsoft, the application of data science and distributed computing to security, and hiring and training data scientists and engineers for the security domain.
DataVisor is a startup that uses data science and big data to detect fraud and malicious users across many different application domains in the U.S. and China. Founded by security researchers from Microsoft, the startup has developed large-scale unsupervised algorithms on top of Apache Spark, to (as Yu notes in our chat) “predict attack vectors early among billions of users and trillions of events.”
Several years ago, I found myself immersed in the security space and at that time tools that employed machine learning and big data were still rare. More recently, with the rise of tools like Apache Spark and Apache Kafka, I’m starting to come across many more security professionals who incorporate large-scale machine learning and distributed systems into their software platforms and consulting practices.
The O’Reilly Data Show podcast: Eric Colson on algorithms, human computation, and building data science teams.
Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.
In this episode of the O’Reilly Data Show, I spoke with Eric Colson, chief algorithms officer at Stitch Fix, and former VP of data science and engineering at Netflix. We talked about building and deploying mission-critical, human-in-the-loop systems for consumer Internet companies. Knowing that many companies are grappling with incorporating data science, I also asked Colson to share his experiences building, managing, and nurturing, large data science teams at both Netflix and Stitch Fix.
Augmented systems: “Active learning,” “human-in-the-loop,” and “human computation”
We use the term ‘human computation’ at Stitch Fix. We have a team dedicated to human computation. It’s a little bit coarse to say it that way because we do have more than 2,000 stylists, and these are very much human beings that are very passionate about fashion styling. What we can do is, we can abstract their talent into—you can think of it like an API; there’s certain tasks that only a human can do or we’re going to fail if we try this with machines, so we almost have programmatic access to human talent. We are allowed to route certain tasks to them, things that we could never get done with machines. … We have some of our own proprietary software that blends together two resources: machine learning and expert human judgment. The way I talk about it is, we have an algorithm that’s distributed across the resources. It’s a single algorithm, but it does some of the work through machine resources, and other parts of the work get done through humans.