- Google Dev Apologies After Photos App Tags Black People as Gorillas (Ars Technica) — this is how you recover from a unequivocally horrendous mistake.
- IRS Finally Agrees to Release Non-Profit Records (BoingBoing) — Today, the IRS released a statement saying they’re going to do what we’ve been hoping for, saying they are going to release e-file data and this is a “priority for the IRS.” Only took $217,000 in billable lawyer hours (pro bono, thank goodness) to get there.
- Time Series Database Requirements — classic paper, laying out why time-series databases are so damn weird. Their access patterns are so unique because of the way data is over-gathered and pushed ASAP to the store. It’s mostly recent, mostly never useful, and mostly needed in order. (via Thoughts on Time-Series Databases)
- Compiler Errors for Humans — it’s so important, and generally underbaked in languages. A decade or more ago, I was appalled by Python’s errors after Perl’s very useful messages. Today, appreciating Go’s generally handy errors. How a system handles the operational failures that will inevitably occur is part and parcel of its UX.
A practical example of how anomaly detection makes complex data problems easier to solve.
As new tools for distributed storage and analysis of big data are becoming more stable and widely known, there is a growing need for discovering best practices for analytics at this scale. One of the areas of widespread interest that crosses many verticals is anomaly detection.
At its best, anomaly detection is used to find unusual, rarely occurring events or data for which little is known in advance. Examples include changes in sensor data reported for a variety of parameters, suspicious behavior on secure websites, or unexpected changes in web traffic. In some cases, the data patterns being examined are simple and regular and, thus, fairly easy to model.
Anomaly detection approaches start with some essential but sometimes overlooked ideas about anomalies:
- Anomalies are defined not by their own characteristics but in contrast to what is normal.
- Before you can spot an anomaly, you first have to figure out what “normal” actually is.
This need to first discover what is considered “normal” may seem obvious, but it is not always obvious how to do it, especially in situations with complicated patterns of behavior. Best results are achieved when you use statistical methods to build an adaptive model of events in the system you are analyzing as a first step toward discovering anomalous behavior. Read more…