Ellen Friedman

Ellen Friedman is a consultant and commentator, currently writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics including molecular biology, nontraditional inheritance, and oceanography. Ellen is also co-author of a book of magic-themed cartoons, A Rabbit Under the Hat. Ellen is on Twitter at @Ellen_Friedman.

New approaches to anomaly detection

A practical example of how anomaly detection makes complex data problems easier to solve.

As new tools for distributed storage and analysis of big data are becoming more stable and widely known, there is a growing need for discovering best practices for analytics at this scale. One of the areas of widespread interest that crosses many verticals is anomaly detection.

At its best, anomaly detection is used to find unusual, rarely occurring events or data for which little is known in advance. Examples include changes in sensor data reported for a variety of parameters, suspicious behavior on secure websites, or unexpected changes in web traffic. In some cases, the data patterns being examined are simple and regular and, thus, fairly easy to model.

Anomaly detection approaches start with some essential but sometimes overlooked ideas about anomalies:

  • Anomalies are defined not by their own characteristics but in contrast to what is normal.

Thus …

  • Before you can spot an anomaly, you first have to figure out what “normal” actually is.

This need to first discover what is considered “normal” may seem obvious, but it is not always obvious how to do it, especially in situations with complicated patterns of behavior. Best results are achieved when you use statistical methods to build an adaptive model of events in the system you are analyzing as a first step toward discovering anomalous behavior.

Read more…

An Invitation to Practical Machine Learning

Does it make sense for me to have a car? If so, which one is the best choice for my needs: gasoline, hybrid, or electric? And should I buy or lease?

In order to make an effective decision, I need to understand key issues about the design, performance, and cost of cars, regardless of whether or not I actually know how to build one myself. The same is true for people deciding if machine learning is a good choice for their business goals or project.  Will the payoff be worth the effort?  What machine learning approach is most likely to produce valuable results for your particular situation? What size team with what expertise is necessary to be able to develop, deploy, and maintain your machine learning system?

Given the complex and previously esoteric nature of machine learning as a field – the sometimes daunting array of learning algorithms and the math needed to understand and employ them – many people feel the topic is one best left only to the few.

Read more…
