The 0th Law of Data Mining

Preview of The Laws of Data Mining Session at Strata Santa Clara 2013

Many years ago I was taught the three laws of thermodynamics. When that didn’t stick, I was taught a quick way to remember them, attributed to C.P. Snow:

  • 1st Law: you can’t win
  • 2nd Law: you can’t draw
  • 3rd Law: you can’t get out of the game

These laws (well, the real ones) were largely established by the mid-19th century, although the third only followed in the early 20th. Yet it wasn’t until the 1930s that the value of a 0th law was identified.

At Strata I’m going to be talking about the 9 Laws of Data Mining – a set of principles identified by Tom Khabaza and very closely related to the CRISP-DM data mining methodology.

They may possibly, just possibly, not be as important as the laws of thermodynamics, but at Strata they will be supported by an equally important 0th Law.

Law zero: The 9 Laws of Data Mining are equally relevant to Data Science

Why is this important?

Well, despite its analytical aspects, data science has a recent history rooted in coding, and this applies even more to data scientists. Data scientists may be happy jumping into Python, talking through the details of MapReduce, or just hacking about in Java. Once, data miners were too, although the languages were more likely to be SAS procs or some of the less pleasant choices of the 70s, 80s and 90s.

A few things about this are potentially problematic, and they mirror the lessons that data miners had to learn as their profession matured.

Firstly, coding tends to encourage people to think in a certain way, with a focus on the task at hand. Necessary. Vital. But the best data miners (and coders) have learnt to simultaneously keep their eyes on the prize – the bigger picture of the business problem that needs solving.

Data scientists also need to move away from the certainties of code towards the probabilities of analytics. I realize that sometimes the only certainty might be that the code you’ve been working on will fail to compile. Yet good analytics is about uncertainty, and the ability to understand it and act on it.

Finally, data scientists, like data miners before them, need to see that they are only doing this because it makes a difference. To a business, a not-for-profit, a government. Not because it’s fun. Or cool. Or exciting. Especially when it is.

The 9 Laws

So what are the 9 Laws? Well, for a start, they won’t tell you the “last two algorithms you will ever need”, nor will they explain Bayes’ theorem.

What they are is a way of focusing on the thought processes that drive data mining and data science towards real, measurable and achievable results.

Now, hopefully, you can see why we needed the zeroth law. You can hear more about it at my upcoming session.
