Keep your data science efforts from derailing

Preview of upcoming session at Strata Santa Clara

By Marck Vaisman and Sean Murphy

Is your organization considering embracing data science? If so, we would like to offer some advice on the organizational and technical issues to weigh before you embark on any initiatives or start hiring data scientists. Join us, Sean Murphy and Marck Vaisman, two Washington, D.C.-based data scientists and founding members of Data Community DC, as we walk you through the trials and tribulations of practicing data scientists at our upcoming talk at Strata.

We will discuss anecdotes and best practices, and finish by presenting the results of a survey we conducted last year to understand the variety of people, skills, and experiences that fall under the broad term "Data Scientist." We analyzed data from over 250 survey respondents and are excited to share our findings, which will also be published soon by O'Reilly.

As nicely summarized in the "Dark Side of Data Science" chapter of The Bad Data Handbook (Marck was a contributing author), we ask you, actually plead with you, to do the exact opposite of the following commandments:

I. Know nothing about thy data

Please spend time understanding the nuances, intricacies, sources, and structure of your data. Trust us, this time is well spent. As the saying goes, 80% of the time spent on analytic tasks goes to munging, cleaning, transforming, and the like. Don't let that become 90% or 95% of your effort.

II. Thou shalt provide your data scientists with a single tool for all tasks

No single tool can perform every data science task. Many different tools exist, and each serves a specific purpose. Please give data scientists access to the tools they need, along with the ability to configure them as needed (at least in research and development environments), without making them jump through hoops to do so.

III. Thou shalt analyze for analysis’ sake only

Some analytical exercises begin as open exploration; others begin with a specific question in mind and end up answering a different one. Regardless, before you embark on an investigation, please have some idea of where you want to go. Please don't do analysis just to say you are doing data science or because you have a lot of data. It's pointless.

IV. Thou shalt compartmentalize learnings

We learned to share when we were children. Please share your learnings and findings within your organizations, as appropriate, to avoid duplicating work and wasting your time and ours.

V. Thou shalt expect omnipotence from data scientists

This is, by far, our favorite commandment. We have run into numerous situations where organizations expect miracles because of the hype surrounding data science. Many organizations also seem unaware of how widely data scientists' skills vary, and this misunderstanding leads them to waste time and effort when trying to find talent.

As practitioners, we urge organizations and their management to adjust their expectations accordingly and to consider assembling a team whose members share broad, overlapping skills but bring distinct areas of deep expertise. We will explore this further in our talk when we discuss the survey results.
