Hadley Wickham

Hadley Wickham is Chief Scientist at RStudio and an Adjunct Professor at Rice University. He is an active member of the R community, has written and contributed to more than 40 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualisation. His research focusses on how to make data analysis better, faster, and easier, with a particular emphasis on the use of visualisation to better understand data and models.

Building pipelines to facilitate data analysis

A new operator from the magrittr package makes it easier to use R for data analysis.

[Image: Construction of the Cedar River Pipeline, 1900]

In every data analysis, you have to string together many tools. You need tools for data wrangling, visualisation, and modelling to understand what’s going on in your data. To use these tools effectively, you need to be able to easily flow from one tool to the next, focusing on asking and answering questions of the data, not struggling to jam the output from one function into the format needed for the next. Wouldn’t it be nice if the world worked this way! I spend a lot of my time thinking about this problem, and how to make the process of data analysis as fast, effective, and expressive as possible. Today, I want to show you a new technique that I’m particularly excited about.

R, at its heart, is a functional programming language: you do data analysis in R by composing functions. However, the problem with function composition is that a lot of it makes for hard-to-read code. For example, here’s some R code that wrangles flight delay data from New York City in 2013. What does it do?
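The flight delay code itself is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of the readability problem, assuming the magrittr package is installed. The `delays` vector holds made-up values for illustration only; it is not the New York City data. The same three-step computation is written once as nested function calls and once with magrittr's `%>%` operator.

```r
library(magrittr)

# Hypothetical delay values in minutes (illustrative only, not real flight data)
delays <- c(4, 9, 16, 25, 36)

# Nested composition: you have to read from the inside out
nested <- round(mean(sqrt(delays)), 1)

# The same steps with the pipe: read left to right, one step at a time
piped <- delays %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both forms compute the same value
identical(nested, piped)
```

The pipe passes the value on its left as the first argument of the function on its right, so `x %>% f(y)` is equivalent to `f(x, y)`. The steps appear in the order they are carried out, rather than in reverse.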