The application of real-time data

Hilary Mason on how Bitly applies the Internet's real-time data.

From her vantage point as chief scientist of Bitly, Hilary Mason has interesting insight into the real-time web and what people are sharing, posting, clicking and reading.

I recently spoke with Mason about Bitly’s analysis and usage of real-time data. She’ll be digging into these same topics at next month’s Strata Conference in New York.

Our interview follows.

How does Bitly develop its data products and processes?

Hilary Mason: Our primary goal at Bitly is to understand what’s happening on the Internet in real time. We work by stating the problem we’re trying to solve, brainstorming methods and models on the whiteboard, then experimenting on subsets of the data. Once we have a methodology in mind that we’re fairly certain will work at scale, we build a prototype of the system, including data ingestion, storage, processing, and (usually) an API. Once we’ve proven it at that scale, we might decide to scale it to the full dataset or wait and see where it will plug into a product.
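
As an editorial illustration only, the prototype stages Mason lists (ingestion, storage, processing, and an API, run on a subset of the data) might be sketched roughly as follows. Nothing here reflects Bitly's actual systems: the `Click` record, the tab-separated input format, the every-nth sampling, and all function and class names are assumptions made up for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Click:
    """Hypothetical click event; not Bitly's actual schema."""
    url: str
    timestamp: int  # seconds since the epoch


def ingest(lines: Iterable[str]) -> Iterator[Click]:
    """Ingestion: parse 'timestamp<TAB>url' lines, skipping malformed ones."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        yield Click(url=parts[1], timestamp=int(parts[0]))


def subset(clicks: Iterable[Click], every_nth: int) -> Iterator[Click]:
    """Experiment on a subset: keep every n-th parsed event."""
    for i, click in enumerate(clicks):
        if i % every_nth == 0:
            yield click


class ClickStore:
    """Storage and processing: in-memory click counts per URL."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add(self, click: Click) -> None:
        self._counts[click.url] += 1

    def top(self, n: int) -> List[Tuple[str, int]]:
        """Stand-in for the 'API': the n most-clicked URLs."""
        return self._counts.most_common(n)


def run_prototype(lines: Iterable[str], every_nth: int = 1, n: int = 3):
    """Wire the stages together: ingest, sample, store, query."""
    store = ClickStore()
    for click in subset(ingest(lines), every_nth):
        store.add(click)
    return store.top(n)
```

Calling `run_prototype(lines, every_nth=100)` on a log file would exercise the whole path on roughly 1% of events. Moving to the full dataset would then mean setting `every_nth=1` and replacing the in-memory `Counter` with real storage.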

How does data drive Bitly’s application of analytics and data science?

Hilary Mason: Bitly is a data-centric organization. The data informs business decisions, the potential of the product, and certainly our own internal processes. That said, it’s important to draw a distinction between analytics and data science. Analytics is the measurement of well-understood metrics. Data science is the invention of new mathematical and algorithmic approaches to understanding the data. We do both, but apply them in very different ways.

What are the most important applications of real-time data?

Hilary Mason: The most important applications of real-time data apply to situations where having analysis immediately will change the outcome. More practically, when you can ask a question and get the answer before you’ve forgotten why you asked the question in the first place, it makes you massively more productive.

This interview was edited and condensed.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

