Big data’s big ideas

From cognitive augmentation to artificial intelligence, here's a look at the major forces shaping the data world.

Looking back at the evolution of our Strata events, and the data space in general, we marvel at the impressive data applications and tools now being employed by companies in many industries. Data is having an impact on business models and profitability. It’s hard to find a non-trivial application that doesn’t use data in a significant manner. Companies that use data and analytics to drive decision-making continue to outperform their peers.

Until recently, access to big data tools and techniques required significant expertise. But tools have improved and communities have formed to share best practices. We’re particularly excited about solutions that target new data sets and data types. In an era when the requisite data skill sets cut across traditional disciplines, companies have also started to emphasize the importance of processes, culture, and people.

As we look into the future, here are the main topics that guide our current thinking about the data landscape.

Note: This document represents our thinking as of October 2014. You can keep up with the latest analysis and developments in the data space through the O’Reilly Data newsletter.

Cognitive augmentation

The combination of big data, algorithms, and efficient user interfaces can be seen in consumer applications such as Waze or Google Now. Our interest in this topic stems from the many tools that democratize analytics and, in the process, empower domain experts and business analysts. In particular, novel visual interfaces are opening up new data sources and data types.

Examples:

  • Narrative Science adds descriptive summaries to the output generated by business intelligence tools (dashboards, charts, and tables).
  • Palantir and Quid use a combination of visualization, search, and analytics that enables domain experts to discover patterns hidden in large data sets.
  • StitchFix provides product recommendations by combining proprietary algorithms and expert stylists.
  • “Moving dots” (e.g., tracking data from athletics) are being analyzed by companies that specialize in spatio-temporal pattern recognition. Startup Second Spectrum provides analytics to coaches and front offices at many professional basketball teams. In the near future, its technology and recommendations will be available in real time to coaching staffs during in-game situations.

Intelligence matters: Artificial intelligence and algorithms

Bring up the topic of algorithms, and a discussion on recent developments in artificial intelligence (AI) is sure to follow. AI is the subject of an ongoing series of posts on O’Reilly Radar. The “unreasonable effectiveness of data” notwithstanding, algorithms remain an important area of innovation. We’re excited about the broadening adoption of algorithms like deep learning, and topics like feature engineering, gradient boosting, and active learning. As intelligent systems become common, security and privacy become critical. We’re interested in efforts to make machine learning secure in adversarial environments.

The convergence of cheap sensors, fast networks, and distributed computation

The Internet of Things (IoT) will require systems that can process and unlock massive amounts of event data. These systems will draw from analytic platforms developed for monitoring IT operations. Beyond data management, we’re following recent developments in streaming analytics and the analysis of large numbers of time series.

Data (science) pipelines

Analytic projects involve a series of steps that often require different tools. A growing number of companies and open source projects integrate a variety of analytic tools into coherent user interfaces and packages. Many of these integrated tools enable replication, collaboration, and deployment. This remains an active area, as specialized tools rush to broaden their coverage of analytic pipelines.

Evolving, maturing marketplace of big data components

Many popular components in the big data ecosystem are open source. As such, many companies build their data infrastructure and products by assembling components like Spark, Kafka, Cassandra, and ElasticSearch, among others. Contrast that with a few years ago, when many of these components weren’t ready (or didn’t exist) and companies built similar technologies from scratch. But companies are interested in applications and analytic platforms, not individual components. To that end, demand is high for data engineers and architects who are skilled in assembling these components and in maintaining robust data flows and data storage.

Data scientists, design, and social science

To be clear, data analysts have always drawn from social science (e.g., surveys, psychometrics) and design. We are, however, noticing that many more data scientists are expanding their collaborations with product designers and social scientists.

Building a data culture

“Data-driven” organizations excel at using data to improve decision-making. It all starts with instrumentation. “If you can’t measure it, you can’t fix it,” says DJ Patil, VP of product at RelateIQ. In addition, developments in distributed computing over the past decade have given rise to a group of (mostly technology) companies that excel in building data products. In many instances, data products evolve in stages (starting with a “minimum viable product”) and are built by cross-functional teams that embrace alternative analysis techniques.

Related resources:

  • Building Data Science Teams: Data scientists are at the forefront of innovation in many data-driven organizations. This report offers practical advice for constructing teams that can drive that innovation.
  • Just Enough Math is a video series that introduces mathematical concepts using business cases.
  • Lean Analytics: Acquire a data-driven mindset through 30 case studies.
  • Data Jujitsu: A primer on organizing teams and building data products.

Perils of big data

Every few months, there seems to be an article criticizing the hype surrounding big data. Dig deeper and you find that many of the criticisms point to poor analysis and highlight issues known to experienced data analysts. Our perspective is that issues such as privacy and the cultural impact of models are much more significant.

We’ll also explore each of these topics through our publishing program, events, webcasts, and online coverage. These explorations work best when they’re two-way roads, so please share your feedback through Twitter (@bigdata) or in the comments below.
