The business value of unifying data

Practical applications of human-in-the-loop machine learning.


With hundreds, thousands, or even just tens of suppliers, each with different business units, payment terms, and locations, businesses face a monumental task: unifying all of their supplier-related data quickly enough for it to be useful. To ask deep questions about their data, companies are increasingly looking for a single, unified view of their supply chain.

And yet, business data is often stored in different sources, systems, and formats, resulting in silos of information. These silos take the form of enterprise resource planning (ERP) systems, CSV files, spreadsheets, and relational databases. Pulling together the data from these disparate sources presents a business with three interrelated challenges:

  1. Speed. Traditionally, businesses have attempted to catalog and organize supply chain data manually, profiling and integrating the data themselves. That approach is slow, and it leads directly to the next challenge: cost.
  2. Cost. Manual work is expensive work. Moving quickly enough for the results to have any value for the business usually means assigning more than one employee to the same data set, and even several people working together cannot match what a machine can do at scale.
  3. Efficiency. Relying entirely on humans to organize and unify data invites error. There is often no audit trail, and the work produces inherently incomplete views of the information.

In a recent live demo, Dr. Clare Bernard, a field engineer at Tamr, gave me a glimpse into how Tamr combines machine learning algorithms with input from subject matter experts to help businesses unify their data for analysis. This practice, known as human-in-the-loop machine learning, uses short-term human intervention to actively improve machine models. It is taking off across many industries, including fashion and automotive, as well as in cloud services such as Google Maps.

A few of the companies already taking advantage of what Tamr calls its “end-to-end enterprise platform” include General Electric, Roche, and Toyota Motor Europe. The platform performs data profiling, schema mapping, record deduplication, and clustering, and presents the results to customers through dashboards and analytics tools (such as Tableau or QlikView).

One interesting capability of Tamr’s new platform is a “spreadsheet plug-in” that auto-populates and updates a business’s data in Google Sheets or Excel. It lets analysts locate external data and map it to their own internal data, while the platform prompts them with suggestions. Dr. Bernard explained that most of the time, data unification is achieved without human intervention. When humans are needed, the system generates questions for data experts across business units and departments, and their answers continually retrain the model.
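To make the "ask a human only when needed" loop more concrete, here is a minimal Python sketch of the general pattern for record deduplication. It is not Tamr's implementation. It assumes a single string-similarity score from the standard library's `difflib` in place of a real matching model, and a learned similarity cutoff in place of a real classifier. Pairs scored close to the cutoff (uncertainty sampling, a common active learning strategy) are routed to an expert, and the expert's answers are used to refit the cutoff. All names here (`unify`, `fit_threshold`, `same_company`) and parameters (`margin`, `budget`, `rounds`) are hypothetical, chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for real record-matching features."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fit_threshold(labeled: list[tuple[float, bool]]) -> float:
    """Pick the cutoff that classifies the human-labeled pairs most accurately."""
    scores = sorted({s for s, _ in labeled})
    # Candidate cutoffs: midpoints between observed scores, plus a 0.5 default.
    candidates = sorted({0.5} | {(lo + hi) / 2 for lo, hi in zip(scores, scores[1:])})

    def correct(t: float) -> int:
        return sum((s >= t) == is_match for s, is_match in labeled)

    # Ties go to the lowest candidate because max() keeps the first maximum.
    return max(candidates, key=correct)


def unify(records, ask_expert, seed_labels, margin=0.1, budget=2, rounds=3):
    """Human-in-the-loop deduplication sketch; returns (matched pairs, questions asked).

    seed_labels keys must use the same (earlier, later) order as combinations(records, 2).
    """
    scores = {pair: similarity(*pair) for pair in combinations(records, 2)}
    labels = dict(seed_labels)  # pair -> bool; human answers always override the model
    asked = 0
    for _ in range(rounds):
        threshold = fit_threshold([(scores[p], y) for p, y in labels.items()])
        # Unlabeled pairs scored near the cutoff are the ones the model is least sure of.
        uncertain = sorted(
            (p for p in scores if p not in labels and abs(scores[p] - threshold) < margin),
            key=lambda p: abs(scores[p] - threshold),
        )
        if not uncertain:
            break  # everything left is confident enough to resolve automatically
        for pair in uncertain[:budget]:
            labels[pair] = ask_expert(*pair)  # the expert's answer becomes training data
            asked += 1
    # Refit on every answer collected, then auto-resolve the pairs nobody was asked about.
    threshold = fit_threshold([(scores[p], y) for p, y in labels.items()])
    matches = {p for p, s in scores.items() if labels.get(p, s >= threshold)}
    return matches, asked


# Usage with toy supplier names and a hypothetical expert.
records = ["Acme Corp", "ACME Corp.", "Globex Inc", "Globex, Inc.", "Initech LLC"]
seed = {("Acme Corp", "ACME Corp."): True, ("Acme Corp", "Initech LLC"): False}


def same_company(a: str, b: str) -> bool:
    """Hypothetical expert: treats names with the same first word as one supplier."""
    return a.lower().split()[0].strip(",.") == b.lower().split()[0].strip(",.")


matches, asked = unify(records, same_company, seed)
print(sorted(matches), asked)
```

In this sketch, `budget` caps how many questions the expert sees per round, reflecting the idea that expert time is the scarce resource, while a wider `margin` sends more borderline pairs to a human in exchange for fewer automatic mistakes.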

To learn more, watch a free recording of the webinar.

This post is part of a collaboration between O’Reilly and Tamr. See our statement of editorial independence.

This post is part of our exploration into active learning, and the larger theme of Big Data and Artificial Intelligence: Intelligence Matters.

