The Internet of Things (IoT) is more than a network of smart toasters, refrigerators, and thermostats. For the moment, domestic appliances are the IoT's most visible aspect, but they represent merely the tip of a very large and mostly invisible iceberg.
IDC predicts that by the end of 2020, the IoT will encompass 212 billion “things,” including hardware we tend not to think about: compressors, pumps, generators, turbines, blowers, rotary kilns, oil-drilling equipment, conveyor belts, diesel locomotives, and medical imaging scanners, to name a few. Sensors embedded in such machines and devices use the IoT to transmit data on metrics such as vibration, temperature, humidity, wind speed, location, fuel consumption, radiation levels, and hundreds of other variables.
“Machines can be very chatty,” says William Ruh, a vice president and corporate officer at GE.
Ruh’s current focus is to drive the company’s efforts to develop an “industrial” Internet that blends three elements: intelligent machines, advanced analytics, and empowered users. Together, those elements generate a variety of data at a rapid pace, creating a deluge that makes early definitions of big data seem wildly understated.
Making sense of that data and using it to produce a steady stream of usable insights require infrastructure and processes that are fast, accurate, reliable, and scalable. Merely collecting data and loading it into a data warehouse is not sufficient — you also need capabilities for accessing, modeling, and analyzing your data; a system for sharing results across a network of stakeholders; and a culture that supports and encourages real-time collaboration.
What you don’t need is a patchwork of independent data silos in which information is stockpiled like tons of surplus grain. What you do need are industrial-grade, integrated processes for managing and extracting value from IoT data and traditional sources.
Dan Graham, general manager for enterprise systems at Teradata, sees two distinct areas in which integrated data will create significant business value: product development and product deployment.
“In the R&D or development phase, you will use integrated data to see how all the moving parts will work together and how they interact. You can see where the friction exists. You’re not looking at parts in isolation. You can see the parts within the context of your supply chain, inventory, sales, market demand, channel partners, and many other factors,” says Graham.
The second phase is post-sales deployment. “Now you use your integrated data for condition-based (predictive) maintenance. Airplanes, locomotives, earth movers, automobiles, disk drives, ATMs, and cash registers require continual care and support. Parts wear out and fail. It’s good to know which parts from which vendors fail, how often they fail, and the conditions in which they fail. Then you can take the device or machine offline and repair it before it breaks down,” says Graham.
For example, microscopic changes in the circumference of a wheel or too little grease on the axle of a railroad car can result in delays and even derailments of high-speed freight trains. Union Pacific, the largest railroad company in the US, uses a sophisticated system of sensors and analytics to predict when critical parts are likely to fail, enabling maintenance crews to fix problems while rolling stock is in the rail yard. The alternative, which is both dangerous and expensive, would be waiting for parts to fail while the trains are running.
Union Pacific uses infrared and audio sensors placed on its tracks to gauge the state of wheels and bearings as the trains pass by. It also uses ultrasound to spot flaws or damage in critical components that could lead to problems. On an average day, the railroad collects 20 million sensor readings from 3,350 trains and 32,000 miles of track. It then uses pattern-matching algorithms to detect potential issues and flag them for action. The effort is already paying off: Union Pacific has cut bearing-related derailments by 75%, says Graham.1
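At its simplest, this kind of pattern matching compares each new reading against a rolling baseline built from recent history. The sketch below is a minimal, hypothetical illustration of that idea; the function name, window size, and threshold are invented for the example and are not Union Pacific's actual system.

```python
from collections import deque

def flag_anomalies(readings, window=5, threshold=1.5):
    """Flag readings that exceed `threshold` times the rolling mean
    of the previous `window` readings. Purely illustrative."""
    recent = deque(maxlen=window)   # sliding window of prior readings
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if value > baseline * threshold:
                flagged.append(i)   # index of the suspect reading
        recent.append(value)
    return flagged

# Bearing temperatures (°C) from a hypothetical trackside infrared sensor
temps = [41, 43, 42, 44, 43, 42, 95, 44, 43]
print(flag_anomalies(temps))  # → [6]: the 95-degree spike is flagged
```

A production system would layer many such detectors, tuned per component and per sensor type, but the principle — deviation from an expected pattern triggers a maintenance flag — is the same.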
NCR Corporation, which pioneered the mechanical cash register in the 19th century, is currently the global leader in consumer transaction technologies. The company provides software, hardware, and services, enabling more than 485 million transactions daily at large and small organizations in retail, financial, travel, hospitality, telecom, and technology sectors. NCR gathers data telemetrically from the IoT — data generated by ATMs, kiosks, point-of-sale terminals, and self-service checkout machines handling a total of about 3,500 transactions per second. NCR then applies its own custom algorithms to predict which of those devices is likely to fail and to make sure the right technician, with the right part, reaches the right location before the failure occurs.
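Once failure likelihood can be estimated per device, dispatching technicians reduces to a ranking problem: service the riskiest machines first. The following is a toy sketch under assumed inputs; the field names and risk weighting are hypothetical and do not reflect NCR's proprietary algorithms.

```python
def dispatch_order(devices):
    """Sort devices by a toy failure-risk score, highest first.
    Risk here is error rate weighted by time since last service."""
    def risk(d):
        return d["errors_per_day"] * (1 + d["days_since_service"] / 30)
    return sorted(devices, key=risk, reverse=True)

fleet = [
    {"id": "ATM-7",   "errors_per_day": 0.2, "days_since_service": 90},
    {"id": "POS-3",   "errors_per_day": 1.1, "days_since_service": 10},
    {"id": "KIOSK-5", "errors_per_day": 0.1, "days_since_service": 5},
]
print([d["id"] for d in dispatch_order(fleet)])  # → ['POS-3', 'ATM-7', 'KIOSK-5']
```

In practice the score would come from a trained model over telemetry features, and the dispatch step would also match technician skills and parts inventory to each predicted failure.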
Under the hood of NCR’s big data/IoT strategy is a unified data architecture that combines an integrated data warehouse, Hadoop, and the Teradata Aster Discovery Platform. The key operating principle is integration, which ensures that data flowing in from the IoT is analyzed in context with data from multiple sources.
“The name of the game is exogenous data,” says Michael Minelli, an executive at MasterCard and co-author of Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. “You need the capabilities and skills for combining and analyzing data from various sources that are outside the four walls of your organization. Then you need to convert data into actionable insights that will drive better decisions and grow your business. Data from the IoT is just one of many external sources you need to manage in combination with the data you already own.”
From Minelli’s perspective, data from the IoT is additive and complementary to the data in your data warehouse. Harvey Koeppel, former CIO at Citigroup Global Consumer Banking, agrees. “The reality is that there is still a legacy environment, and it’s not going away anytime soon. Facts are facts; they need to be collected, stored, organized, and maintained. That’s certainly the case for Fortune 1000 companies, and I expect it will remain that way for the foreseeable future,” says Koeppel.
Big data collected from the IoT tends to be “more ephemeral” than traditional types of data, says Koeppel. “Geospatial data gathered for a marketing campaign is different than financial data stored in your company’s book of record. Data that’s used to generate a coupon on your mobile phone is not in the same class as data you’re required to store because of a government regulation.”
That said, big data from the IoT is rapidly losing its status as a special case or oddity. With each passing day, big data is perceived as just another item on the menu. Ideally, your data architecture and data warehouse systems would enable you to work with whichever type of data you need, whenever you need it, to create actionable insights that lead to improved outcomes across a variety of possible activities.
“In the best of all worlds, we would blend data from the IoT with data in the data warehouse to create the best possible offers for consumers in real time or to let you know that your car is going to run out of gas 10 minutes from now,” says Koeppel. “The thoughtful approach is combining data from a continuum of sources, ranging from the IoT to the traditional data warehouse.”
1. Murphy, Chris. “High-Speed Analytics: Union Pacific shows the potential of the instrumented, interconnected, analytics-intensive enterprise.” InformationWeek, August 13, 2012.
This post is part of a collaboration between O’Reilly and Teradata exploring the convergence of hardware and software. See our statement of editorial independence.