New opportunities in the maturing marketplace of big data components

Editor’s note: this is an excerpt from our new report Data: Emerging Trends and Technologies, by Alistair Croll. Download the free report here.

Here’s a look at some options in the evolving, maturing marketplace of big data components, which are making possible the new applications and interactions we’ve been looking at.

Graph theory

First used in social network analysis, graph theory is finding more and more homes in research and business. Machine learning systems can scale up fast with tools like Parameter Server, and the TitanDB project means developers have a robust set of tools to use.
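As a rough illustration of the kind of question graph tools answer in social network analysis, here is a minimal sketch that computes "degrees of separation" with a breadth-first search. The `friendships` data and the function name are hypothetical, made up for this example; a production graph database would expose its own query language rather than this hand-rolled traversal.

```python
from collections import deque

def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return distance + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None

# Hypothetical toy network: each key lists that person's direct connections.
friendships = {
    "ana": ["ben", "cho"],
    "ben": ["ana", "dev"],
    "cho": ["ana", "dev"],
    "dev": ["ben", "cho", "eli"],
    "eli": ["dev"],
    "fay": [],
}

print(degrees_of_separation(friendships, "ana", "eli"))
```

The same adjacency structure is awkward to query in a relational table, since every extra hop typically means another self-join, which is one reason graph-native tools are attractive for this kind of data.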

Are graphs poised to take their place alongside relational database management systems (RDBMS), object storage, and other fundamental data building blocks? What are the new applications for such tools?

Inside the black box of algorithms: whither regulation?

It’s possible for a machine to create an algorithm no human can understand. Evolutionary approaches to algorithmic optimization can result in inscrutable, yet demonstrably better, computational solutions.
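To make the evolutionary idea concrete, here is a toy sketch, not any particular production method: a population of candidate solutions is scored, the fitter half survives, and mutated copies fill the gaps. The objective `example_fitness`, the parameter values, and the function names are all assumptions chosen for illustration.

```python
import random

def evolve(fitness, genome_length=8, population_size=30, generations=60,
           mutation_scale=0.3, seed=0):
    """Toy evolutionary search: keep the fittest half, refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Rank candidates from best to worst and keep the top half unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Each child is a randomly chosen survivor with Gaussian noise added.
        children = [
            [gene + rng.gauss(0, mutation_scale) for gene in rng.choice(survivors)]
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)

def example_fitness(genome):
    # Hypothetical objective: higher is better, with the optimum at all zeros.
    return -sum(gene * gene for gene in genome)

best = evolve(example_fitness)
print(best, example_fitness(best))
```

Nothing in the loop records why a given genome scores well. The output is simply a list of numbers that survived selection, which hints at why evolved solutions can be hard to explain to a regulator.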

If you’re a regulated bank, you need to share your algorithms with regulators. But if you’re a private trader, you’re under no such constraints. And having to explain your algorithms limits how you can generate them.

As more and more of our lives are governed by code that decides what’s best for us, replacing laws, actuarial tables, personal trainers, and personal shoppers, oversight means opening up the black box of algorithms so they can be regulated.

Years ago, Orbitz was shown to be charging web visitors who owned Apple devices more money than those visiting via other platforms, such as the PC. Only that’s not the whole story: Orbitz’s machine learning algorithms, which optimized revenue per customer, learned that the visitor’s browser was a predictor of their willingness to pay more.

Is this digital “goldlining” an upselling equivalent of redlining? Is a black-box algorithm inherently dangerous and brittle, vulnerable to runaway trading, and ignorant of unpredictable, impending catastrophes? How should we balance the need to optimize quickly with the requirement for oversight?

Automation

Marc Andreessen’s famous line that “software is eating the world” is pretty true. It’s already finished its first course. Zeynep Tufekci says that first, machines came for physical labor, like the digging of trenches; then for mental labor, like computing logarithm tables; and now for mental skills, which require more thinking, and possibly robotics.

Is this where automation is headed? For better or for worse, modern automation isn’t simply repetition. It involves adaptation, dealing with ambiguity and changing circumstance. It’s about causal feedback loops, with a system edging ever closer to an ideal state.

Past Strata speaker Avinash Kaushik chides marketers for wanting real-time data, observing that we humans can’t react fast enough for it to be useful. But machines can, and do, adjust in real time, turning every action into an experiment. Real-time data is the basis for a perfect learning loop.
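One common way to turn every action into an experiment is a multi-armed bandit. The sketch below uses an epsilon-greedy policy, which is a swapped-in illustration rather than anything the speakers above describe: most of the time it shows the best-performing option so far, and occasionally it tries another at random. The conversion rates passed in are made-up simulation inputs, and the function name is hypothetical.

```python
import random

def run_learning_loop(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy loop; return per-option (shows, wins) counts."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in shows:
            # Explore: pick at random (also used until every option has been tried).
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the option with the best observed success rate.
            arm = max(range(len(true_rates)), key=lambda i: wins[i] / shows[i])
        shows[arm] += 1
        # Simulated feedback from the environment, using the made-up true rate.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return shows, wins

# Hypothetical conversion rates for three page variants.
shows, wins = run_learning_loop([0.02, 0.05, 0.03])
print(shows, wins)
```

Each round feeds its result straight back into the next decision. That is the loop a human analyst is too slow to run by hand, but a machine fed real-time data can run continuously.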

Advances in fast, in-memory data processing deliver on the promise of cybernetics — mechanical, physical, biological, cognitive, and social systems in which an action that changes the environment in turn changes the system itself.

Data as a service

The programmable web was a great idea that arrived far too early. But if the old model of development was the LAMP stack, the modern equivalent is cloud, containers, and GitHub.

Cloud services make it easy for developers to prototype quickly and test a market or an idea by building atop PayPal, Google Maps, Facebook authentication, and so on.
Containers, which package applications so they can move from data center to data center, are the fundamental building blocks of the parts we make ourselves.
And social coding platforms like GitHub offer fecundity, encouraging re-use and letting a thousand forks of good code bloom.

Even these three legs of the modern application are getting simpler. Consumer-friendly tools like Zapier and IFTTT let anyone stitch together simple pieces of programming to perform simple, repetitive tasks across myriad web platforms. Moving up the levels of complexity, there’s now Stamplay for building web apps as well.

When it comes to big data, developers no longer need to roll their own data and machine learning tools, either. Consider Google’s Prediction API and BigQuery, or Amazon Redshift and Kinesis. Or look at the dozens of start-ups offering specialized on-demand functions for processing data streams or big data applications.

What are the trade-offs between standing on the shoulders of giants and rolling your own? When is it best to build things from scratch in the hopes of some proprietary advantage, and when does it make sense to rely on others’ economies of scale? The answer isn’t clear yet, but in the coming years, the industry is going to find out where that balance lies, and it will decide the fate of hundreds of new companies and technology stacks.
