Now’s the time of year for everyone to write about the trends they see in the coming year. I’ve resisted that in the past, but this year I’ll make an exception. We’ll see if it becomes a tradition. Here’s my quick list of six themes to watch in 2011:
The Hadoop family
Big data is no secret, and it grew so big in 2010 it can hardly count as a “trend” for 2011. Hadoop grew up with big data, and big data grew up with Hadoop. But what I’ve seen recently is the flowering of the Hadoop platform. It’s not a single tool but an ecosystem of tools that interoperate, and the whole is more than the sum of its parts. Watch HBase, Pig, Hive, Mahout, Flume, ZooKeeper, and the rest of the elephantine family in the coming year.
Real time data
Websites may not be “real time” in a rigorous sense, but they certainly aren’t static, and they’ve gone beyond the decade-old notion of “dynamic,” in which the same set of inputs produced the same outputs. Sites like Twitter and Facebook change with time; users want to find out what’s happening now (or some reasonably relaxed version of now). Most of the tools we have for working with big datasets are batch-oriented, like Hadoop. One of the most exciting announcements of 2010 was the brief glimpse of Google’s Percolator, which enables streaming computation on Google-sized datasets. While Percolator is a proprietary product and will probably remain so, I would be willing to bet that there will be an open source tool performing the same function within the next year. Watch for it.
The rise of the GPU
Our ability to create data is outstripping our ability to compute with it. For a number of years, a subculture of data scientists has been using high-performance graphics cards as computational tools, whether or not they need graphics. The computational capabilities that are used for rendering graphics are equally useful for general vector computing. That trend is quickly becoming mainstream, as more and more industries find that they need the ability to process large amounts of data in real time (“real” real time, not web time): finance, biotech, robotics, almost anything that requires real-time results from large amounts of data.
Amazon’s decision to provide GPU-enabled EC2 instances (“Cluster GPU Instances”) validates the GPU trend. You won’t get the processing power you need at a price you want just by enabling traditional multicore CPUs. You need the dedicated computational units that GPUs provide.
The return of P2P
P2P has been rumbling in the background ever since Napster appeared. Recently, the rumblings have been getting louder. Many factors are coming together to drive a search for a new architectural model: the inability of our current provider paradigm to supply the kind of network we’ll need in the next decade, frustration with Facebook’s “Oops, we made a mistake” privacy policies, and even WikiLeaks. Whether we’re talking about Bob Frankston’s Ambient Connectivity, the architecture of Diaspora, Tor onion routing, or even rebuilding the Internet’s client services from the ground up on a peer-to-peer basis, the themes are the same: centralized servers and network infrastructure are single points of control and single points of failure. And the solution is almost always some form of peer-to-peer architecture. The Internet routes around damage, and in the coming years, we’ll see the Internet repair itself. The time for P2P has finally come.
Everything is even more social
2010 was certainly the year of Facebook. But I think that’s just the beginning of the social story, rather than the end. I don’t think the Internet will ossify into a Facebook-dominated world. Rather, I think we’ll see social features incorporated into everything: corporate sites, ecommerce sites, mobile apps, music, and books. Although Apple’s Ping is lame, and social music sites (such as MOG) are hardly new, Ping points the way: the incorporation of social features into new kinds of products.
The meaning of privacy
Any number of events this year have made it clear that we need to think seriously about what privacy means. We can’t agree with the people who say “There’s no such thing as privacy, get over it.” At the same time, insisting on privacy in stupidly rigid ways will paralyze the Internet and make it difficult, if not impossible, to explore new areas — including healthcare, government, sharing, and community. As Tim O’Reilly has said, what’s needed isn’t legislation, but a social consensus on what should and should not be done with data: how much privacy is reasonably needed, and what forms of privacy we can do without. We’re now in a position where solving those problems is not only possible, but necessary. I don’t expect much progress toward a solution in the next year, but I do expect to see the meaning of “privacy” discussed seriously.
A few more things