"cloud computing" entries
The O’Reilly Data Show podcast: The Hadoop ecosystem, the recent surge in interest in all things real time, and developments in hardware.
Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.
Given the quick pace of innovation in the data ecosystem, we like to step back from the details of individual components, architectures, and applications to take a wider view of the big data landscape. This lets us evaluate how technology and infrastructure are progressing, shifting our attention from individual components like Spark and Kafka to larger trends.
Some of the larger trends we’ve been exploring include the capabilities of distributed machine learning and the tradeoffs and design decisions involved in cloud architecture and stream processing.
In this episode of the O’Reilly Data Show, I sat down with Jai Ranganathan, senior director of product management at Cloudera. We talked about the trends in the Hadoop ecosystem, cloud computing, the recent surge in interest in all things real time, and hardware trends:
Large-scale machine learning
This sounds a bit like this should already exist in really good form right now, but one of the things that I’m really interested in is expanding the set of capabilities for distributed machine learning. While there are systems out there today that do this, I think relative to what you can experience from a singular environment like scikit-learn or R, the set of things you can do in a distributed fashion is limited. … It’s not easy to distribute various algorithms and model-building techniques. I think there is still a lot of work for us to do to improve that experience. … And I do want to have good open source options like MLlib. MLlib may be the right answer. I would be perfectly happy if that’s the final answer, but we do need systems just to provide the kind of depth that you typically are used to in the singular environment. That’s just a matter of time and investment because these are non-trivial problems, but they are things that people are working on.
A look at the winners from a showcase of some of the most innovative big data startups.
At Strata + Hadoop World in London last week, we hosted a showcase of some of the most innovative big data startups. Our judges narrowed the field to 10 finalists, from whom they — and attendees — picked three winners and an audience choice.
Underlying many of these companies was a move from software to services. As industries mature, we see a shift from custom consulting to software and, ultimately, to utilities. Simon Wardley emphasized this in his Data Driven Business Day talk, and it was reinforced by the announcement of offerings like Google’s Bigtable service.
This trend was front and center at the showcase:
- Winner Modgen, for example, generates recommendations and predictions, offering machine learning as a cloud-based service.
- While second-place Brytlyt offers its high-performance database as an on-premises product, its horizontally scaled-out architecture really shines when the infrastructure is elastic and cloud based.
- Finally, third-place OpenSensors’ real-time IoT message platform scales to millions of messages a second, letting anyone spin up a network of connected devices.
Ultimately, big data gives clouds something to do. Distributed sensors need a widely available, connected repository into which to report; databases need to grow and shrink with demand; and predictive models can be tuned better when they learn from many data sets.
The savings aren’t large enough, relative to the risk, to justify a shift to public cloud.
This post was originally published on Limn This. The lightly edited version that follows is republished with permission.
Last October, Simon Wardley and I stood on a rainy sidewalk at 28th St. in New York City arguing politely (he’s British) about the future of cloud adoption. He argued, rightly, that the cost advantages from scale would be overwhelming compared to home-brew private clouds. He went on to argue, less certainly in my view, that this would lead inevitably to their wholesale and deep adoption across the enterprise market.
I think Simon bases his argument on something like the rational economic man theory of the enterprise. Or, more specifically, the rational economic chief financial officer (CFO). If the costs of a service provider are destined to be lower than the costs of internally operated alternatives, and your CFO is rational (most tend to be), then the conclusion is foregone.
And, of course, costs are going down just as predicted. Look at this post by Avi Deitcher: Does Amazon’s Web Services Pricing Follow Moore’s Law? I think the question posed in the title has a fairly obvious answer: no. Services aren’t just silicon; they include all manner of linear terms, like labor, so the price decreases will almost certainly be slower than Moore’s Law. Still, his analysis of the costs of a modestly sized AWS solution versus in-house competition is really useful.
Not only is AWS’ price dropping fast (56% in three years), but it’s significantly cheaper than building and operating a platform in house. Avi does the math for 600 instances over three years and finds that the cost for AWS would be $1.1 million (I don’t think this number considers out-year price decreases) versus $2.3 million for DIY. Your mileage might vary, but these numbers are a nice starting point for further discussion.
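To make these figures easier to compare, here is a rough Python sketch that turns them into per-instance-hour costs and an implied annual price decline. The inputs ($1.1 million, $2.3 million, 600 instances, three years, 56%) come from the text above. Two assumptions are ours, for illustration only, and are not part of Avi’s analysis: every instance runs around the clock for the full three years, and "Moore’s Law" is modeled as prices halving every 24 months.

```python
# Back-of-the-envelope check of the AWS vs. DIY figures quoted above.
# ASSUMPTIONS (ours, not from the original analysis):
#   - all 600 instances run 24x7 for the full three years
#   - Moore's Law is modeled as a price halving every 24 months

HOURS_PER_YEAR = 24 * 365
INSTANCES = 600
YEARS = 3

aws_total = 1_100_000  # USD over three years (figure quoted in the post)
diy_total = 2_300_000  # USD over three years (figure quoted in the post)

# Total instance-hours consumed under the 24x7 assumption.
instance_hours = INSTANCES * YEARS * HOURS_PER_YEAR
aws_per_hour = aws_total / instance_hours
diy_per_hour = diy_total / instance_hours

# A 56% drop over three years leaves 44% of the original price;
# the cube root gives the equivalent constant yearly factor.
three_year_factor = 1 - 0.56
annual_factor = three_year_factor ** (1 / YEARS)
annual_decline = 1 - annual_factor

# Three-year drop if prices halved every 24 months (36 / 24 = 1.5 halvings).
moore_drop = 1 - 0.5 ** (YEARS * 12 / 24)

print(f"instance-hours: {instance_hours:,}")
print(f"AWS: ${aws_per_hour:.3f}/instance-hour, DIY: ${diy_per_hour:.3f}/instance-hour")
print(f"DIY/AWS cost ratio: {diy_total / aws_total:.2f}")
print(f"implied annual AWS price decline: {annual_decline:.1%}")
print(f"three-year drop under a 24-month halving: {moore_drop:.1%}")
```

Under those assumptions the numbers work out to about $0.070 versus $0.146 per instance-hour, with DIY costing roughly 2.09 times as much. They also imply an AWS price decline of about 23.9% a year, compared with a 64.6% three-year drop for a 24-month halving. That gap is consistent with the argument that service prices fall more slowly than Moore’s Law.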
A new survey shows the market is ready for cloud-based big data services.
One night when our son was two years old, he abruptly decided that he didn’t like taking baths. As my wife recalls, he struggled mightily against the ritual of bathing for several months until, suddenly and mysteriously, he decided that he liked bathing again. We’re happy to report that he has managed to stay relatively clean ever since.
When I speak with CIOs and other IT leaders about moving big data operations into the cloud, I am reminded of our son’s unexplained loathing of the bathtub.
Nearly everyone associated with IT understands that most IT operations — including big data analytics — must eventually move into the cloud. The traditional on-premises approaches are simply too costly, and CIOs are under crushing pressure to shift budgetary resources to value-added, customer-facing activities.
For most companies, the writing is already on the wall. The cloud offers greater agility and elasticity, and quicker product development cycles, and it can reduce costs. When you add up the benefits, it seems inevitable that the bulk of IT operations will move into the cloud. Nevertheless, the foot-dragging and excuse-making continue.
Can education and peer review keep a huge open source project on track?
When does a software project grow to the point where one must explicitly think about governance? The term “governance” is stiff and gawky, but doing it well can carry a project through many a storm. Over the past couple of years, the crucial OpenStack project has struggled with governance at least as much as with the technical and organizational issues of coordinating inputs from thousands of individuals and many companies.
A major milestone was the creation of the OpenStack Foundation, which I reported on in 2011. This event successfully started the participants’ engagement with the governance question, but it by no means resolved it. This past Monday, I attended part of Open Cloud Day at O’Reilly’s Open Source Convention and talked to a lot of people working for or alongside the OpenStack Foundation about getting contributors to work together successfully in an open community.
Analytic services are tailoring their solutions for specific problems and domains
In relatively short order, Amazon’s internal computing services have become the world’s most successful cloud computing platform. Conceived in 2003 and launched in 2006, AWS grew quickly and is now the largest web hosting company in the world. With the recent addition of Kinesis (for stream processing), AWS continues to add services and features that make it an attractive platform for many enterprises.
A few other companies have followed a similar playbook: technology investments that benefit a firm’s core business are leased out to other companies, some of which may operate in the same industry. An important (but not well-known) example comes from finance, where a widely used service provides users with clean, curated data sets and sophisticated algorithms with which to analyze them. It turns out that the world’s largest asset manager makes its investment and risk management systems available to over 150 pension funds, banks, and other institutions. In addition to the $4 trillion managed by BlackRock, the company’s Aladdin Investment Management system is used to manage¹ $11 trillion in additional assets from external managers.
The Havana release features metering and orchestration
I talked this week to Jonathan Bryce and Mark Collier of OpenStack about the motivations behind the enhancements in the Havana release announced today. We focused on the main event (official support for the Ceilometer metering/monitoring project and the Heat orchestration project) but also covered a few smaller items.
Quality and security drive adoption, but community is rising fast
I recently talked to two managers of Black Duck, the first company formed to help organizations deal with the licensing issues involved in adopting open source software. With Tim Yeaton, President and CEO, and Peter Vescuso, Executive Vice President of Marketing and Business Development, I discussed the seventh Future of Open Source survey, from which I’ll post a few interesting insights later. But you can look at the slides for yourself, so this article will focus instead on some of the topics we talked about in our interview. While I cite some ideas from Yeaton and Vescuso, many of the observations below are purely my own.
The spur to collaboration
One theme in the slides is the formation of consortia that develop software for entire industries. One recent example everybody knows about is OpenStack, but many industries have their own impressive collaboration projects, such as GENIVI in the auto industry.
What brings competitors together to collaborate? In the case of GENIVI, it’s the impossibility of any single company meeting consumer demand through its own efforts. Car companies typically take five years to bring a design to market, but customers are used to product releases more like those of cell phones, where you can find something enticingly new every six months. In addition, the range of useful technologies (Bluetooth, etc.) is so broad that a company would have to become expert at everything at once. Meanwhile, according to Vescuso, the average high-end car contains more than 100 million lines of code. So the pace and complexity of progress are driving the auto industry to work together.
All too often, the main force uniting competitors is the fear of another vendor and the realization that they can never beat a dominant vendor on its own turf. Open source becomes a way of changing the rules out from under the dominant player. OpenStack, for instance, took on VMware in the virtualization space and Amazon.com in the IaaS space. Android attracted phone manufacturers and telephone companies as a reaction to the iPhone.
A valuable lesson can be learned from the history of the Open Software Foundation, which was formed in reaction to an agreement between Sun and AT&T. In the late 1980s, Sun had become the dominant vendor of Unix, which was still being maintained by AT&T. Their combination panicked vendors such as Digital Equipment Corporation and Apollo Computer (you can already get a sense of how much good OSF did them), who promised to create a single, unified standard that would give customers increased functionality and more competition.
The name Open Software Foundation was deceptive, because it was never open. Instead, it was a shared repository into which various companies dumped bad code so they could cynically claim to be interoperable while continuing to compete against each other in the usual way. It soon ceased to exist in its planned form, but it survived after a fashion by merging with X/Open to become the Open Group, an organization of some significance because it maintains the Single UNIX Specification and for years maintained the X Window System. Various flavors of BSD failed to dislodge the proprietary Unix vendors, probably because each BSD team did its work in a fairly traditional, closed fashion. It remained up to Linux, a truly open project, to unify the Unix community and ultimately replace the closed Sun/AT&T partnership.
Collaboration can be driven by many things, therefore, but it usually takes place in one of two fashions. In the first, somebody throws out into the field some open source code that everybody likes, as Rackspace and NASA did to launch OpenStack, or IBM did to launch Eclipse. Less common is the GENIVI model, in which companies realize they need to collaborate to compete and then start a project.
A bigger pie for all
The first thing on most companies’ minds when they adopt open source is to improve interoperability and defend themselves against vendor lock-in. The Future of Open Source survey indicates that the top reasons for choosing open source are its quality (slide 13) and security (slide 15). This is excellent news, because it shows that misconceptions about open source are crumbling, and proprietary vendors’ arguments that they can ensure better quality and security will increasingly be seen as hollow.
The silver lining in the role of cloud-based email in the CIA Director's resignation is a renewed focus on digital privacy.
This week, there’s an important issue before Washington that affects everyone who sends email, stores files in Dropbox or sends private messages on social media. In January, O’Reilly Media went dark in opposition to anti-piracy bills. Personally, I believe our right to digital due process before the government can access our private electronic communications is just as important.
Why? Here’s the context for my interest. The silver lining in the way former CIA Director David Petraeus’ affair was discovered may be its effect on the national debate around email and electronic privacy, and our rights in a surveillance state. The courts and Congress have failed to fully address the constitutionality of warrantless wiretapping of cellphones and of tracking the location of “persons of interest.” Phones themselves, however, are a red herring. What’s at stake is the Fourth Amendment in the 21st century, with respect to the personal user data that telecommunications and technology firms hold and that the government requests without digital due process.
On Thursday, the Senate Judiciary Committee will consider an update to the Electronic Communications Privacy Act (ECPA), the landmark 1986 legislation that governs the protections citizens have when they communicate using the Internet or cellphones. (It’s the small item on the bottom of this meeting page.)
UPDATE: Senator Leahy’s manager’s amendment to ECPA passed, but Politico’s Tony Romm reports that the full Congress is unlikely to pass ECPA reform in this session.