# "cloud computing" entries

## Four short links: 3 March 2016

### Tagging People, Maintenance Anti-Pattern, Insourced Brains, and Chat UI

1. Human Traffickers Using RFID Chips (NPR) — It turns out this 20-something woman was being pimped out by her boyfriend, forced to sell herself for sex and hand him the money. “It was a small glass capsule with a little almost like a circuit board inside of it,” he said. “It’s an RFID chip. It’s used to tag cats and dogs. And someone had tagged her like an animal, like she was somebody’s pet that they owned.”
2. Software Maintenance is an Anti-Pattern — Governments often use two anti-patterns when sustaining software: equating the “first release” with “complete” and moving to reduce sustaining staff too early; and poorly managing the reduction of staff when a reduction in budget is appropriate.
3. Cloud Latency and Autonomous Robots (Ars Technica) — “Accessing a cloud computer takes too long. The half-second time delay is too noticeable to a human,” says Ishiguro, an award-winning roboticist at Osaka University in Japan. “In real life, you never wait half a second for someone to respond. People answer much quicker than that.” Tech moves in cycles, from distributed to centralized and back again. As with mobile phones, the question becomes, “what is the right location for this functionality?” It’s folly to imagine everything belongs in the same place.
4. Chat as UI (Alistair Croll) — The surface area of the interface is almost untestable. The UI is the log file. Every user interaction is also a survey. Chat is a great interface for the Internet of Things. It remains to be seen how many deep and meaningfuls I want to have with my fridge.
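
Ishiguro's half-second complaint can be framed as a latency budget, which makes the "right location for this functionality" question concrete. What follows is a hypothetical placement rule, not anything from the article. It compares running a task on the robot itself against a cloud round trip plus cloud compute, then picks the faster option that fits the budget. The function names and every millisecond figure are assumptions chosen for illustration.

```python
# Hypothetical placement rule for a latency-sensitive robot task.
# All names and millisecond figures are illustrative assumptions.


def total_latency_ms(network_rtt_ms: float, compute_ms: float) -> float:
    """End-to-end latency: one network round trip plus processing time."""
    return network_rtt_ms + compute_ms


def choose_location(cloud_rtt_ms: float, cloud_compute_ms: float,
                    local_compute_ms: float, budget_ms: float) -> str:
    """Return "local", "cloud", or "none" if neither option fits the budget."""
    options = {
        "local": total_latency_ms(0.0, local_compute_ms),  # no network hop
        "cloud": total_latency_ms(cloud_rtt_ms, cloud_compute_ms),
    }
    feasible = {name: ms for name, ms in options.items() if ms <= budget_ms}
    if not feasible:
        return "none"
    # Prefer the fastest feasible option; ties go to "local" (listed first).
    return min(feasible, key=feasible.get)


# A 200 ms budget, comfortably under the half second Ishiguro describes.
print(choose_location(cloud_rtt_ms=150, cloud_compute_ms=20,
                      local_compute_ms=80, budget_ms=200))   # local
print(choose_location(cloud_rtt_ms=150, cloud_compute_ms=30,
                      local_compute_ms=900, budget_ms=200))  # cloud
```

In this toy, a fast task stays on the robot. A task the local hardware can't finish in time goes to the cloud. If neither fits, the function reports that instead of pretending one does.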

## Building a scalable platform for streaming updates and analytics

### The O’Reilly Data Show podcast: Evan Chan on the early days of Spark+Cassandra, FiloDB, and cloud computing.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.

In this episode of the O’Reilly Data Show, I sit down with Evan Chan, distinguished engineer at Tuplejump. We talk about the early days of Spark (particularly his contributions to Spark/Cassandra integration), his interesting new open source project (FiloDB), and recent trends in cloud computing.

## Bringing Apache Spark & Apache Cassandra together

DataStax credits me with inspiring them to bring Spark into Cassandra … I think they’re very generous about that. I think I was one of the first folks to talk about the possibility of bringing Cassandra and Spark together. The vision that I saw was that Cassandra was really good for real-time updates, but what if we’re able to do more analytical queries on it? Then you could combine, basically, a platform that is really good for real-time updates with analytics.
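
One way to picture the combination Chan describes is a store built for cheap keyed writes that can also be scanned, partition by partition, for analytics. This is a toy, single-process sketch of that pattern under our own assumptions. `PartitionedStore`, `analytic_total`, and the sensor data are hypothetical, and a thread pool stands in for Spark tasks. None of it is the Cassandra or Spark Cassandra Connector API.

```python
# Toy model of "real-time updates plus analytics" in one process.
# PartitionedStore and analytic_total are hypothetical names; this is not
# the Cassandra or Spark Cassandra Connector API.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class PartitionedStore:
    """Rows grouped by partition key; writes are cheap keyed upserts."""

    def __init__(self) -> None:
        # partition key -> {clustering key -> value}
        self.partitions: dict[str, dict[str, float]] = defaultdict(dict)

    def upsert(self, partition_key: str, clustering_key: str, value: float) -> None:
        # Last write wins for the same (partition, clustering) key.
        self.partitions[partition_key][clustering_key] = value


def partition_sum(rows: dict[str, float]) -> float:
    """Per-partition work, loosely analogous to one task scanning one range."""
    return sum(rows.values())


def analytic_total(store: PartitionedStore) -> float:
    # Scan every partition concurrently, then combine the partial results.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(partition_sum, store.partitions.values()))


store = PartitionedStore()
store.upsert("sensor-1", "2016-03-03T10:00", 20.5)
store.upsert("sensor-1", "2016-03-03T10:01", 21.0)
store.upsert("sensor-2", "2016-03-03T10:00", 19.0)
store.upsert("sensor-1", "2016-03-03T10:01", 22.0)  # a real-time correction overwrites
print(analytic_total(store))  # 61.5
```

The design point is that writes and analytics share one data layout. Updates touch a single partition, while the analytical pass fans out across all partitions and merges small partial results.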

## Graph databases are powering mission-critical applications

### The O’Reilly Data Show Podcast: Emil Eifrem on popular applications of graph technologies, cloud computing, and company culture.


While most people associate graphs with social media analysis, there is a wide range of applications — including recommendations, fraud detection, I.T. operations, and security — that are routinely framed using graphs. This wide variety of use cases has given rise to many interesting tools for storing, managing, visualizing, and analyzing massive graphs. The important thing to note is that graph databases are not limited to reporting and analytics; they are also being used to power mission-critical applications.
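
Here is a minimal sketch of the fraud-detection framing. It is our illustration, not any vendor's method. Accounts that share an identifier, such as a phone number or device, are linked. Connected groups are then found with union-find, and groups above a size threshold are flagged. The account data, the identifier strings, and the `min_size` threshold are all hypothetical.

```python
# Fraud rings as connected components: accounts sharing any identifier are
# linked. Data, identifiers, and the size threshold are hypothetical.
from collections import defaultdict


def find(parent: dict[str, str], x: str) -> str:
    # Path halving keeps the union-find trees shallow.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: dict[str, str], a: str, b: str) -> None:
    parent[find(parent, a)] = find(parent, b)


def suspicious_rings(accounts: dict[str, set[str]], min_size: int) -> list[set[str]]:
    parent = {acct: acct for acct in accounts}
    first_owner: dict[str, str] = {}  # identifier -> first account seen with it
    for acct, identifiers in accounts.items():
        for ident in identifiers:
            if ident in first_owner:
                union(parent, acct, first_owner[ident])
            else:
                first_owner[ident] = acct
    groups: dict[str, set[str]] = defaultdict(set)
    for acct in accounts:
        groups[find(parent, acct)].add(acct)
    return [g for g in groups.values() if len(g) >= min_size]


accounts = {
    "A": {"phone:555-0100", "device:x1"},
    "B": {"phone:555-0100"},
    "C": {"device:x1", "addr:12 Elm"},
    "D": {"addr:9 Oak"},
}
print([sorted(g) for g in suspicious_rings(accounts, min_size=3)])  # [['A', 'B', 'C']]
```

B and C never share an identifier directly. They end up in the same ring through A, and that kind of indirect link is what graph framing surfaces.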

In this episode of the O’Reilly Data Show, I sat down with Emil Eifrem, CEO and co-founder of Neo Technology. We talked about the early days of NoSQL, applications of graph databases, cloud computing, and company culture in the U.S. and Sweden.

## Graph and NoSQL databases

The relational database had been an accelerator, and here it’s really slowing us down. What we ended up concluding was that the problem was this mismatch between the shape of the data and the abstractions that were exposed by our infrastructure. At that point, we said, okay, what if we had a database that just exposed these amazing network-oriented data structures or graph-oriented data structures, but other than that, had all the properties of a relational database. Wouldn’t that be great? … Ultimately, we said the famous last words: ‘Hey, let’s just build it ourselves. How hard can it be?’ It turns out it’s 15 years later!

2007 is when both the Dynamo paper had been published and the BigTable paper had been published out of Amazon and Google, respectively. That’s when, in early adopter circuits, the discourse started to change … maybe the era of the one-size-fits-all database is over. Maybe our job isn’t to take all of our data and shove it through a relational database. Maybe there are some other tools and technologies and abstractions out there that make better sense for some data. That was in ’07. I really think it was as if lightning struck in the community. … [Dynamo and BigTable were announced] and the next day, 12 open source projects, implementing it, and then the next day, 24 new ones. It was just crazy back then.
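
Eifrem's point about a mismatch between the shape of the data and the abstractions on offer shows up clearly in a friends-of-friends query. This sketch writes the same two-hop query twice: once as a self-join over an edge table, and once as a walk over adjacency lists. Both are plain Python stand-ins, not SQL or Neo4j APIs, and the names and edges are made up.

```python
# The same two-hop query in two data shapes. Plain Python stand-ins only;
# the edge data is hypothetical.
from collections import defaultdict

# Relational shape: a table of (src, dst) rows.
edges = [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("cat", "eve"), ("ann", "cat")]


def fof_join(edges: list[tuple[str, str]], start: str) -> set[str]:
    # Like: SELECT e2.dst FROM edges e1 JOIN edges e2 ON e1.dst = e2.src WHERE e1.src = ?
    # Nested loops scan the table; each extra hop means another join.
    return {d2 for s1, d1 in edges if s1 == start
            for s2, d2 in edges if s2 == d1}


# Graph shape: each node holds direct references to its neighbors.
adjacency: dict[str, list[str]] = defaultdict(list)
for src, dst in edges:
    adjacency[src].append(dst)


def fof_traverse(adj: dict[str, list[str]], start: str) -> set[str]:
    # Follow neighbor lists: work depends on the neighborhood, not the table size.
    return {fof for friend in adj.get(start, []) for fof in adj.get(friend, [])}


print(sorted(fof_join(edges, "ann")))          # ['cat', 'dan', 'eve']
print(sorted(fof_traverse(adjacency, "ann")))  # ['cat', 'dan', 'eve']
```

Both versions return the same answer. The difference is in the work. The join matches rows against rows, while the traversal follows stored neighbor references, so its cost tracks the neighborhood it touches rather than the size of the whole table.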

## Jai Ranganathan on architecting big data applications in the cloud

### The O’Reilly Data Show podcast: The Hadoop ecosystem, the recent surge in interest in all things real time, and developments in hardware.


Given the quick pace of innovation in the data ecosystem, we like to take a step back from the details of individual components, architecture, and applications, in order to take a wider view of the landscape of big data. This allows us to evaluate the progress of technology and infrastructure along the way, shifting our attention from individual components like Spark and Kafka to larger trends.

Some of the larger trends we’ve been exploring include the capabilities of distributed machine learning and the tradeoffs and design decisions involved in cloud architecture and stream processing.

In this episode of the O’Reilly Data Show, I sat down with Jai Ranganathan, senior director of product management at Cloudera. We talked about the trends in the Hadoop ecosystem, cloud computing, the recent surge in interest in all things real time, and hardware trends:

## Large-scale machine learning

This sounds a bit like this should already exist in really good form right now, but one of the things that I’m really interested in is expanding the set of capabilities for distributed machine learning. While there are systems out there today that do do this, I think relative to what you can experience from a singular environment like scikit-learn or R, the set of things you can do in a distributed fashion is limited. … It’s not easy to distribute various algorithms and model-building techniques. I think there is still a lot of work for us to do to improve that experience. … And I do want to have good open source options like MLlib. MLlib may be the right answer. I would be perfectly happy if that’s the final answer, but we do need systems just to provide the kind of depth that you typically are used to in the singular environment. That’s just a matter of time and investment because these are non-trivial problems, but they are things that people are working on.
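
Ranganathan's observation that some algorithms are hard to distribute is easier to see next to one that distributes well. In this sketch, plain Python stands in for a cluster; it is not MLlib's API. The gradient of a squared-error loss is a sum over rows. Each hypothetical worker therefore computes a partial gradient on its own partition, and a driver sums and averages the results. Algorithms whose updates don't decompose into sums this neatly tend to need more coordination.

```python
# Data-parallel gradient descent for y = w*x + b with squared error.
# Plain Python standing in for a cluster; not MLlib's API.


def partial_gradient(partition: list[tuple[float, float]], w: float,
                     b: float) -> tuple[float, float, int]:
    """Sum of d/dw and d/db of (w*x + b - y)^2 over one partition, plus its row count."""
    gw = gb = 0.0
    for x, y in partition:
        err = w * x + b - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(partition)


def distributed_gd(partitions: list[list[tuple[float, float]]],
                   steps: int = 2000, lr: float = 0.05) -> tuple[float, float]:
    w = b = 0.0
    for _ in range(steps):
        # "Map": each worker computes a partial gradient on its own data.
        parts = [partial_gradient(p, w, b) for p in partitions]
        # "Reduce": the driver sums the partials and averages over all rows.
        n = sum(c for _, _, c in parts)
        w -= lr * sum(g for g, _, _ in parts) / n
        b -= lr * sum(g for _, g, _ in parts) / n
    return w, b


# Hypothetical data from y = 3x + 1, split across three "workers".
data = [(x / 10, 3 * (x / 10) + 1) for x in range(30)]
partitions = [data[0:10], data[10:20], data[20:30]]
w, b = distributed_gd(partitions)
print(round(w, 2), round(b, 2))  # 3.0 1.0
```

Because the per-partition sums add up exactly to the full-data gradient, the distributed version takes the same steps a single machine would. That equivalence is what makes this family of models comparatively easy to scale out.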

## Startups suggest big data is moving to the clouds

### A look at the winners from a showcase of some of the most innovative big data startups.

At Strata + Hadoop World in London last week, we hosted a showcase of some of the most innovative big data startups. Our judges narrowed the field to 10 finalists, from whom they — and attendees — picked three winners and an audience choice.

Underlying many of these companies was the move from software to services. As industries mature, we see a move from custom consulting to software and, ultimately, to utilities — something Simon Wardley underscored in his Data Driven Business Day talk, and which was reinforced by the announcement of tools like Google’s Bigtable service offering.

This trend was front and center at the showcase:

• Winner Modgen, for example, generates recommendations and predictions, offering machine learning as a cloud-based service.
• While second-place Brytlyt offers their high-performance database as an on-premises product, their horizontally scaled-out architecture really shines when the infrastructure is elastic and cloud based.
• Finally, third-place OpenSensors’ real-time IoT message platform scales to millions of messages a second, letting anyone spin up a network of connected devices.

Ultimately, big data gives clouds something to do. Distributed sensors need a widely available, connected repository into which to report; databases need to grow and shrink with demand; and predictive models can be tuned better when they learn from many data sets.
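
Databases that grow and shrink with demand are an example of elastic capacity. Here is a minimal sketch of one common rule of thumb, target utilization. The rule sizes the cluster so observed load stays under a chosen fraction of total capacity, then clamps the result to a floor and a ceiling. The function name, per-node capacity, target, and bounds are all hypothetical parameters, not values from any of these companies.

```python
# Target-utilization sizing: a hypothetical rule for elastic capacity.
# Capacity per node, target, and bounds are illustrative assumptions.
import math


def desired_nodes(load_ops: float, capacity_per_node: float,
                  target_utilization: float = 0.7,
                  min_nodes: int = 2, max_nodes: int = 50) -> int:
    """Smallest node count keeping utilization at or below target, clamped to bounds."""
    needed = math.ceil(load_ops / (capacity_per_node * target_utilization))
    return max(min_nodes, min(max_nodes, needed))


# Quiet night, busy afternoon, and a spike beyond the configured ceiling.
for load in (1_000, 60_000, 10_000_000):
    print(load, desired_nodes(load, capacity_per_node=5_000))
```

Real autoscalers typically add cooldowns or hysteresis so a cluster doesn't thrash between sizes. This sketch leaves that out to keep the core rule visible.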

## Public vs. private cloud: Price isn’t enough

### The risk relative to the savings isn’t enough to justify a shift to public cloud.

This post was originally published on Limn This. The lightly edited version that follows is republished with permission.

Last October, Simon Wardley and I stood on a rainy sidewalk at 28th St. in New York City arguing politely (he’s British) about the future of cloud adoption. He argued, rightly, that the cost advantages from scale would be overwhelming compared to home-brew private clouds. He went on to argue, less certainly in my view, that this would lead inevitably to their wholesale and deep adoption across the enterprise market.

I think Simon bases his argument on something like the rational economic man theory of the enterprise. Or, more specifically, the rational economic chief financial officer (CFO). If the costs of a service provider are destined to be lower than the costs of internally operated alternatives, and your CFO is rational (most tend to be), then the conclusion is foregone.

And, of course, costs are going down just as they are predicted to. Look at this post by Avi Deitcher: Does Amazon’s Web Services Pricing Follow Moore’s Law? I think the question posed in the title has a fairly obvious answer: no. Services aren’t just silicon; they include all manner of linear terms, like labor, so the price decreases will almost certainly be slower than Moore’s Law. Still, his analysis of the costs of a modestly sized AWS solution and its in-house competition is really useful.
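
A toy model can test the claim that labor and other non-silicon costs pull price declines below Moore's Law. Assume, purely for illustration, that the silicon share of a service's cost halves every two years while the remainder stays flat. Under that assumption, the blended price falls more slowly as the silicon share shrinks. The two-year halving period and the 60% share below are our assumptions, not figures from Deitcher's post.

```python
# Toy model: only the silicon share of cost follows a Moore's-Law-style halving.
# The halving period and the shares used below are illustrative assumptions.


def blended_price(years: float, silicon_share: float, halving_years: float = 2.0) -> float:
    """Price relative to year 0: silicon halves every halving_years, the rest stays flat."""
    silicon = silicon_share * 0.5 ** (years / halving_years)
    other = 1.0 - silicon_share
    return silicon + other


for share in (1.0, 0.6):
    drop = 1.0 - blended_price(3, share)
    print(f"silicon share {share:.0%}: {drop:.0%} cheaper after 3 years")
```

With these assumptions, a pure-silicon cost falls about 65% in three years. At a 60% silicon share, the same cost falls only about 39%.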

Not only is AWS’ price dropping fast (56% in three years), but it’s significantly cheaper than building and operating a platform in house. Avi does the math for 600 instances over three years and finds that the cost for AWS would be $1.1 million (I don’t think this number considers out-year price decreases) versus $2.3 million for DIY. Your mileage might vary, but these numbers are a nice starting point for further discussion.
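
A little arithmetic puts the quoted figures on a common footing. This sketch converts the 56% three-year drop into a constant yearly rate. It also divides each three-year total by instance-hours. Two assumptions are ours: that all 600 instances run around the clock, and that Moore's Law can be read as cost halving every two years.

```python
# Back-of-the-envelope arithmetic on the quoted figures.
# Assumptions: instances run 24/7; "Moore's Law" read as cost halving every 2 years.
HOURS_PER_YEAR = 24 * 365


def annual_decline(total_drop: float, years: float) -> float:
    """Constant yearly rate that compounds to the given total drop."""
    return 1.0 - (1.0 - total_drop) ** (1.0 / years)


def cost_per_instance_hour(total_cost: float, instances: int, years: float) -> float:
    return total_cost / (instances * years * HOURS_PER_YEAR)


print(f"AWS price decline: {annual_decline(0.56, 3):.1%} per year")
print(f"Halving every 2 years: {annual_decline(0.5, 2):.1%} per year")
for label, cost in (("AWS", 1.1e6), ("DIY", 2.3e6)):
    print(f"{label}: ${cost_per_instance_hour(cost, 600, 3):.3f} per instance-hour")
```

Under those assumptions, the AWS decline works out to roughly 24% a year, compared with about 29% for a two-year halving. That is consistent with the point that services get cheaper more slowly than silicon. The totals come to roughly $0.07 versus $0.15 per instance-hour.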

These results raise an interesting question: if the numbers are so compelling, why did Walmart just reveal that they are building a ginormous private cloud? Why would anyone?

## Big data’s move to the cloud

### A new survey shows the market is ready for cloud-based big data services.

One night when our son was two years old, he abruptly decided that he didn’t like taking baths. As my wife recalls, he struggled mightily against the ritual of bathing for several months until, suddenly and mysteriously, he decided that he liked bathing again. We’re happy to report that he has managed to stay relatively clean ever since.

When I speak with CIOs and other IT leaders about moving big data operations into the cloud, I am reminded of our son’s unexplained loathing of the bathtub.

Nearly everyone associated with IT understands that most IT operations — including big data analytics — must eventually move into the cloud. The traditional on-premises approaches are simply too costly, and CIOs are under crushing pressure to shift budgetary resources to value-added, customer-facing activities.

For most companies, the writing is already on the wall. The cloud offers greater agility and elasticity, and quicker product development cycles — and can reduce costs. When you add up the benefits, it seems inevitable that the bulk of IT operations will move into the cloud. Nevertheless, the foot-dragging and excuse-making continue.

## OpenStack creates a structure for managing change without a benevolent dictator

### Can education and peer review keep a huge open source project on track?

When does a software project grow to the point where one must explicitly think about governance? The term “governance” is stiff and gawky, but doing it well can carry a project through many a storm. Over the past couple of years, the crucial OpenStack project has struggled with governance at least as much as with the technical and organizational issues of coordinating inputs from thousands of individuals and many companies.

A major milestone was the creation of the OpenStack Foundation, which I reported on in 2011. This event successfully started the participants’ engagement with the governance question, but it by no means resolved it. This past Monday, I attended some of the Open Cloud Day at O’Reilly’s Open Source Convention, and talked to a lot of people working for or alongside the OpenStack Foundation about getting contributors to work together successfully in an open community.

## How did we end up with a centralized Internet for the NSA to mine?

### The Internet is naturally decentralized, but it's distorted by business considerations.

I’m sure it was a Wired editor, and not the author Steven Levy, who assigned the title “How the NSA Almost Killed the Internet” to yesterday’s fine article about the pressures on large social networking sites. Whoever chose the title, it’s justifiably grandiose because to many people, yes, companies such as Facebook and Google constitute what they know as the Internet. (The article also discusses threats to divide the Internet infrastructure into national segments, which I’ll touch on later.)

So my question today is: How did we get such industry concentration? Why is a network famously based on distributed processing, routing, and peer connections characterized now by a few choke points that the NSA can skim at its leisure?

A few other companies have followed a similar playbook: technology investments that benefit a firm’s core business are leased out to other companies, some of which may operate in the same industry. An important (but not well-known) example comes from finance. A widely used service provides users with clean, curated data sets and sophisticated algorithms with which to analyze them. It turns out that the world’s largest asset manager makes its investment and risk management systems available to over 150 pension funds, banks, and other institutions. In addition to the $4 trillion managed by BlackRock, the company’s Aladdin Investment Management system is used to manage $11 trillion in additional assets from external managers.