"disruption" entries

Six disruptive possibilities from big data

Specific ways big data will inundate vendors and customers.

My new book, Disruptive Possibilities: How Big Data Changes Everything, is derived directly from my experience as a performance and platform architect in the old enterprise world and the new, Internet-scale world.

I predate the Hadoop crew at Yahoo!, but I intimately understood the grid engineering that made Hadoop possible. For years, the working title of this book was The Art and Craft of Platform Engineering, and when I started working on Hadoop after a stint in the Red Hat kernel group, many of the ideas that had been jammed into my head, going back to my experience with early supercomputers, all seemed to make perfect sense for Hadoop. This is why I frequently refer to big data as “commercial supercomputing.”

In Disruptive Possibilities, I discuss the implications of the big data ecosystem over the next few years. These implications will inundate vendors and customers in a number of ways, including: …

Four short links: 19 April 2013


Sterling on Disruption, Coding Crypto Fun, Distributed File System, and Asset Packaging

  1. Bruce Sterling on Disruption: If more computation, and more networking, was going to make the world prosperous, we’d be living in a prosperous world. And we’re not. Obviously we’re living in a Depression. Slow for the first 25%, but then it takes fire and burns with the heat of a thousand Sun Microsystems flaming out. You must read this now.
  2. The Matasano Crypto Challenges (Maciej Ceglowski) — To my delight, though, I was able to get through the entire sequence. It took diligence, coffee, and a lot of graph paper, but the problems were tractable. And having completed them, I’ve become convinced that anyone whose job it is to run a production website should try them, particularly if you have no experience with application security. Since the challenges aren’t really documented anywhere, I wanted to describe what they’re like in the hopes of persuading busy people to take the plunge.
  3. Tachyon: a fault-tolerant distributed file system enabling reliable file sharing at memory speed across cluster frameworks, such as Spark and MapReduce. Berkeley-licensed open source.
  4. Jammit (GitHub) — an industrial strength asset packaging library for Rails, providing both the CSS and JavaScript concatenation and compression that you’d expect, as well as YUI Compressor, Closure Compiler, and UglifyJS compatibility, ahead-of-time gzipping, built-in JavaScript template support, and optional Data-URI / MHTML image and font embedding. (via Joseph Misiti)

Aereo’s copyright solution: intentional inefficiency

Aereo's backward architecture could be the thing that keeps it in business.

Aereo, an online service that sends free over-the-air television broadcasts to subscribers, scored a big win in court this week.

At first glance, the service would seem to violate copyright. Aereo is grabbing TV content without paying for it and then passing it along to its paying subscribers.

So how is Aereo pulling it off? Over at Ars Technica, Timothy B. Lee deconstructs the service’s blend of tech and legal precedent:

Aereo’s technology was designed from the ground up to take advantage of a landmark 2008 ruling holding that a “remote” DVR product offered by Cablevision was consistent with copyright law. Key to that ruling was Cablevision’s decision to create a separate copy of recorded TV programs for each user. While creating thousands of redundant copies makes little sense from a technical perspective, it turned out to be crucial from a legal point of view …

… When a user wants to view or record a television program, Aereo assigns him an antenna exclusively for his own use. And like Cablevision, when 1,000 users record the same program, Aereo creates 1,000 redundant copies. [Links included in original text; emphasis added.]

Creating lots of copies of the exact same content is inefficient. No one can argue that point. But if you can get past the absurdity, you have to admit Aereo’s architecture is quite clever. Take thousands of tiny antennas, combine them with abundant storage, and now you’ve got a disruptive service that might survive the onslaught of litigation.
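To make the trade-off concrete, here is a minimal sketch contrasting the per-user design described in the quote (a dedicated antenna and a private copy for each user) with the deduplicated design an engineer would normally pick. The class names `PerUserDVR` and `DedupDVR`, the `simulate` helper, and the program name are all hypothetical; this is an illustration of the storage arithmetic under those assumptions, not Aereo's or Cablevision's actual implementation, and it models nothing about the legal side.

```python
from dataclasses import dataclass, field


@dataclass
class PerUserDVR:
    """Hypothetical model of the per-user design: each user gets a dedicated
    antenna and a private copy of every recording."""
    antennas: dict[str, int] = field(default_factory=dict)  # user -> antenna id
    copies: list[tuple[str, str, bytes]] = field(default_factory=list)  # (user, program, data)

    def record(self, user: str, program: str, stream: bytes) -> None:
        # Assign an antenna the first time a user asks; it is never shared.
        if user not in self.antennas:
            self.antennas[user] = len(self.antennas)
        # Store a separate copy even if other users recorded the same program.
        self.copies.append((user, program, bytes(stream)))

    def bytes_stored(self) -> int:
        return sum(len(data) for _, _, data in self.copies)


@dataclass
class DedupDVR:
    """Hypothetical 'efficient' design for contrast: one shared copy per program."""
    store: dict[str, bytes] = field(default_factory=dict)  # program -> data
    viewers: dict[str, set[str]] = field(default_factory=dict)  # program -> users

    def record(self, user: str, program: str, stream: bytes) -> None:
        # Keep only the first copy; later requests just register another viewer.
        self.store.setdefault(program, bytes(stream))
        self.viewers.setdefault(program, set()).add(user)

    def bytes_stored(self) -> int:
        return sum(len(data) for data in self.store.values())


def simulate(num_users: int, program_size: int) -> tuple[int, int, int]:
    """Have num_users record the same program under both designs.

    Returns (per-user bytes stored, deduplicated bytes stored, antennas assigned).
    """
    stream = b"\x00" * program_size
    per_user, dedup = PerUserDVR(), DedupDVR()
    for i in range(num_users):
        user = f"user{i}"
        per_user.record(user, "evening-news", stream)
        dedup.record(user, "evening-news", stream)
    return per_user.bytes_stored(), dedup.bytes_stored(), len(per_user.antennas)


print(simulate(1_000, 1_000))  # (1000000, 1000, 1000)
```

With 1,000 users recording the same program, the per-user design stores 1,000 copies and ties up 1,000 antennas, while the deduplicated design stores one. Storage cost grows linearly with the number of users, which is exactly the inefficiency the ruling made legally significant.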

Note: Aereo’s recent win only applies to a request for a preliminary injunction. Further court proceedings are likely, and you can bet there will be a long and winding appeals process.

To eat or be eaten?

What's interesting isn't software as a thing in itself, but software as a component of some larger system.

One of Marc Andreessen’s many accomplishments was the seminal essay “Why Software is Eating the World.” In it, the creator of Mosaic and Netscape argues for his investment thesis: everything is becoming software. Music and movies led the way, Skype makes the phone company obsolete, and even companies like FedEx and Walmart are all about software: their core competitive advantage isn’t driving trucks or hiring part-time employees, it’s the software they’ve developed for managing their logistics.

I’m not going to argue (much) with Marc, because he’s mostly right. But I’ve also been wondering why, when I look at the software world, I get bored fairly quickly. Yeah, yeah, another language that compiles to the JVM. Yeah, yeah, the JavaScript framework of the day. Yeah, yeah, another new component in the Hadoop ecosystem. Seen it. Been there. Done that. In the past 20 years, haven’t we gained more than the ability to use sophisticated JavaScript to display ads based on a real-time prediction of the user’s next purchase?

When I look at what excites me, I see a much bigger world than just software. I’ve already argued that biology is in the process of exploding, and the biological revolution could be even bigger than the computer revolution. I’m increasingly interested in hardware and gadgetry, which I used to ignore almost completely. And we’re following the “Internet of Things” (and in particular, the “Internet of Very Big Things”) very closely. I’m not saying that software is irrelevant or uninteresting. I firmly believe that software will be a component of every (well, almost every) important new technology. But what grabs me these days isn’t software as a thing in itself, but software as a component of some larger system. The software may be what makes it work, but it’s not about the software.

A startup takes on “the paper problem” with crowdsourcing and machine learning

With a new mobile app and API, Captricity wants to build a better bridge between analog and digital.

Unlocking data from paper forms is the problem that optical character recognition (OCR) software is supposed to solve. Two issues persist, however. First, the hardware and software involved are expensive, creating challenges for cash-strapped nonprofits and government. Second, all of the information on a given document is scanned into a system, including sensitive details like Social Security numbers and other personally identifiable information. This is a particularly difficult issue with respect to health care or bringing open government to courts: privacy by obscurity will no longer apply.

The process of converting paper forms into structured data still hasn’t been significantly disrupted by rapid growth of the Internet, distributed computing and mobile devices. Fields that range from research science to medicine to law to education to consumer finance to government all need better, cheaper bridges from the analog to the digital sphere.

Enter Captricity. The startup, which was co-founded by Jeff J. Lin and Kuang Chen, has its roots in the fieldwork on rural health Chen did as part of his PhD program.

“I was looking at the information systems that were available to these low-resource organizations,” Chen said in a recent phone interview. “I saw that they’re very much bound in paper. There’s actually a lot of efforts to modernize the infrastructure and put in mobile phones. Now that there’s mobile connectivity, you can run a health clinic on solar panels and long distance Wi-Fi. At the end of the day, however, business processes are still on paper because they had to be essentially fail-proof. Technology fails all the time. From that perspective, paper is going to stick around for a very long time. If we’re really going to tackle the challenge of the availability of data, we shouldn’t necessarily be trying to change the technology infrastructure first — bringing mobile phones and iPads to where there’s paper — but really to start with solving the paper problem.”

When Chen saw that data entry was a chokepoint for digitizing health indicators, he started working on developing a better, cheaper way to ingest data on forms.

When data disrupts health care

The convergence of data, privacy, and cost has created a unique opportunity to reshape health care.

Health care appears immune to disruption. It’s a space where the stakes are high, the incumbents are entrenched, and lessons from other industries don’t always apply.

Yet, in a recent conversation between Tim O’Reilly and Roger Magoulas, it became evident that we’re approaching an unparalleled opportunity for health care change. O’Reilly and Magoulas explained how the convergence of data access, changing perspectives on privacy, and the enormous expense of care are pushing the health space toward disruption.

As always, the primary catalyst is money. The United States is facing what Magoulas called an “existential crisis in health care costs” [discussed at the 3:43 mark]. Everyone can see that the current model is unsustainable. It simply doesn’t scale. And that means we’ve arrived at a place where party lines are irrelevant and tough solutions are the only options.

“Who is it that said change happens when the pain of not changing is greater than the pain of changing?” O’Reilly asked. “We’re now reaching that point.” [3:55]

(Note: The source of that quote is hard to pin down, but the sentiment certainly applies.)

This willingness to change is shifting perspectives on health data. Some patients are making their personal data available so they and others can benefit. Magoulas noted that even health companies, which have long guarded their data, are warming to collaboration.

At the same time, there’s a growing understanding that health data must be contextualized. Simply having genomic information and patient histories isn’t good enough. True insight, the kind that can improve quality of life, is only possible when datasets are combined.


We're in the midst of a restructuring of the publishing universe (don't panic)

Hugh McGuire says the disruption publishing has endured is a mere hint of what's to come.

Hugh McGuire, co-author of "Book: A Futurist's Manifesto," explains why publishing's digital transformation goes way beyond format shifts. He also reveals nine ways the publishing industry will change over the next five years.

Top Stories: September 12-16, 2011

Building data science teams, the evolution of data products, and the grunt work of data journalism.

This week on O'Reilly: DJ Patil revealed the skills and qualities of great data science teams, we learned that new data products put emphasis on experiences rather than on the data itself, and Simon Rogers discussed the considerable effort that goes into The Guardian's data journalism.

The 2010 technology of the year is …

A simple and disruptive service is Jonathan Reichental's choice for most important technology of 2010.

Jonathan Reichental's 2010 technology of the year is notable not just for what it accomplished in the previous year, but also because of its considerable potential.

Lessons from Digital Disruption in the Music Business

Last week's On The Media (mp3 download here) devoted the full program to challenges and changes during the past decade or so in the music business — from the unanswered legal questions about sampling (check out Girl Talk for the genre taken to the extreme) to the shifting economics of concert tickets and promotion to the changing role of…