Edd Dumbill

Edd Dumbill is a principal analyst for O'Reilly Radar, and program chair for the O'Reilly Strata Conference and the O'Reilly Open Source Convention.

Big data, cool kids

Making sense of the hype-cycle scuffle.

My data's bigger than yours!


The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity.

These child prodigies of the data scene show great promise but spend a lot of time knocking each other around in the schoolyard. Their egos can sometimes be too big to accept that everybody has their place, and eyeball-seeking media certainly doesn’t help.

POPULAR KID: Look at me! Big data is the hotness!
HADOOP: My data’s bigger than yours!
SCIPY: Size isn’t everything, Hadoop! The bigger they come, the harder they fall. And aren’t you named after a toy elephant?
R: Backward sentences mine be, but great power contains large brain.
EVERYONE: Huh?
SQL: Oh, so you all want to be friends again now, eh?!
POPULAR KID: Yeah, what SQL said! Nobody really needs big data; it’s all about small data, dummy.

The fact is that we’re fumbling toward the adolescence of big data tools, and we’re at an early stage of understanding how data can be used to create value and increase the quality of service people receive from government, business and health care. Big data is trumpeted in mainstream media, but many businesses are better advised to take baby steps with small data.
Read more…

Let’s do this the hard way

Being both liberal and safe in programming is hard

Recent discoveries of security vulnerabilities in Rails and MongoDB led me to thinking about how people get to write software.

In engineering, you don’t get to build a structure people can walk into without years of study. In software, we often write what the heck we want and go back to clean up the mess later. It works, but the consequences start to get pretty monumental when you consider the network effects of open source.

You might think it’s a consequence of the tools we use: running fast and loose with scripting languages. I’m not convinced. Unusually among computer science courses, my alma mater taught us programming 101 with Ada. Ada is a language that more or less requires a retinal scan before you can use the compiler. It was a royal pain to get Ada to do anything you wanted: the philosophical inverse of Perl or Ruby. We certainly came up the “hard way.”

I’m not sure that the hard way was any better: a language that protects you from yourself doesn’t teach you much about the problems you can create.

But perhaps we are in need of an inversion of philosophy. Where Internet programming is concerned, everyone is quick to quote Postel’s law: “Be conservative in what you do, be liberal in what you accept from others.”

The fact of it is that being liberal in what you accept is really hard. You basically have two options: look carefully for only the information you need, which I think is the spirit of Postel’s law, or implement something powerful that will take care of many use cases. This latter strategy, though seemingly quicker and more future-proof, is what often leads to bugs and security holes, as unintended applications of powerful parsers manifest themselves.
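As a rough illustration of the first option, here is a minimal Python sketch that accepts any JSON object but keeps only the fields it was looking for, and only if they have the expected type. The field names and the `extract_user` helper are hypothetical, invented for this example; they are not taken from Rails, MongoDB, or any fix for the vulnerabilities mentioned above.

```python
import json

# Hypothetical allow-list: the only fields this code needs, with expected types.
EXPECTED_FIELDS = {"name": str, "age": int}


def extract_user(raw: str) -> dict:
    """Accept any JSON object, but believe only the fields we asked for."""
    # json.loads yields plain data (dicts, lists, strings, numbers);
    # it does not instantiate arbitrary objects from the input.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    result = {}
    for field, expected_type in EXPECTED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"missing or malformed field: {field}")
        result[field] = value

    # Anything not on the allow-list is silently ignored, never trusted.
    return result
```

Extra keys in the input are tolerated (liberal in what is accepted), but a nested object where a string was expected is rejected rather than passed along (conservative in what is believed).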

My conclusion is this: use whatever language makes sense, but be systematically paranoid. Be liberal in what you accept, but conservative about what you believe.

Get the best start for data in your business

It's not about IT buying, but about making data work for you. Learn more in the Big Data in Enterprise IT program at Strata California.

In a world where technology and business are ever more intertwined, IT leaders aspire to key roles in their organizations. Sadly, industry conferences can lag behind, assuming IT is all about making the right buying decisions.

Not so at Strata.

Turning data into focused advantage requires strategy and planning over the whole business

Photo credit: Ian Muttoo

Our approach is to take a view of data for business that centers around the problems you need to solve. The excitement around big data isn’t really about large volumes of data; it’s about smart use of data. It’s about using data to make your products better, help you be significantly more efficient, and create new products and businesses.

Getting the most from big data and data science takes a lot more than a software choice. The business aims come first, along with a good understanding of the problems you want to solve. Then you need to understand the capabilities of the technology and where data science can be best applied. Finally, you need to know how to run successful data projects, and how to hire and manage data teams.

Working with analytics and BI expert Mark Madsen, I’ve compiled a day-long program at Strata called Big Data in Enterprise IT that will take you through big data strategy, the issues of managing data, and how data science can be used effectively in your organization. Read more…

Aaron was one of us

We can change the future, and we must.

I sat last night at Aaron Swartz’s memorial in San Francisco, among the very people who built the Internet, the web, the culture of young entrepreneurialism and Web 2.0 startups. Among the pioneers of Creative Commons, Electronic Frontier Foundation, open source software and those fighting to keep the public domain public.

Aaron was one of them.

It was a family reunion, under dreadful circumstances nobody would have wished for.

In his life Aaron had worked and learned among the thoughtful leaders who built the web we now benefit from today. He worked with the W3C, when the web was still “1.0,” and then in the social web and the hotbed of innovation and startup culture at Y Combinator.

Aaron’s passion for providing access to knowledge drove the most recent years of his life, from the campaign against SOPA to the liberation of public court records from PACER. And of course the downloading of journal articles, leading to the events that have brought his death so much into the public eye. Yet as Carl Malamud passionately insisted last night, Aaron was not a lone actor, but part of a peaceful army of reformers. Read more…

Five big data predictions for 2013

Diversity and manageability are big data watchwords for the next 12 months.

Here are some of the key big data themes I expect to dominate 2013, and which we will of course be covering at Strata.

Emergence of a big data architecture

Leadenhall Building skyscraper Under Construction by Martin Pettitt, on Flickr

The coming year will mark the graduation for many big data pilot projects, as they are put into production. With that comes an understanding of the practical architectures that work. These architectures will identify:

  • best of breed tools for different purposes, for instance, Storm for streaming data acquisition
  • appropriate roles for relational databases, Hadoop, NoSQL stores and in-memory databases
  • how to combine existing data warehouses and analytical databases with Hadoop

Of course, these architectures will be in constant evolution as big data tooling matures and experience is gained.

In parallel, I expect to see increasing understanding of where big data responsibility sits within a company’s org chart. Big data is fundamentally a business problem, and some of the biggest challenges in taking advantage of it lie in the changes required to cross organizational silos and reform decision making.

One to watch: it’s hard to move data, so look for a starring architectural role for HDFS for the foreseeable future. Read more…

The future of programming

Unraveling what programming will need for the next 10 years.

Programming is changing. The PC era is coming to an end, and software developers now work with an explosion of devices, job functions, and problems that need different approaches from the single machine era. In our age of exploding data, the ability to do some kind of programming is increasingly important to every job, and programming is no longer the sole preserve of an engineering priesthood.

Is your next program for one of these? Photo credit: Steve Lodefink/Flickr.


Over the course of the next few months, I’m looking to chart the ways in which programming is evolving, and the factors that are affecting it. This article captures a few of those forces, and I welcome comment and collaboration on how you think things are changing.

Where am I headed with this line of inquiry? The goal is to be able to describe the essential skills that programmers need for the coming decade and the places they should focus their learning, and to differentiate between short-term trends and long-term shifts. Read more…

Saving publishing, one tweet at a time

Helping both readers and writers look good on social media.

Traffic comes to online publishers in two ways: search and social. Because of this, writing for the tweet is a new discipline every writer and editor must learn. You’re not ready to publish until you find the well-crafted headline that fits in 100 characters or so, and pick an image that looks great shared at thumbnail size on Facebook and LinkedIn.

But what of us, the intelligent reader? Nobody wants to look like a retweet bot for publishers. The retweet allows us no space to say why we ourselves liked an article.

Those of us with time to dedicate are familiar with crafting our own awkward commentaries: “gr8 insight in2 state of mob,” “saw ths tlk last Feb,” “govt fell off fiscal clf”. Most of the time it’s easier just to bookmark, or hit “read later,” and not put in the effort to share.

Rescue is at hand. The writer and programmer Paul Ford has created a bookmarklet, entitled Save Publishing. On activating the bookmarklet while viewing an article you wish to share, it highlights and makes clickable all the tweetable phrases from the page. Read more…

True data liberation with IFTTT and Google Drive

Web services combine to give us our data, and help us use it.


An example IFTTT action archives tweets to Evernote

The web service IFTTT (If this, then that) accesses popular web applications via their APIs, and lets users create new actions based on changes, such as “upload photos to Flickr when I add them to my Dropbox folder” or “send me email when frost is forecast.”
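The “if this, then that” idea can be sketched in a few lines of Python: a rule pairs a trigger (a test on an incoming event) with an action to run when the test passes. This is only an illustration of the pattern, not IFTTT’s actual API; the `Rule` class, the `run_rules` function, and the event fields (`type`, `low_c`) are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical trigger/action pair: 'if this, then that'."""
    trigger: Callable[[dict], bool]   # "this": does the event match?
    action: Callable[[dict], None]    # "that": what to do when it does


def run_rules(rules: list, event: dict) -> int:
    """Run every rule whose trigger matches the event; return how many fired."""
    fired = 0
    for rule in rules:
        if rule.trigger(event):
            rule.action(event)
            fired += 1
    return fired


# Stand-in for an email service: messages are collected in a list.
outbox = []

# "Send me email when frost is forecast", using made-up event fields.
frost_alert = Rule(
    trigger=lambda e: e.get("type") == "forecast" and e.get("low_c", 99) <= 0,
    action=lambda e: outbox.append(f"Frost forecast: low of {e['low_c']} C"),
)
```

Calling `run_rules([frost_alert], {"type": "forecast", "low_c": -2})` appends one message to `outbox`, while a forecast with a low of 5 fires nothing. In a real service the events would arrive from each application’s API rather than being passed in by hand.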

I had been tempted to classify IFTTT as merely an interesting toy for playing with social media. Granted, it’s nice that I can archive all my tweets into an Evernote note, but so what? However, IFTTT’s growth in features is showing it to be more than a bauble. The service is becoming an empowering tool that gives users more control over their own data, previously often accessible by programmers alone.

Read more…

Why big data is big: the digital nervous system

Why we all need to understand and use big data.

Where does all the data in “big data” come from? And why isn’t big data just a concern for companies such as Facebook and Google? The answer is that the web companies are the forerunners. Driven by social, mobile, and cloud technology, there is an important transition taking place, leading us all to the data-enabled world that those companies inhabit today.

From exoskeleton to nervous system

Until a few years ago, the main function of computer systems in society, and business in particular, was as a digital support system. Applications digitized existing real-world processes, such as word-processing, payroll and inventory. These systems had interfaces back out to the real world through stores, people, telephone, shipping and so on. The now-quaint phrase “paperless office” alludes to this transfer of pre-existing paper processes into the computer. These computer systems formed a digital exoskeleton, supporting a business in the real world.

The arrival of the Internet and web has added a new dimension, bringing in an era of entirely digital business. Customer interaction, payments and often product delivery can exist entirely within computer systems. Data doesn’t just stay inside the exoskeleton any more, but is a key element in the operation. We’re in an era where business and society are acquiring a digital nervous system.

Read more…

Building conference programs: it’s about the attendee

The essential principles of conference development.

I’ve chaired computer industry conferences for ten years now. First for IDEAlliance (XML Europe, XTech), and recently with O’Reilly Media (OSCON, Strata). Over the years I have tried to balance three factors as I select talks: proposal quality, important new work, and practical value of the knowledge to the attendees.

As the competition for speaking slots at both Strata and OSCON reaches intense levels, I wanted to articulate these factors, and the principles I use when compiling conference programs.

How the program is made

My guiding principle in putting a program together is value to the attendees. They’re why we do this. By putting out quality content and speakers, we attract thinking, interested attendees. In turn, our sponsors get a much better quality of conversation and customer contact through their presence at the event.

Here’s the process in a nutshell: proposals are invited through a public call for participation, and then reviewers, drawn from the industry community of experts, will grade and comment on each proposal. I and my co-chairs use this feedback, along with editorial judgement, to compile the final schedule. For keynotes, and a small number of breakout sessions, we will augment the review process by inviting talks we think are important for the program.

Read more…