How to be agile with your big data

Agile methodology brings flexibility to the EDW and offers ways to integrate open-source technologies with existing systems.

Data analysis, like other pursuits, is a balancing act. The rise of big data ratchets up the pressure on the traditional enterprise data warehouse (EDW) and associated software tools to handle rapidly evolving sets of new demands posed by the business. Companies want their EDW systems to be more flexible and more user friendly — without sacrificing processing speeds, data integrity, or overall reliability.

“The more data you give the business, the more questions they will ask,” says José Carlos Eiras, who has served as CIO at Kraft Foods, Philip Morris, General Motors, and DHL. “When you have big data, you have a lot of different questions, and suddenly you need an enterprise data warehouse that is very flexible.”

EDWs are remarkably powerful, but it takes considerable expertise and creativity to modify them on the fly. Adding new capabilities to the EDW generally requires significant investments of time and money. You can develop your own tools internally or purchase them from a vendor, but either way, it’s a hard slog.

“You wind up with an endless proliferation of tools, each slightly different, designed to handle a specific type of query,” says Eiras. “Every time you have a different question, you have to use a different tool. It isn’t a good situation.”

In today’s ultra-competitive markets, business operates far too quickly for traditional software development timetables, and most companies are rightfully wary of relying on outside vendors to solve ongoing business challenges. Luckily, agile and other “lean” software development methods offer repeatable processes for harnessing creativity and expertise, at a pace that’s swift enough to keep the business happy.

Achieving concrete results faster

Unlike traditional ITIL-based software development life cycle (SDLC) or “waterfall” methods, agile does not begin with a finely detailed compendium of abstract requirements. Instead of striving for perfection, agile aims for the minimum viable product (MVP), which means that agile produces concrete results faster than traditional development methods.

The difference between the “abstractedness” of traditional methods and the “concreteness” of agile might seem like a matter of semantics, but it’s highly relevant in a world of hyper-turbulent markets and fickle customers. Abstractions are tolerable when you have plenty of time and money, but when time and money are tight, you need concrete results in a hurry. The whole point of agile is finding out quickly whether your idea works or not. Agile embodies the concept of “failing fast,” a phrase that sums up the ethos of modern innovation.

But here’s the rub: adopting agile methodologies is not the same as being agile. Agile is both a process and a mindset. It embraces both discipline and flexibility. Agile is not some kind of free-form anarchy; it’s a structured way of producing usable software without a pre-written script, much like improvisational theater.

The first rule of improv is always to say, “Yes, and…” Everything an actor does on stage is complementary; nothing is exclusionary. New information emerges unexpectedly, and continual change is a given, but established premises remain in place.

Cultural challenges

“The biggest challenge is cultural,” says Oliver Ratzesberger, senior vice president of software at Teradata Labs. “Executives tell me they want agility, and then in the next sentence they ask me for a project plan, a roadmap, and a product release date. They switch back to waterfall without even realizing it.”

For modern software executives like Ratzesberger, explaining the difference between agile and “agility” is not an abstract dilemma. “Agility, especially in the context of big data, takes a lot of effort and really hard work. Creating an agile environment doesn’t mean turning off governance, eliminating documentation, and giving developers a sandbox to work in. That’s not agility — that’s the Wild West,” says Ratzesberger.

The problem with taking a Wild West approach to software development is that “you end up with results that are not reproducible, which very quickly erodes confidence in your ability to handle the data,” he says. If the business executives who depend on the EDW don’t trust you with their data, they will look elsewhere for answers to their questions.

“Without methodology, agile quickly runs away on you,” says Ratzesberger. “You need to train people, and you need to change the culture of your organization. If you’re applying agile to big data, you need to work on the agile piece first, and then overlay the big data on top of it. You can’t start with big data and then try to make it agile. Then it becomes the Wild West, and you’re in trouble.”

Train the business, too

Explaining agile to people working in the business units is also important, says Ratzesberger. “You need to train the business. If you only train the IT people, the people in the business units won’t understand what you’re doing. They will assume they need to submit a requirements document, which they dread, and then there will be no agility between IT and the business.” He suggests co-locating technology and business people to improve communication and collaboration during agile projects.

Agile is not necessarily the answer to every software development challenge, says Ratzesberger. “There are certain tasks where accuracy is paramount. In production, for example, you’re not going to upgrade a critical process using agile. If it’s a customer-facing process that the company depends on for revenue, then agile might not be the best path.”

Agile is optimally suited for situations in which the ability to generate lightning-fast results can produce genuine competitive advantages. Experimenting with subsets of customer data, testing new product categories, evaluating the usability of a web page design, gauging the appeal of a smartphone app — those are the best scenarios for getting the most value from agile.

Complementary technologies

Jim Tosone, former director and team lead of the Healthcare Informatics Group at Pfizer Pharmaceuticals, uses the principles and processes of improvisation in his leadership development practice to help clients improve business performance.

Tosone sees agile, big data, and the EDW as natural partners. “Big data technology such as Hadoop, which is based on open-source code, is inherently agile,” he says.

“Now that we’re seeing more SQL-on-Hadoop integrations, we have a global community of open-source developers working to solve EDW challenges. That phenomenon is likely to accelerate,” says Tosone. “The relationships between agile, big data, and EDW are complementary rather than antagonistic.”

Where to begin

Companies looking for ways to bring agile into their EDW operations should begin with simple steps. Tosone recommends outlining key use cases (e.g., hypothesis generation, hypothesis testing, business plan execution) and forming developer teams with diverse backgrounds. Diversity helps ensure that a broad range of possible solutions is considered.

Then map the potential solutions to each use case. Look for solutions that will enable users to run queries and analyses across multiple platforms. Favor solutions that can be modified, iterated, and replaced easily.

Tosone agrees with Ratzesberger that changing the culture is critical to success. “You need to educate and train people, and help them feel comfortable working across multiple functions and disciplines,” he says. “True agility requires a mindset that perceives change as a positive instead of a negative. You need a bias toward exploration and patience to resist the desire for closure.”

Hybrid environments

It seems clear that agile offers a reasonably quick and cost-effective way to bring flexibility to the EDW and to integrate newer open-source technologies, such as Hadoop, with existing database systems. Flexibility translates into greater speed and cost savings; integration promises a significantly wider range of analytic capabilities, which creates more opportunities for the business to pursue.

As we move forward into an era of bigger, faster, and more effective data analytics, it seems logical to assume that database systems architectures will include both traditional and open-source components. In hybrid computing environments, the real challenge is optimizing the relationships between all of the various people, processes, and technologies required to get the job done quickly and efficiently.

Instead of picking fights over platforms, we should search for mutually beneficial scenarios in which we use available resources to help businesses gain advantages in competitive markets. Or, as my friends in improvisational theater would say: “Yes, and…”

This article is part of a collaboration between O’Reilly and Teradata. See our statement of editorial independence.
