Fast data calls for new ways to manage its flow

Examples of multi-layer, three-tier data-processing architectures.


Just as CPU caches are arranged in multiple levels, modern organizations direct their data into tiered data stores, on the principle that a small amount is needed for real-time decisions and the rest for long-range business decisions. This article looks at options for data storage, focusing on one that’s particularly appropriate for the “fast data” scenario described in a recent O’Reilly report.

Many organizations deal with data on at least three levels:

  1. They need data at their fingertips, rather like a reference book you leave on your desk. Organizations use such data for things like determining which ad to display on a web page, what kind of deal to offer a visitor to their website, or what email message to suppress as spam. They store such data in memory, often in key/value stores that allow fast lookups (a minimal sketch follows this list). Flash is a second layer (slower than memory, but much cheaper), as I described in a recent article. John Piekos, vice president of engineering at VoltDB, which makes an in-memory database, says that this type of data storage is used in situations where delays of just 20 or 30 milliseconds mean lost business.
  2. For business intelligence, these organizations use a traditional relational database or a more modern “big data” tool such as Hadoop or Spark. Although the use of a relational database for background processing is generally called online analytical processing (OLAP), it is nowhere near as “online” as the first tier, where data is consulted within milliseconds to drive real-time decisions.
  3. Some data is archived with no immediate use in mind. It can be compressed and perhaps even stored on magnetic tape.
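To make the first tier concrete, here is a minimal Python sketch of the kind of in-memory key/value lookup described in item 1. Everything in it (the profile fields, the ad-selection rule) is invented for illustration; it is not VoltDB’s API, just a picture of why an in-memory read fits comfortably inside a 20- or 30-millisecond budget.

```python
import time

# A hypothetical in-memory profile store; real deployments would use an
# in-memory database or cache, not a module-level dict. All fields are
# invented for illustration.
visitor_profiles = {
    "visitor-42": {"segment": "sports", "last_offer": "discount-10"},
    "visitor-99": {"segment": "travel", "last_offer": None},
}

def choose_ad(visitor_id: str) -> str:
    """Pick an ad from an in-memory lookup of the visitor's profile."""
    profile = visitor_profiles.get(visitor_id)
    if profile is None:
        return "generic-ad"            # unknown visitor: serve a default
    return profile["segment"] + "-ad"  # otherwise target by segment

start = time.perf_counter()
ad = choose_ad("visitor-42")
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"served {ad} in {elapsed_ms:.3f} ms")  # well under a 20-30 ms budget
```

In production the dict would be an in-memory database or cache shared across servers, but the read path is the same: one key, one lookup, no disk.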

For the new fast data tier, where performance is critical, techniques such as materialized views further improve responsiveness. According to Piekos, materialized views bypass a certain amount of database processing to shave milliseconds off queries. Read more…
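To see the kind of work a materialized view skips, here’s a hedged Python sketch of the underlying idea: maintain an aggregate incrementally on every write, so that a query becomes a stored-value lookup rather than a scan. The table and names are made up, and VoltDB’s actual materialized views are declared in SQL, which isn’t shown here.

```python
# A sketch of the idea behind a materialized view: keep an aggregate
# precomputed as rows arrive, so queries read a stored value instead of
# scanning the underlying table. All names here are illustrative.
orders = []                 # the underlying "table"
revenue_by_product = {}     # the "materialized view": product -> total revenue

def insert_order(product: str, amount: float) -> None:
    """Write path: store the row and incrementally update the view."""
    orders.append({"product": product, "amount": amount})
    revenue_by_product[product] = revenue_by_product.get(product, 0.0) + amount

def total_revenue(product: str) -> float:
    """Read path: an O(1) lookup rather than a scan over all orders."""
    return revenue_by_product.get(product, 0.0)

insert_order("widget", 9.99)
insert_order("widget", 19.99)
print(total_revenue("widget"))  # ~29.98, computed without re-summing orders
```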


Privacy is a concept, not a regime

In this O'Reilly Radar Podcast: Dr. Gilad Rosner talks about data privacy, and Alasdair Allan chats about the broken IoT.

In this podcast episode, I catch up with Dr. Gilad Rosner, a visiting researcher at the Horizon Digital Economy Research Institute in England. Rosner focuses on privacy, digital identity, and public policy, and is launching an Internet of Things Privacy Forum. We talk about personal data privacy in the age of the Internet of Things (IoT), privacy as a social characteristic, an emerging design ethos for technologists, and whether or not we actually own our personal data. Rosner characterizes personal data privacy as a social construct and addresses the notion that privacy is dead:

“Firstly, it’s important to recognize the idea that privacy is not a regime to control information. Privacy is a much larger concept than that. Regimes to control information are ways that we as a society preserve privacy, but privacy itself emerges from social needs and from individual human needs. The idea that privacy is dead comes from the vulnerability that people are feeling because they can see that it’s very difficult to maintain walls between their informational spheres, but that doesn’t mean that there aren’t countercurrents to that, and it doesn’t mean that there aren’t ways, as we go forward, to improve privacy preservation in the electronic spaces that we continue to move into.”


As we move more and more into these electronic spaces and the Internet of Things becomes democratized, our notions of privacy are shifting on a cultural level beyond anything we’ve experienced as a society before. Read more…


Signals from Strata + Hadoop World in Barcelona 2014

From the Internet of Things to data-driven fashion, here are key insights from Strata + Hadoop World in Barcelona 2014.

Experts from across the big data world came together for Strata + Hadoop World in Barcelona 2014. We’ve gathered insights from the event below.

#IoTH: The Internet of Things and Humans

“If we could start over with these capabilities we have now, how would we do it differently?” Tim O’Reilly continues to explore data and the Internet of Things through the lens of human empowerment and the ability to “use technology to give people superpowers.”

Read more…


The science of moving dots: the O’Reilly Data Show Podcast

Rajiv Maheswaran talks about the tools and techniques required to analyze new kinds of sports data.

Many data scientists are comfortable working with structured operational data and unstructured text. Newer techniques like deep learning have opened up data types like images, video, and audio.

Other common data sources are garnering attention. With the rise of mobile phones equipped with GPS, I’m meeting many more data scientists at start-ups and large companies who specialize in spatio-temporal pattern recognition. Analyzing “moving dots” requires specialized tools and techniques.
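As a small illustration of what those specialized techniques start from, here’s a Python sketch that turns a raw GPS trace into one simple spatio-temporal feature: speed between consecutive fixes. The trace is made up, and real systems derive far richer features (heading, acceleration, interactions between dots), but the first step looks much like this.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 6_371_000 * 2 * asin(sqrt(a))

# A made-up trace: (timestamp in seconds, latitude, longitude) fixes
# from one hypothetical phone.
trace = [(0, 41.3851, 2.1734), (10, 41.3857, 2.1740), (20, 41.3865, 2.1752)]

# Feature extraction: speed (m/s) between consecutive fixes.
speeds = []
for (t0, lat0, lon0), (t1, lat1, lon1) in zip(trace, trace[1:]):
    speeds.append(haversine_m(lat0, lon0, lat1, lon1) / (t1 - t0))

print([f"{s:.2f} m/s" for s in speeds])
```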


A few months ago, I sat down with Rajiv Maheswaran, founder and CEO of Second Spectrum, a company that applies analytics to sports tracking data. Maheswaran talked about this new kind of data and the challenge of finding patterns:

“It’s interesting because it’s a new type of data problem. Everybody knows that big data machine learning has done a lot of stuff in structured data, in photos, in translation for language, but moving dots is a very new kind of data where you haven’t figured out the right feature set to be able to find patterns from. There’s no language of moving dots, at least not that computers understand. People understand it very well, but there’s no computational language of moving dots that are interacting. We wanted to build that up, mostly because data about moving dots is very, very new. It’s only in the last five years, between phones and GPS and new tracking technologies, that moving data has actually emerged.”

Read more…


The intersection of data and design is equal parts art and science

Data-informed design is a framework to hone understanding of customer behavior and align teams with larger business goals.

Editor’s note: this is an excerpt from our forthcoming book Designing with Data; it is part of a free curated collection of chapters from the O’Reilly Design library — download a free copy of the Experience Design ebook here.

The phrase “data driven” has long been part of buzzword-bingo card sets. It’s been heard in the halls of the web analytics conference eMetrics for more than a decade, with countless sessions aimed at teaching audience members how to turn their organizations into data-driven businesses.

When spoken of in a positive light, the phrase data driven conjures visions of organizations with endless streams of silver-bullet reports — you know the ones: they’re generally titled something to the effect of “This Chart Will Help Us Fix Everything” and show how one surprising change can lead to a quadrillion-fold increase in revenue along with world peace.

When spoken of in a negative light, the term is thrown around as a descriptor of Orwellian organizations with panopticon-level data collection methods, with management imprisoned by relentless reporting, leaving no room for real innovation.

Evan Williams, founder of Blogger, Twitter, and Medium, made an apt comment about being data driven:

“I see this mentality that I think is common, especially in Silicon Valley with engineer-driven start-ups who think they can test their way to success. They don’t acknowledge the dip. And with really hard problems, you don’t see market success right away. You have to be willing to go through the dark forest and believe that there’s something down there worth fighting the dragons for, because if you don’t, you’ll never do anything good. I think it’s kind of problematic how data-driven some companies are today, as crazy as that sounds.”

Read more…


The future of design: stay one step ahead of the algorithm

Future-proof yourself by ensuring the kind of work you do cannot be easily replicated by an algorithm.

Editor’s note: this post originally published on Medium; it is re-published here with permission.


It’s foolhardy to predict more than a few years into the future; much that is unforeseen can happen between now and then. That being said, in 30 years, give or take 10, the discipline of design as it’s practiced today will be over. This isn’t anything new for design — it’s practiced very differently now than it was 30 years ago, or 30 years before that, and so on, stretching back decades, perhaps centuries. Nor will this be unique to design: many fields of work will be utterly transformed in 30 years’ time. These changes will be drastic, and design will never be the same afterward.

The canary in the coal mine is Autodesk’s Project Dreamcatcher. Introduced by CEO Carl Bass at this year’s Solid conference, Dreamcatcher appears to work like this: industrial designers put together inspiration in the form of exemplars and combine them with requirements and constraints, then feed them all into Dreamcatcher. An algorithm then processes this information and spits out many possible designs. Designers can either start over with new or tweaked criteria, or continue by selecting a design to refine. It’s no stretch of the imagination to see this being done for digital objects as well. In fact, it might well be an easier task for digital design than for physical objects. Read more…
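Dreamcatcher’s internals aren’t public, so the Python sketch below is only a generic generate-filter-rank loop meant to illustrate the workflow just described: the designer supplies requirements and constraints, the algorithm proposes many candidate designs, and the designer selects one to refine. Every name and number in it is hypothetical.

```python
import random

random.seed(7)  # deterministic for the example

def generate_candidate():
    """Propose a random bracket design; parameters are hypothetical."""
    return {"thickness_mm": random.uniform(1.0, 10.0),
            "ribs": random.randint(0, 8)}

def meets_constraints(design):
    """A stand-in for designer-supplied requirements (e.g., load-bearing)."""
    return design["thickness_mm"] * (1 + design["ribs"]) >= 12.0

def score(design):
    """A stand-in objective: prefer lighter designs once constraints hold."""
    return -(design["thickness_mm"] * (1 + 0.3 * design["ribs"]))

# Generate many candidates, keep the feasible ones, rank them, and
# present the top few for a human designer to refine.
candidates = (generate_candidate() for _ in range(10_000))
feasible = [d for d in candidates if meets_constraints(d)]
feasible.sort(key=score, reverse=True)

for design in feasible[:3]:
    print(design)
```

Real generative-design systems replace the random proposals with simulation and optimization, but the division of labor is the point: humans set goals and constraints, and the algorithm enumerates options.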
