"data lake" entries
Practical tips for centralizing security data.
But let’s be realistic. You probably have numerous repositories for your security data. Your Security Information and Event Management (SIEM) solution doesn’t scale to the volumes of data that you would really like to collect. This, in turn, makes it hard to use all of your data for any kind of analytics. It’s likely that your tools have to operate on multiple, disconnected data stores that have very different capabilities for data access and analysis. Even worse, during an incident, how many different consoles do you have to touch before you get the complete picture of what has happened? I would guess probably at least four (I would have said 42, but that seemed a bit excessive).
When talking to your peers about this problem, do they tell you to implement Hadoop to deal with the huge data volumes? But what does that really mean — is Hadoop really the solution? After all, Hadoop is a pretty complex ecosystem of tools that requires skilled and expensive people to implement and maintain. Read more…
In this O'Reilly Radar Podcast: Edd Dumbill on the data lake, and Rajiv Maheswaran on the science of moving dots.
In a recent blog post, Edd Dumbill, VP of strategy at Silicon Valley Data Science, wrote about the phrase “data lake.” Likening it to a dream, he described a data lake as “a place with data-centered architecture, where silos are minimized, and processing happens with little friction in a scalable, distributed environment…Data itself is no longer restrained by initial schema decisions, and can be exploited more freely by the enterprise.” He explained that he called it a “dream” because “we’ve a way to go to make the vision come true” — but noted he’s optimistic the dream can be realized.