Raffael Marty

Raffael Marty is one of the world's most recognized authorities on security data analytics and visualization. Raffy is the founder and CEO of pixlcloud, a next generation visual analytics platform. With a 15 year track record in the big data and log analysis space, at companies like Loggly, Splunk, and ArcSight, he has helped numerous Fortune 500 companies defend against sophisticated adversaries. Raffy is a Zen student and practices the tradition of koan study to gain insight into life.

How to implement a security data lake

Practical tips for centralizing security data.

Information security has been dealing with terabytes of data for more than a decade — almost two. The benefits of having more data available spans many use cases, from forensic investigations to pro-actively finding anomalies and stopping adversaries before they cause harm.

But let’s be realistic. You probably have numerous repositories for your security data. Your Security Information and Event Management (SIEM) solution doesn’t scale to the volumes of data that you would really like to collect. This, in turn, makes it hard to use all of your data for any kind of analytics. It’s likely that your tools have to operate on multiple, disconnected data stores that have very different capabilities for data access and analysis. Even worse, during an incident, how many different consoles do you have to touch before you get the complete picture of what has happened? I would guess probably at least four (I would have said 42, but that seemed a bit excessive).

When talking to your peers about this problem, do they tell you to implement Hadoop to deal with the huge data volumes? But what does that really mean — is Hadoop really the solution? After all, Hadoop is a pretty complex ecosystem of tools that requires skilled and expensive people to implement and maintain. Read more…