Analytic engines that factor in security labels

Data stores are rolling out easy-to-use analysis tools

Originally developed at the NSA, Apache Accumulo is a BigTable-inspired data store known for being highly scalable and for its distinctive security model. Federal agencies and defense contractors have deployed Accumulo on clusters of a thousand or more servers. It uses “cell-level” security to control access to the values stored in individual cells (1).
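
In Accumulo, each cell's security label is a boolean visibility expression over authorization tokens, such as admin&(audit|finance), and a user can read the cell only if the authorizations they hold satisfy that expression. The Python sketch below evaluates simplified expressions of this kind. It is illustrative only: the function name can_see is hypothetical, quoted labels are not handled, and Accumulo's real implementation is in Java (classes such as ColumnVisibility and VisibilityEvaluator). As in Accumulo, an empty label is visible to everyone, and mixing & and | requires parentheses.

```python
import re

# Tokens: label names, the & and | operators, and parentheses.
# (Accumulo also allows quoted labels; they are omitted in this sketch.)
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos}: {expr[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def can_see(expression, authorizations):
    """Hypothetical helper: True if a user holding `authorizations`
    may read a cell labelled with `expression`."""
    tokens = _tokenize(expression)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def parse_group():
        # A run of terms joined by a single operator type.
        nonlocal pos
        values = [parse_term()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            values.append(parse_term())
        return all(values) if op == "&" else any(values)

    def parse_term():
        # Either a parenthesized group or a single label.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations

    result = parse_group()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


analyst = {"public", "health"}
print(can_see("public", analyst))                 # True
print(can_see("health&pii", analyst))             # False
print(can_see("admin|(health&public)", analyst))  # True
print(can_see("", analyst))                       # True
```

The check is per cell, so two cells in the same row and column family can carry different labels, which is finer-grained than restricting whole rows or columns.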

What Accumulo lacked were easy-to-use, standard analytic engines that let users interact with data. The release of Sqrrl Enterprise this past week fills that gap. Sqrrl Enterprise provides an initial set of analytic engines for the Accumulo ecosystem (2), including support for interactive SQL, full-text search, and queries over graph data. Each of these engines takes security labels into account: every data object ingested into Sqrrl carries a security label, so query and analytic results reflect each user's access level. Analysts otherwise interact with data as they normally would. For example, Sqrrl’s indexing technology accounts for security labels, and search queries are written in standard Lucene syntax. Reminiscent of the Phoenix project for HBase (3), SQL queries (4) in Sqrrl are converted into optimized Accumulo iterators.
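
One way to picture "results incorporate access levels" is as a pipeline of stacked iterators in which a visibility filter runs before any SQL-style predicate or aggregate, so the same query returns different answers to users with different authorizations. The Python generators below are a conceptual sketch under stated assumptions, not Sqrrl's query planner or Accumulo's iterator API (Accumulo's server-side iterators are Java classes implementing SortedKeyValueIterator). The Cell layout and stage names are hypothetical, and labels are simplified to conjunctions: a cell is visible only if the user holds every term in its label set.

```python
from collections import namedtuple

# Hypothetical cell layout: real Accumulo keys also carry a column family,
# qualifier and timestamp, collapsed here into a single `column`.
Cell = namedtuple("Cell", ["row", "column", "labels", "value"])


def scan(table):
    """Stage 1: stream cells in sorted (row, column) order."""
    yield from sorted(table, key=lambda c: (c.row, c.column))


def visibility_filter(cells, authorizations):
    """Stage 2: drop cells whose labels the user does not hold.
    Simplification: labels are conjunctions (a set of required terms)."""
    for cell in cells:
        if cell.labels <= authorizations:
            yield cell


def where(cells, column, predicate):
    """Stage 3: a WHERE clause on a single column."""
    for cell in cells:
        if cell.column == column and predicate(cell.value):
            yield cell


def count_rows(cells):
    """Stage 4: count distinct matching rows."""
    return len({cell.row for cell in cells})


def query(table, authorizations):
    # Roughly: SELECT COUNT(*) FROM patients WHERE age >= 65
    visible = visibility_filter(scan(table), authorizations)
    return count_rows(where(visible, "age", lambda v: v >= 65))


patients = [
    Cell("p1", "age", frozenset(), 70),
    Cell("p2", "age", frozenset({"pii"}), 81),
    Cell("p3", "age", frozenset({"pii", "oncology"}), 66),
    Cell("p4", "age", frozenset(), 40),
]
print(query(patients, {"public"}))           # 1
print(query(patients, {"pii"}))              # 2
print(query(patients, {"pii", "oncology"}))  # 3
```

Because filtering happens before aggregation, a user's COUNT never includes cells they are not allowed to read, and the hidden values never leave the storage layer.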

As I’ve pointed out in recent posts, analytic engines are the natural next step after building a scale-out data store with batch-processing capability. Application frameworks like Kiji can then leverage those engines to simplify app development. Sqrrl is building analytic capabilities without sacrificing Accumulo’s unique (5) security model. It certainly seems like a natural fit for industries, such as health care, where privacy is central. I’m just glad that data stores of all stripes are rolling out these basic engines in earnest.


(1) In contrast, many data stores can only restrict what columns or rows users can access.
(2) Sqrrl Enterprise is a commercial product built on top of Accumulo, so the analytic tools described above aren’t technically freely available to users of Apache Accumulo. I think the HBase ecosystem has an advantage in this regard: its tools are available to all HBase users.
(3) Phoenix turns SQL into optimized native HBase calls.
(4) The current version of Sqrrl Enterprise does not yet support “joins” or “subselects”.
(5) Until other data stores implement some version of cell-level security, Accumulo has a distinguishing feature.

