At Strata + Hadoop World 2015 in San Jose last week, we ran an event for data-driven startups. This is the fourth year for the Startup Showcase, and it’s become a fixture of the conference. It’s a good way for companies to get visibility with investors, analysts, and attendees: one of our early winners, MemSQL, has since raised $50 million in financing.
This year’s winners underscore several important trends in the big data space at the moment: the maturity of management tools; the deployment of machine learning in other verticals; an increased focus on privacy and permissions; and the convergence of enterprise languages like SQL with distributed, schema-less data stacks.
Third place went to Unravel, which improves the reliability, performance and utilization of Hadoop applications and clusters. As data systems have become increasingly complex, the efficiency of those systems has been under siege; no one person knows the whole stack, and abstraction layers designed to simplify eventually become costly in terms of processing. This happened in networking and cloud computing, and now it’s happening in big data.
Second place went to Caspida, which finds hidden threats using behavior-based machine learning algorithms. Computer vulnerabilities are also increasingly complex — we’re far up the stack from TCP/IP, and exploits often combine a variety of social, logical, and brute force approaches. As a result, security monitoring relies heavily on heuristics, and tools that can learn what abnormality looks like are a first line of defense.
The audience choice was Blue Talon, which ensures fine-grained control over who has access to what data. Now that NoSQL approaches to data have moved beyond search engines and product recommendations into enterprise environments, access control is a “table stakes” feature. That means control over encryption, deletion, recovery, and eventually even things like billing and cost control.
And our judges’ first place winner was Snowflake, a SQL data warehouse built as an elastic cloud service that processes semi-structured and structured data in one system without transformation or fixed schemas. Once, computing resources were costly, so safeguarding those resources was an implicit design constraint. But cloud computing gives us a nearly limitless number of inexpensive machine instances, changing many of the underlying constraints that led to the design of traditional data warehouses.
Ultimately, this year’s showcase was a reflection of the maturity and enterprise readiness we’re seeing in the industry. Absent from the winners were real-time technologies and companies tackling machine data and the Internet of Things, even though these were hot topics in the halls and sessions of the event.