In-memory data storage, SQL, data preparation and asking the right questions all emerged as key trends at Strata + Hadoop World.
At our successful Strata + Hadoop World conference (which also managed to avoid Sandy), a few themes emerged that resonated with my interests and experience as a hands-on data analyst and as a researcher who tracks technology adoption trends. Keep in mind that these themes reflect my personal biases. Others will have their own take on the key takeaways from the conference.
1. In-memory data storage for faster queries and visualization
Interactive or real-time query for large datasets is seen as a key to analyst productivity (real-time meaning query times fast enough to keep the user in the flow of analysis, from sub-second to less than a few minutes). Existing large-scale data management schemes aren’t fast enough, and they reduce analytical effectiveness because users can’t explore the data by quickly iterating through various queries. We see companies with large data stores building out their own in-memory tools, e.g., Dremel at Google, Druid at Metamarkets, and Sting at Netflix, alongside new tools such as Cloudera’s Impala (announced at the conference), Spark from UC Berkeley’s AMPLab, SAP HANA, and Platfora.
We saw this coming a few years ago when analysts we pay attention to started building their own in-memory data store sandboxes, often in key/value data management tools like Redis, when trying to make sense of new, large-scale data stores. I know from my own work that there’s no better way to explore a new or unstructured data set than to quickly run a series of iterative queries, each informed by the last.
What does winning look like? No enemy has been vanquished, but open source is now mainstream and a new norm.
I heard the comment a few times at the 14th OSCON: The conference has lost its edge. The comment resonated with my own experience: a shift in demeanor, a more purposeful, optimistic attitude, less itching for a fight. Yes, the conference has lost its edge; it doesn’t need one anymore.
Open source won. It’s not that an enemy has been vanquished or that proprietary software is dead; there’s just not much left to argue about when it comes to adopting open source. After more than a decade of the low-cost, lean startup culture successfully developing on open source tools, it’s clearly a legitimate, mainstream option for technology tools and innovation.
And open source is not just for hackers and startups. A new class of innovative, widely adopted technologies has emerged from the open source culture of collaboration and sharing — turning the old model of replicating proprietary software as open source projects on its head. Think Git, D3, Storm, Node.js, Rails, Mongo, Mesos or Spark.
We see more enterprise and government folks intermingling with the stalwart open source crowd who have been attending OSCON for years. And these large organizations are actively adopting many of the open source technologies we track, e.g., web development frameworks, programming languages, content management, and data management and analysis tools.
We hear fewer concerns about support or needing geek-level technical competency to get started with open source. In the Small and Medium Business (SMB) market we see mass adoption of open source for content management and ecommerce applications — even for self-identified technology newbies.
Agility, simplicity, and curiosity will define the next generation of apps and devices.
The speakers at the recent Webstock conference in New Zealand gravitated toward many of the same themes. Taken together, these themes create a framework for building the next generation of services, applications and devices.
How a days-long data process was completed in minutes.
We recently faced the type of big data challenge we expect to become increasingly common: scaling up the performance of a machine learning classifier for a large set of unstructured data. In this post, we explain how a set-oriented approach led to huge performance gains.
Lessons from adjacent media companies can inform publishers' ebook strategies.
Will internal constituencies bias how publishers value print book and ebook business models? Roger Magoulas examines that question and looks at the complementary relationship between print and electronic forms.