- Maximum Happy Imagination (Matt Jones) — questioning the true vision of Marc Andreessen’s recent Twitter discourse on the great future that awaits us. His analogies run out in the 20th century when it comes to the political, social and economic implications of his maximum happy imagination.
- The Mirrortocracy — It’s astonishing how many of the people conducting interviews and passing judgement on the careers of candidates have had no training at all on how to do it well. Aside from their own interviews, they may not have ever seen one. I’m all for learning on your own but at least when you write a program wrong it breaks. Without a natural feedback loop, interviewing mostly runs on myth and survivor bias.
- Longitude Prize — six prize areas, Grand Challenge style, in clean flight, antibiotic resistance, dementia, food, water, and overcoming paralysis. Mysteriously none for library system that avoids DLL hell.
- The Re-Emergence of Datalog — Michael Fogus overviews Datalog and provides examples of how it is implemented and used in Datomic, Cascalog, and the Bacwn Clojure library. See also notes from the talk.
"Big Data" entries
Many more companies want to highlight how they're using Apache Spark in production.
One of the trends we’re following closely at Strata is the emergence of vertical applications. As components for creating large-scale data infrastructures enter their early stages of maturation, companies are focusing on solving data problems in specific industries rather than building tools from scratch. Virtually all of these components are open source and have contributors across many companies. Organizations are also sharing best practices for building big data applications, through blog posts, white papers, and presentations at conferences like Strata.
These trends are particularly apparent in a set of technologies that originated from UC Berkeley’s AMPLab: the number of companies that are using (or plan to use) Spark in production1 has exploded over the last year. The surge in popularity of the Apache Spark ecosystem stems from the maturation of its individual open source components and the growing community of users. The tight integration of high-performance tools that address different problems and workloads, coupled with a simple programming interface (in Python, Java, Scala), make Spark one of the most popular projects in big data. The charts below show the amount of active development in Spark:
For the second year in a row, I’ve had the privilege of serving on the program committee for the Spark Summit. I’d like to highlight a few areas where Apache Spark is making inroads. I’ll focus on proposals2 from companies building applications on top of Spark.
Agile methodology brings flexibility to the EDW and offers ways to integrate open-source technologies with existing systems.
Data analysis, like other pursuits, is a balancing act. The rise of big data ratchets up the pressure on the traditional enterprise data warehouse (EDW) and associated software tools to handle rapidly evolving sets of new demands posed by the business. Companies want their EDW systems to be more flexible and more user friendly — without sacrificing processing speeds, data integrity, or overall reliability.
“The more data you give the business, the more questions they will ask,” says José Carlos Eiras, who has served as CIO at Kraft Foods, Philip Morris, General Motors, and DHL. “When you have big data, you have a lot of different questions, and suddenly you need an enterprise data warehouse that is very flexible.”
EDWs are remarkably powerful, but it takes considerable expertise and creativity to modify them on the fly. Adding new capabilities to the EDW generally requires significant investments of time and money. You can develop your own tools internally or purchase them from a vendor, but either way, it’s a hard slog. Read more…