- Rise of the Independents (Bryce Roberts) — companies that don’t take VC money and instead choose to grow organically: indies. +1 for having a word for this.
- The Performance Golden Rule (Steve Souders) — 80-90% of the end-user response time is spent on the frontend. Check out his graphs showing where load times come from for various popular sites. The backend responds quickly, but loading all the JavaScript and images and CSS and embedded autoplaying videos and all that kerfuffle takes much, much longer.
- Starry Night Comes to Life — wow, beautiful, must-see.
- MapReduce Patterns, Algorithms, and Use Cases — In this article I digest a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found in the web or scientific articles. Several practical case studies are also provided. All descriptions and code snippets use the standard Hadoop’s MapReduce model with Mappers, Reduces, Combiners, Partitioners, and sorting.
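The pieces named in that last link (Mappers, Combiners, Partitioners, sorting, Reducers) can be sketched in a few lines of plain Python. This is only an illustrative word count under stated assumptions: it is not taken from the linked article and does not use the Hadoop API, and every function name here (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`) is hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally to cut shuffle traffic."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partitioner(key, num_reducers):
    """Pick the reducer for a key (a deterministic stand-in for hash partitioning)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(documents, num_reducers=2):
    # Map + combine: each document plays the role of one input split.
    partitions = [[] for _ in range(num_reducers)]
    for document in documents:
        for key, value in combiner(mapper(document)):
            partitions[partitioner(key, num_reducers)].append((key, value))

    # Sort + reduce: sort each partition by key, then group and reduce.
    results = {}
    for partition in partitions:
        partition.sort(key=itemgetter(0))
        for key, group in groupby(partition, key=itemgetter(0)):
            word, total = reducer(key, (value for _, value in group))
            results[word] = total
    return results


if __name__ == "__main__":
    print(run_job(["big data big patterns", "data patterns and data"]))
```

In a real Hadoop job the framework handles the shuffle, sort, and grouping between the map and reduce phases; here `run_job` does that work by hand so the role of each stage is visible.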
ENTRIES TAGGED "Hadoop"
Strata Week: Big data boom and big data gaps
One report says the Hadoop market is booming while another says federal data usage isn't.
In this week's big data news, an IDC report points to the booming market for Hadoop and MapReduce (and if proposals for Strata are any indication, this is indeed a good time for big data).
Microsoft opens up
How Microsoft is contributing to and benefitting from open source.
Microsoft seems to be embracing open source more and more. What does this tell us about the company's near-term future?
Now available: "Planning for Big Data"
A free handbook for anybody wanting to understand and use big data.
"Planning for Big Data" is a new book that helps you understand what big data is, why it matters, and where to get started.
O'Reilly Radar Show 3/12/12: Best data interviews from Strata California 2012
Doug Cutting on Hadoop, Max Gadney on video data graphics, Jeremy Howard on big data and analytics.
Hadoop creator Doug Cutting discusses the similarities between Linux and the big data world, Max Gadney from After the Flood explains the benefits of video data graphics, and Kaggle's Jeremy Howard looks at the difference between big data and analytics.
Four short links: 13 February 2012
Indie Businesses, Frontend Sluggards, Beautiful Graphics, and Big Data Patterns
Four short links: 8 February 2012
Text Mining, Unstoppable Sociality, Unicode Fun, and Scholarly Publishing
- Mavuno — an open source, modular, scalable text mining toolkit built upon Hadoop. (Apache-licensed)
- Cow Clicker — Wired profile of Cow Clicker creator Ian Bogost. I was impressed by Cow Clickers [...] have turned what was intended to be a vapid experience into a source of camaraderie and creativity. People create communities around social activities, even when they are antisocial. (via BoingBoing)
- Unicode Has a Pile of Poo Character (BoingBoing) — this is perfect.
- The Research Works Act and the Breakdown of Mutual Incomprehension (Cameron Neylon) — an excellent summary of how researchers and publishers view each other and their place in the world.
What is Apache Hadoop?
A look at the components and functions of the Hadoop ecosystem.
Apache Hadoop has been the driving force behind the growth of the big data industry. But what does it do, and why do you need all its strangely-named friends, such as Oozie, Zookeeper and Flume?
Why Hadoop caught on
Doug Cutting on Hadoop's rise and why he's surprised at its growth.
Doug Cutting discusses Hadoop's current and near-term role, and the factors that made it a central part of data processing.