- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (PDF) — Berkeley research paper behind Apache Spark. (via Nelson Minar)
- Angular Tour — trivially add tour tips (“This is the widget basket, drag and drop for widget goodness!” type of thing) to your Angular app.
- Punchcard — generate GitHub-style punch card charts “with ease”.
- Where Credit Belongs for Hack (Bryan O’Sullivan) — public credit for individual contributors in a piece of corporate open source is a sign of confidence in your team, that building their public reputation isn’t going to result in them leaving for one of the many job offers they’ll receive. And, of course, of caring for your individual contributors. Kudos Facebook.
ENTRIES TAGGED "Big Data"
Ignore the hype. Learn to be a data skeptic.
Yawn. Yet another article trashing “big data,” this time an op-ed in the Times. This one is better than most, and ends with the truism that data isn’t a silver bullet. It certainly isn’t.
I’ll spare you all the links (most of which are much less insightful than the Times piece), but the backlash against “big data” is clearly in full swing. I wrote about this more than a year ago, in my piece on data skepticism: data is heading into the trough of a hype curve, driven by overly aggressive marketing, promises that can’t be kept, and spurious claims that, if you have enough data, correlation is as good as causation. It isn’t; it never was; it never will be. The paradox of data is that the more data you have, the more spurious correlations will show up. Good data scientists understand that. Poor ones don’t.
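The paradox above is easy to see in a small simulation. The following is a minimal sketch, not anything from the Times piece or the original skepticism article: it assumes "more data" means more variables measured on the same handful of observations, and the helper names (`pearson`, `count_spurious`), the 0.4 threshold, and the sample sizes are all illustrative choices. Every column is independent random noise, so any pair that clears the threshold is spurious by construction.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(columns, threshold=0.4):
    """Count column pairs whose |correlation| exceeds the (arbitrary) threshold."""
    return sum(
        1
        for a, b in itertools.combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


# 100 unrelated variables, 30 observations each: pure noise (assumed setup).
rng = random.Random(0)
columns = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(100)]

# Look at progressively more of the same variables and count "findings".
for k in (10, 50, 100):
    pairs = k * (k - 1) // 2
    print(f"{k} variables, {pairs} pairs, {count_spurious(columns[:k])} above threshold")
```

The number of pairs grows quadratically with the number of variables, so the count of noise pairs that look correlated can only go up as more columns are added. Nothing in the data changed; there were simply more places to look.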
It’s very easy to say that “big data is dead” while you’re using Google Maps to navigate downtown Boston. It’s easy to say that “big data is dead” while Google Now or Siri is telling you that you need to leave 20 minutes early for an appointment because of traffic. And it’s easy to say that “big data is dead” while you’re using Google, or Bing, or DuckDuckGo to find material to help you write an article claiming that big data is dead.
HBase has made inroads in companies across many industries and countries
With HBaseCon right around the corner, I wanted to take stock of one of the more popular components in the Hadoop ecosystem. Over the last few years, many more companies have come to rely on HBase to run key products and services. The conference will showcase a wide variety of such examples, and highlight some of the new features that HBase developers have added over the past year. In the meantime, here are some things you may not have known about HBase:
Many companies have had HBase in production for 3+ years: Large technology companies including Trend Micro, eBay, Yahoo! and Facebook, and analytics companies RocketFuel and Flurry depend on HBase for many mission-critical services.
There are many use cases beyond advertising: Examples include communications (Facebook messages, Xiaomi), security (Trend Micro), measurement (Nielsen), enterprise collaboration (Jive Software), digital media (OCLC), DNA matching (Ancestry.com), and machine data analysis (Box.com). In particular, Nielsen uses HBase to track media consumption patterns and trends, mobile handset company Xiaomi uses HBase for messaging and other consumer mobile services, and OCLC runs the world’s largest online database of library resources on HBase.
Flurry has the largest contiguous HBase cluster: Mobile analytics company Flurry has an HBase cluster with 1,200 nodes (replicating into another 1,200-node cluster). Flurry is planning to significantly expand its large HBase cluster in the near future.
Establishing protocols to socialize wearable devices.
The age of ubiquitous computing is accelerating, and it’s creating some interesting social turbulence, particularly where wearable hardware is concerned. Intelligent devices other than phones and screens — smart headsets, glasses, watches, bracelets — are insinuating themselves into our daily lives. The technology for even less intrusive mechanisms, such as jewelry, buttons, and implants, exists and will ultimately find commercial applications.
And as sensor-and-software-augmented devices and wireless connections proliferate through the environment, it will be increasingly difficult to determine who is connected — and how deeply — and how the data each of us generates is disseminated, captured and employed. We’re already seeing some early signs of wearable angst: recent confrontations in bars and restaurants between those wearing Google Glass and others worried they were being recorded.
This is nothing new, of course. Many major technological developments experienced their share of turbulent transitions. Ultimately, though, the benefits of wearable computers and a connected environment are likely to prove too seductive to resist. People will participate and tolerate because the upside outweighs the downside.
Data tools are less important than the way you frame your questions.
Max Shron and Jake Porway spoke with me at Strata a few weeks ago about frameworks for making reasoned arguments with data. Max’s recent O’Reilly book, Thinking with Data, outlines the crucial process of developing good questions and creating a plan to answer them. Jake’s nonprofit, DataKind, connects data scientists with worthy causes where they can apply their skills.
A few of the things we talked about:
- The importance of publishing negative scientific results
- Give Directly, an organization that facilitates donations directly to households in Kenya and Uganda. Give Directly was able to model income using satellite data to distinguish thatched roofs from metal roofs.
- Moritz Stefaner calling for a “macroscope”
- Project Cybersyn, Salvador Allende’s plan for encompassing the entire Chilean economy in a single real-time computer system
- Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed by James C. Scott
After we recorded this podcast episode at Strata Santa Clara, Max presided over a webcast on his book that’s archived here.
Such lists might mean we miss the truly great breakthroughs, inspirations, and leaps of faith necessary to evolve.
Editor’s note: this post originally appeared on Tilt the Windmill; it is republished here with permission.
First: it’s an excellent post. You should read it. I’ll wait.
Every enterprise decision-maker will soon be running their business according to the lists Barry envisions, as the power of big data and analytics finds its way into every boardroom and dashboard. Society will soon demand them, too. But while such analysis is tremendously valuable, it carries two dangers: the politics of setting criteria, and the trap of relying on data for inspiration.
The harsh light of data
Barry is right: rather than using our precious time and resources to make yet another linkbait list of the 50 cutest kittens, or the seven people I’ll try to avoid at SXSW, we should use abundant data and a connected world to build lists that matter: lying politicians, bad cars, lousy doctors. Then we can use these lists to change policy and behaviour because we’ll make things transparent. Shining the harsh light of data on something can improve it.