- Accumulo — NSA’s BigTable implementation, released as an Apache project.
- How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting access to the technology that the traders had: Last fall the SEC said it would pay Tradeworx, a high-frequency trading firm, $2.5 million to use its data collection system as the basic platform for a new surveillance operation. Code-named Midas (Market Information Data Analytics System), it scours the market for data from all 13 public exchanges. Midas went live in February. The SEC can now detect anomalous situations in the market, such as a trader spamming an exchange with thousands of fake orders, before they show up on blogs like Nanex and ZeroHedge. If Midas sees something odd, Berman’s team can look at trading data on a deeper level, millisecond by millisecond.
- PRISM: Surprised? (Danny O’Brien) — I really don’t agree with the people who think “We don’t have the collective will”, as though there’s some magical way things got done in the past when everyone was in accord and surprised all the time. It’s always hard work to change the world. Endless, dull hard work. Ten years later, when you’ve freed the slaves or beat the Nazis everyone is like “WHY CAN’T IT BE AS EASY TO CHANGE THIS AS THAT WAS, BACK IN THE GOOD OLD DAYS. I GUESS WE’RE ALL JUST SHEEPLE THESE DAYS.”
- What We Don’t Know About Spying on Citizens is Scarier Than What We Do Know (Bruce Schneier) — The U.S. government is on a secrecy binge. It overclassifies more information than ever. And we learn, again and again, that our government regularly classifies things not because they need to be secret, but because their release would be embarrassing. Open source BigTable implementation: free. Data gathering operation around it: $20M/year. Irony in having the extent of authoritarian Big Brother government secrecy questioned just as a whistleblower’s military trial is held “off the record”: priceless.
Learn to resist vanity metrics
One of the things we preach in Lean Analytics is that entrepreneurs should avoid vanity metrics—numbers that make you feel good, but ultimately, don’t change your behavior. Vanity metrics (such as “total visitors”) tend to go “up and to the right” but don’t tell you much about how you’re doing.
Many people find solace in graphs that go up and to the right. The metric “Total number of people who have visited my restaurant” will always increase; but on its own it doesn’t tell you anything about the health of the business. It’s just head-in-the-sand comforting.
A good metric is often a comparative rate or ratio. Consider what happens when you put the word “per” before or after a metric. “Restaurant visitors per day” is vastly more meaningful. Time is the universal denominator, since the universe moves inexorably forwards. But there are plenty of other good ratios. For example, “revenue per restaurant visitor” matters a lot, since it tells you what each diner contributes.
What’s an active user, anyway?
For many businesses, the go-to metric revolves around “active users.” In a mobile app or software-as-a-service business, only some percentage of people are actively engaged. In a media site, only some percentage uses the site each day. And in a loyalty-focused e-commerce company, only some buyers are active.
This is true of more traditional businesses, too. Only a percentage of citizens are actively engaged in local government; only a certain number of employees are using the Intranet; only a percentage of coffee shop patrons return daily.
Unfortunately, saying “measure active users” begs the question: What’s active, anyway?
To figure this out, you need to look at your business model. Not your business plan, which is a hypothetical projection of how you’ll fare, but your business model. If you’re running a lemonade stand, your business model likely has a few key assumptions:
- The cost of lemonade;
- The amount of foot traffic past your stand;
- The percent of passers-by who will buy from you;
- The price they are willing to pay.
Our Lean lemonade stand would then set about testing and improving each metric, running experiments to find the best street corner, or determine the optimal price.
Lemonade stands are wonderfully simple, so your business may have many other assumptions, but it is essential that you quantify them and state them so you can then focus on improving them, one by one, until your business model and reality align. In a restaurant, for example, these assumptions might be, “we will have at least 50 diners a day” or “diners will spend on average $20 a meal.”
The activity you want changes
We believe most new companies and products go through five distinct stages of growth:
- Empathy, where you figure out what problem you’re solving and what solution people want;
- Stickiness, where you measure how many people adopt your solution rather than trying it and leaving;
- Virality, where you maximize word-of-mouth and references;
- Revenue, where you pour some part of your revenues back into paid acquisition or advertising;
- Scale, where you grow the business through automation, delegation, and process.
OSCON 2013 Speaker Series
Andy Gup (@agup) is a Developer Evangelist at ESRI and OSCON 2013 Speaker. In this interview we talk about location capabilities in apps as well as location analytics.
NOTE: If you are interested in attending OSCON to check out Andy’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% your registration fee.
Key highlights include:
- Mobile apps must have location capabilities [Discussed at 0:25]
- Consider your goals when incorporating location into an app [Discussed at 1:05]
- Is it difficult to add location functionality? [Discussed at 2:19]
- A real-world example of where it made a big difference [Discussed at 3:32]
- Location analytics are very powerful [Discussed at 5:11]
- Augmented reality and location capabilities [Discussed at 6:38]
You can view the full interview here:
Open Source BigTable, Robots Lost, Changing the World, Secrecy Binge
Machine Learning Demos, iOS Debugging, Industrial Internet, and Deanonymity
- MLDemos — an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. (via Mark Alen)
- kiln (GitHub) — open source extensible on-device debugging framework for iOS apps.
- Industrial Internet — the O’Reilly report on the industrial Internet of things is out. Prasad suggests an illustration: for every car with a rain sensor today, there are more than 10 that don’t have one. Instead of an optical sensor that turns on windshield wipers when it sees water, imagine the human in the car as a sensor — probably somewhat more discerning than the optical sensor in knowing what wiper setting is appropriate. A car could broadcast its wiper setting, along with its location, to the cloud. “Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction.
- Unique in the Crowd: The Privacy Bounds of Human Mobility (PDF, Nature) — We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. As Edd observed, “You are a unique snowflake, after all.” (via Alasdair Allan)
- A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
- Easy6502 — get started writing 6502 assembly language. Fun way to get started with low-level coding.
- How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
Barlow's distilled insights regarding the ever evolving definition of real time big data analytics
During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “did you read the Barlow piece?”
“Umm, no.” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.
In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone who I perceive to be an unofficial boss of my own choosing, I am not really inclined to worry about things that I really don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.
Tips for interacting with analytics colleagues
To quote Pride and Prejudice, businesses have for many years “labored under the misapprehension” that their analytics talent was made up of misanthropes with neither the will nor the ability to communicate or work with others on strategic or creative business problems. These employees were meant to be kept in the basement out of sight, fed bad pizza, and pumped for spreadsheets to be interpreted in the sunny offices aboveground.
This perception is changing in industry as the big data phenomenon has elevated data science to a C-level priority. Suddenly folks once stereotyped by characters like Milton in Office Space are now “sexy.” The truth is there have always been well-rounded, articulate, friendly analytics professionals (they may just like Battlestar more than you), and now that analytics is an essential business function, personalities of all types are being attracted to practice the discipline.
Malware Industrial Complex, Indies Needed, TV Analytics, and HTTP Benchmarking
- Welcome to the Malware-Industrial Complex (MIT) — brilliant phrase, sound analysis.
- Stupid Stupid xBox — The hardcore/soft-tv transition and any lead they feel they have is simply not defensible by licensing other industries’ generic video or music content because those industries will gladly sell and license the same content to all other players. A single custom studio of 150 employees also can not generate enough content to defensibly satisfy 76M+ customers. Only with quality primary software content from thousands of independent developers can you defend the brand and the product. Only by making the user experience simple, quick, and seamless can you defend the brand and the product. Never seen a better put statement of why an ecosystem of indies is essential.
- Data Feedback Loops for TV (Salon) — Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
- wrk — a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.
Handmade Hardware, Tab Silencer, Surprise and Models, and Sciencey GIFs
- Your USB Sticks Are Made With Chopsticks (Bunnie Huang) — behind-the-scenes on how USB sticks are made.
- mutetab — find and kill the Chrome tab making all the damn noise! (via Nelson Minar)
- Visualization, Modeling, and Surprises (John D Cook) — paraphrases Hadley Wickham: Visualization can surprise you, but it doesn’t scale well. Modelling scales well, but it can’t surprise you.
- Head Like an Orange — science animated GIFs, assembled from nature documentaries. (via Ed Yong)
A deconstructed web analytics report shows what the dashboard missed.
We can all agree that in 2013 web analytics is still a nightmare, right?
The last few years have brought about an enormous expansion in the top of the web analytics information overload funnel, and today I can discover just about any aspect of my web traffic that piques my curiosity.
I know how much traffic I’m getting, who told them to come here, how they got here, how long they’re staying, what they’re looking at, what they’re using to look at it, where they’re from, and just about anything else I want to know about them. If I don’t like what I’m looking at, I can customize everything from my dashboard to reports to parameters within those reports.
What none of this tells me is how I can be more successful at turning the words I put on the Internet into dollars in my pocket.
Now, I know what you’re thinking: “It’s all there! More information than you could ever figure out what to do with.”
The problem with that is that it’s all there. It’s more information than I could ever figure out what to do with. Read more…