- Accumulo — NSA’s BigTable implementation, released as an Apache project.
- How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting access to the technology that the traders had: Last fall the SEC said it would pay Tradeworx, a high-frequency trading firm, $2.5 million to use its data collection system as the basic platform for a new surveillance operation. Code-named Midas (Market Information Data Analytics System), it scours the market for data from all 13 public exchanges. Midas went live in February. The SEC can now detect anomalous situations in the market, such as a trader spamming an exchange with thousands of fake orders, before they show up on blogs like Nanex and ZeroHedge. If Midas sees something odd, Berman’s team can look at trading data on a deeper level, millisecond by millisecond.
- PRISM: Surprised? (Danny O’Brien) — I really don’t agree with the people who think “We don’t have the collective will”, as though there’s some magical way things got done in the past when everyone was in accord and surprised all the time. It’s always hard work to change the world. Endless, dull hard work. Ten years later, when you’ve freed the slaves or beat the Nazis everyone is like “WHY CAN’T IT BE AS EASY TO CHANGE THIS AS THAT WAS, BACK IN THE GOOD OLD DAYS. I GUESS WE’RE ALL JUST SHEEPLE THESE DAYS.”
- What We Don’t Know About Spying on Citizens is Scarier Than What We Do Know (Bruce Schneier) — The U.S. government is on a secrecy binge. It overclassifies more information than ever. And we learn, again and again, that our government regularly classifies things not because they need to be secret, but because their release would be embarrassing. Open source BigTable implementation: free. Data gathering operation around it: $20M/year. Irony in having the extent of authoritarian Big Brother government secrecy questioned just as a whistleblower’s military trial is held “off the record”: priceless.
Business analytics projects: Using decisions as a basis to prioritize and identify requirements
Most normal people don’t look at data sets just for fun. They study views of the data to make decisions about what to do, be it a decision to take some specific action or a decision to do nothing at all. The main purpose of business analytics projects is to develop systems that turn large and often highly complex data sets into meaningful information from which decisions can be made.
The decisions that people make using business analytics systems can be strategic, operational, or tactical. For example, an executive might look at his sales team’s global performance dashboard to decide who to promote (tactical), which products need different marketing strategies (operational), or which products to target by markets (strategic). Generally speaking, all software systems that include an analytics component should enable users to make decisions that improve organizational performance in some dimension.
Modern data processing tools, many of them open source, allow more clinical studies at lower costs
This guest posting was written by Yadid Ayzenberg (@YadidAyzenberg on Twitter). Yadid is a PhD student in the Affective Computing Group at the MIT Media Lab. He has designed and implemented cloud platforms for the aggregation, processing and visualization of bio-physiological sensor data. Yadid will speak on this topic at the Strata Rx conference.
A few weeks ago, I learned that the Framingham Heart Study would lose $4 million (a full 40 percent of its funding) from the federal government due to automatic spending cuts. This seminal study, begun in 1948, set out to identify the contributing factors to Cardiovascular Disease (CVD) by following a group of 5,209 men and woman and tracking their life style habits, performing regular physical examinations and lab tests. This study was responsible for finding the major risk factors for CVD, such as high blood pressure and lack of exercise. The costs associated with such large-scale clinical studies are prohibitive, making them accessible only to organizations with sufficient financial resources or through government funding.
Long a development tool, TestFlightApp wants to move into analytics
For most iOS developers, TestFlightApp has become the go-to tool when they want to distribute a development build to testers. For those not familiar with the site, you can register applications, and then upload IPA files signed with either a development or AdHoc profile, either manually or using a desktop app that integrates directly into XCode.
Once uploaded, your testers can be automatically notified via email that there is a new version of the app available, and download it directly onto their device without having to use iTunes. It can even capture device IDs for new users (or new devices for existing users), and export them for use in the Apple developer portal.
You can also add code to have the running app check in with TestFlight. You can add “checkpoint” flags, ask survey questions (“why did you come to this page”), and have console logs and crash reports automatically uploaded to the site.
The problem is, once you’re ready to ship a production version, you have traditionally had to turn everything off and make sure that the Test Flight library was not linked in to the app. This has meant that there was no way to capture crash data from customers running the app. But now that’s changing.
Recently, TestFlightApp announced that it was now OK to leave the library in copies of your app uploaded to the App Store, and to have the app check in with TestFlight. Why the change? Probably because it is needed to support FlightPath, their new analytics tool. FlightPath seems to want to be the Google Analytics of mobile, allowing developers to see how customers use their app and offering demographic data.
FlightPath is likely to be the path that TestFlightApp takes to start monetizing their service. TestFlightApp has always been free, but there has been no pronouncement about whether FlightPath will follow that same model. It is currently in an open beta, so we may have to wait and see what the pricing model for the final product is. Of course, by then, all those beta users will have become hooked.
One major caution for people intending to keep TestFlight in their production code, watch out for leakage of private data! Many test builds spit out tons of information to the console. At times, I’ve had everything going back and forth to a server dumping itself onto the log. If you don’t disable that in the shipping code, you could be accidentally capturing all sorts of sensitive data, including credit cards, HIPPA restricted information, etc. So make sure that you have compiled out (or disabled) anything like that in the production build (which you can test with an AdHoc profile.)
A game changer for a marketer to pinpoint what a customer wants, when they want it, and how they want to hear about it
My 2 and a half year old daughter loves the Mickey Mouse Clubhouse. She watches episodes on TV and our iPad. She wears Minnie Mouse flip flops and giggles just about every time she sees anything with Mickey, Daisy, Goofy…you get the idea. And when she’s old enough to go to Disney World, Minnie might walk right up to her and say “Hi Jemma!” and give her a big hug.
Creating a personal interaction between a child and a beloved Disney character exemplifies the company’s recent initiative to deliver a personalized, hassle-free experience at their theme parks. 1 With the wireless tracking wristband ‘MagicBand,’ families are able to reserve spots in lines for popular attractions, purchase items at the parks, and unlock their hotel rooms. The MagicBand is part of the MyMagic+ system, which enables Disney to collect data on visitors’ purchasing habits and real-time location, among other things. Disney will use this vast trove of information to deliver a personalized experience at the parks and tailor marketing messages and promotions.
Learn to resist vanity metrics
One of the things we preach in Lean Analytics is that entrepreneurs should avoid vanity metrics—numbers that make you feel good, but ultimately, don’t change your behavior. Vanity metrics (such as “total visitors”) tend to go “up and to the right” but don’t tell you much about how you’re doing.
Many people find solace in graphs that go up and to the right. The metric “Total number of people who have visited my restaurant” will always increase; but on its own it doesn’t tell you anything about the health of the business. It’s just head-in-the-sand comforting.
A good metric is often a comparative rate or ratio. Consider what happens when you put the word “per” before or after a metric. “Restaurant visitors per day” is vastly more meaningful. Time is the universal denominator, since the universe moves inexorably forwards. But there are plenty of other good ratios. For example, “revenue per restaurant visitor” matters a lot, since it tells you what each diner contributes.
What’s an active user, anyway?
For many businesses, the go-to metric revolves around “active users.” In a mobile app or software-as-a-service business, only some percentage of people are actively engaged. In a media site, only some percentage uses the site each day. And in a loyalty-focused e-commerce company, only some buyers are active.
This is true of more traditional businesses, too. Only a percentage of citizens are actively engaged in local government; only a certain number of employees are using the Intranet; only a percentage of coffee shop patrons return daily.
Unfortunately, saying “measure active users” begs the question: What’s active, anyway?
To figure this out, you need to look at your business model. Not your business plan, which is a hypothetical projection of how you’ll fare, but your business model. If you’re running a lemonade stand, your business model likely has a few key assumptions:
- The cost of lemonade;
- The amount of foot traffic past your stand;
- The percent of passers-by who will buy from you;
- The price they are willing to pay.
Our Lean lemonade stand would then set about testing and improving each metric, running experiments to find the best street corner, or determine the optimal price.
Lemonade stands are wonderfully simple, so your business may have many other assumptions, but it is essential that you quantify them and state them so you can then focus on improving them, one by one, until your business model and reality align. In a restaurant, for example, these assumptions might be, “we will have at least 50 diners a day” or “diners will spend on average $20 a meal.”
The activity you want changes
We believe most new companies and products go through five distinct stages of growth:
- Empathy, where you figure out what problem you’re solving and what solution people want;
- Stickiness, where you measure how many people adopt your solution rather than trying it and leaving;
- Virality, where you maximize word-of-mouth and references;
- Revenue, where you pour some part of your revenues back into paid acquisition or advertising;
- Scale, where you grow the business through automation, delegation, and process.
OSCON 2013 Speaker Series
Andy Gup (@agup) is a Developer Evangelist at ESRI and OSCON 2013 Speaker. In this interview we talk about location capabilities in apps as well as location analytics.
NOTE: If you are interested in attending OSCON to check out Andy’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% your registration fee.
Key highlights include:
- Mobile apps must have location capabilities [Discussed at 0:25]
- Consider your goals when incorporating location into an app [Discussed at 1:05]
- Is it difficult to add location functionality? [Discussed at 2:19]
- A real-world example of where it made a big difference [Discussed at 3:32]
- Location analytics are very powerful [Discussed at 5:11]
- Augmented reality and location capabilities [Discussed at 6:38]
You can view the full interview here:
Open Source BigTable, Robots Lost, Changing the World, Secrecy Binge
Machine Learning Demos, iOS Debugging, Industrial Internet, and Deanonymity
- MLDemos — an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. (via Mark Alen)
- kiln (GitHub) — open source extensible on-device debugging framework for iOS apps.
- Industrial Internet — the O’Reilly report on the industrial Internet of things is out. Prasad suggests an illustration: for every car with a rain sensor today, there are more than 10 that don’t have one. Instead of an optical sensor that turns on windshield wipers when it sees water, imagine the human in the car as a sensor — probably somewhat more discerning than the optical sensor in knowing what wiper setting is appropriate. A car could broadcast its wiper setting, along with its location, to the cloud. “Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction.
- Unique in the Crowd: The Privacy Bounds of Human Mobility (PDF, Nature) — We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. As Edd observed, “You are a unique snowflake, after all.” (via Alasdair Allan)
- A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
- Easy6502 — get started writing 6502 assembly language. Fun way to get started with low-level coding.
- How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
Barlow's distilled insights regarding the ever evolving definition of real time big data analytics
During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “did you read the Barlow piece?”
“Umm, no.” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.
In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone who I perceive to be an unofficial boss of my own choosing, I am not really inclined to worry about things that I really don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.