What business leaders need to know about data and data analysis to drive their businesses forward.
Foster and Tom have a long history of applying data to practical business problems. Their book, which evolved into Data Science for Business, was different from all the other data science books I’ve seen. It wasn’t about tools: Hadoop and R are scarcely mentioned, if at all. It wasn’t about coding: business students don’t need to learn how to implement machine learning algorithms in Python. It’s about business: specifically, it’s about the data analytic thinking that business people need to work with data effectively.
Data analytic thinking means knowing what questions to ask, how to ask those questions, and whether the answers you get make sense. Business leaders don’t (and shouldn’t) do the data analysis themselves. But in this data-driven age, it’s critically important for business leaders to understand how to work with the data scientists on their teams.
Cruftifying web pages is not what Velocity is about.
There’s been a lot said and written about web performance since the Velocity conference. And there have been steps both forward and back — is the web getting faster? Are developers using increased performance to add more useless gunk to their pages, taking back performance gains almost as quickly as they’re achieved?
I don’t want to leap into that argument; Arvind Jain did a good job of discussing the issues at Velocity Santa Clara and in a blog post on Google’s analytics site. But I do want to discuss (all right, flame about) one issue that bugs me.
I see a lot of pages that appear to load quickly. I click on a site, and within a second, I have an apparently readable page.
“Apparently,” however, is a loaded word because a second later, some new component of the page loads, causing the browser to re-lay out the page, so everything jumps around. Then comes the pop-over screen, asking if I want to subscribe or take a survey. (Most online renditions of print magazines: THIS MEANS YOU!) Then another resize, as another component appears. If I want to scroll down past the lead picture, which is usually uninteresting, I often find that I can’t because the browser is still laying out bits and pieces of the page. It’s almost as if the developers don’t want me to read the page. That’s certainly the effect they achieve.
Despite reports of breakthroughs in battery technology, the hard problems of battery innovation remain hard.
Lately there’s been a spate of articles about breakthroughs in battery technology. Better batteries are important, for any of a number of reasons: electric cars, smoothing out variations in the power grid, cell phones, and laptops that don’t need to be recharged daily.
All of these nascent technologies are important, but some of them leave me cold, and in a way that seems important. It’s relatively easy to invent new technology, but a lot harder to bring it to market. I’m starting to understand why. The problem isn’t just commercializing a new technology — it’s everything that surrounds that new technology.
Take an article like Battery Breakthrough Offers 30 Times More Power, Charges 1,000 Times Faster. For the purposes of argument, let’s assume that the technology works; I’m not an expert on the chemistry of batteries, so I have no reason to believe that it doesn’t. But then let’s take a step back and think about what a battery does. When you discharge a battery, you’re using a chemical reaction to create electrical current (which is moving electrical charge). When you charge a battery, you’re reversing that reaction: you’re essentially using current to put charge back into the battery.
So, if a battery is going to store 30 times as much energy (the headline says “power,” but energy is what a battery stores) and charge 1,000 times faster, that means that the wires that connect to it need to carry 30,000 times more current. (Let’s set aside the question “faster than what?”; most batteries I’ve seen take between two and eight hours to charge.) It’s reasonable to assume that a new battery technology might be able to store electrical charge more efficiently, but the charging process is already surprisingly efficient: on the order of 50% to 80%, and possibly much higher for a lithium battery. So improved charging efficiency isn’t going to help much — if charging a battery is already 50% efficient, making it 100% efficient only improves things by a factor of two. How big are the wires for an automobile battery charger? Can you imagine wires big enough to handle thousands of times as much current? I don’t think Apple is going to make any thin, sexy laptops if the charging cable is made from 0000 gauge wire (roughly 1/2 inch thick, with a capacity of 195 amps at 60 degrees C). And I certainly don’t think, as the article claims, that I’ll be able to jump-start my car with the battery in my cell phone — I don’t have any idea how I’d connect a wire with the current-handling capacity of a jumper cable to any cell phone I’d be willing to carry, nor do I want a phone that turns into an incendiary firebrick when it’s charged, even if I only need to charge it once a year.
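The scaling argument is worth making concrete. Here is a back-of-the-envelope sketch; the baseline charger current and the wire rating are illustrative figures of my own, not numbers from the article:

```python
# Back-of-the-envelope check: 30x the stored energy delivered in
# 1/1000th the time means 30,000x the current at a fixed voltage.

energy_factor = 30        # "30 times more power" (energy, strictly speaking)
speed_factor = 1000       # "charges 1,000 times faster"
current_factor = energy_factor * speed_factor

baseline_amps = 3.0       # a typical laptop charger, give or take
required_amps = baseline_amps * current_factor

gauge_0000_amps = 195     # 0000 (4/0) gauge rating at 60 degrees C

print(current_factor)                   # 30000
print(required_amps)                    # 90000.0
print(required_amps / gauge_0000_amps)  # roughly 460 such cables in parallel
```

However you pick the baseline, the multiplier is what matters: the current scales with stored energy divided by charging time, and no plausible cable absorbs a factor of 30,000.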
The magic starts when household devices can communicate over a network.
Well over a decade ago, Bill Joy was mocked for talking about a future that included network-enabled refrigerators. That was both unfair and unproductive, and since then, I’ve been interested in a related game: take the most unlikely household product you can and figure out what you could do if it were network-enabled. That might have been a futuristic exercise in 1998, but the future is here. Now. And there are few reasons we couldn’t have had that future back then, if we’d had the vision.
So, what are some of the devices that could be Internet-enabled, and what would that mean? We’re already familiar with the Nest; who would have thought even five years ago that we’d have Internet-enabled thermostats?
Facebook scraping could lead to machine-generated spam so good that it's indistinguishable from legitimate messages.
A recent blog post inquired about the incidence of Facebook-based spear phishing: the author suddenly started receiving email that appeared to be from friends (though it wasn’t sent from their usual email addresses), making the usual kinds of offers and asking him to click on the usual links. He wondered whether this was a widespread phenomenon and how it happened — how does a phisherman get access to your Facebook friends?
The answers are “yes, it happens” and “I don’t know, but it’s going to get worse.” Seriously, my wife’s name has been used in Facebook phishing. A while ago, several of her Facebook friends said that her email account had been hacked. I was suspicious; she only uses Gmail, and hacking Google isn’t easy, particularly with two-factor authentication. So, I asked her friends to send me the offending messages. It was obvious that they hadn’t come from my wife’s account; they came from Yahoo accounts with her name but an unrecognizable email address — exactly what this blogger had seen.
How does this happen? How can a phisher discover your name and your Facebook friends? I don’t know, but Facebook is such a morass of weird and conflicting security settings that it’s impossible to know just how private or how public you are. If you’ve ever friended people you don’t know (a practice that remains entirely too common), and if you’ve ever enabled visibility to friends of friends, you have no idea who has access to your conversations.
Tom Stuart's new book will shed light on what you're really doing when you're programming.
Understanding Computation started from Tom’s talk Programming with Nothing, which he presented at Ruby Manor in 2011. That talk was a tour-de-force: it showed how to implement a more-or-less complete programming system without using any libraries, methods, classes, objects, or even control structures, assignments, arrays, strings, or numbers. It was, literally, programming with nothing. And it was an eye-opener.
Shortly after I saw the conference video, I talked to Tom to ask if we could do more like this. And amazingly, the answer was “yes.” He was very interested in teaching the theory of computing through Ruby, using similar techniques. What does a program mean? What does it mean for something to be a program? How do we build languages that can handle ever more flexible abstractions? What kinds of problems can’t we solve computationally? It’s all here, and it’s all clearly demonstrated via Ruby code. It’s not code that you’d ever use in a real application (trust me, doing arithmetic without numbers, assignments, or control statements is ridiculously slow). But it is code that will expand your mind and leave you with a much better understanding of what you’re doing when you’re programming.
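To give a flavor of what “programming with nothing” looks like, here is a sketch of Church numerals, which build arithmetic out of nothing but functions. (The talk and the book do this in Ruby; I’ve written it in Python purely for illustration.)

```python
# Church numerals: a number n is "apply a function n times".
# No integers, no arithmetic operators in the encoding, just lambdas.

ZERO = lambda f: lambda x: x                     # apply f zero times
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))  # one more application
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

ONE = SUCC(ZERO)
TWO = SUCC(ONE)

# Cheat only at the very end, to inspect a numeral as an ordinary int.
def to_int(n):
    return n(lambda i: i + 1)(0)

print(to_int(ADD(TWO)(TWO)))  # 4
```

Everything interesting happens before `to_int`, which exists only so we can look at the result; the numerals themselves never touch a real number.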
Skepticism isn't a blanket rejection of data; it's central to understanding data.
I’d like to correct the impression, given by Derrick Harris on GigaOm, that I’m part of a backlash against “big data.”
I’m not skeptical about data or the power of data, but you don’t have to look very far or very hard to see data abused. The best people to be skeptical about the data, and to point out the abuse of data, are data scientists because they understand problems such as overfitting, bias, and much more.
Cathy O’Neil recently wrote about a Congressional hearing in which a teacher at a new data science program dodged some perceptive questions about whether he was teaching students to be skeptical about results, whether he was teaching students how to test whether their observations were real signals or just noise. Anyone who has worked with data knows that false correlations come cheaply, particularly when you’re working with a lot of data. But ducking that question is not the attitude we need.
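That false correlations come cheaply is easy to show with a small simulation; all of the data below is random noise, which is exactly the point:

```python
import random

# 1,000 columns of pure noise and one noise "target": nothing here is
# related to anything, yet the best column still correlates "strongly".

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
target = [random.gauss(0, 1) for _ in range(50)]
columns = [[random.gauss(0, 1) for _ in range(50)] for _ in range(1000)]

best = max(abs(pearson(col, target)) for col in columns)
print(round(best, 2))  # a "strong" signal, manufactured from noise alone
```

Search enough variables and something will always look like a signal; that is the question the teacher should have been able to answer.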
Data is valuable. I see no end to the collection or analysis of data, nor should there be an end. But data is like any tool: we have to be careful about how we use it. Skepticism isn’t a blanket rejection of data; it’s central to understanding data. That’s precisely what makes “science” science.
And of all people, journalists should understand what skepticism means, even if they don’t have the technical tools to practice it.
The boundaries created by traditional management are just getting in the way of reducing product cycle times.
If I’ve seen any theme come up repeatedly over the past year, it’s getting product cycle times down. It’s not the sexiest or most interesting theme, but it’s everywhere: if it’s not on the front burner, it’s always simmering in the background.
Cutting product cycles to the bare minimum is one of the main themes of the Velocity Conference and the DevOps movement, where integration between developers and operations, along with practices like continuous deployment, allows web-native companies like Yahoo! to release upgrades to their web products many times a day. It’s no secret that many traditional enterprises are looking at this model, trying to determine what they can use or implement. Indeed, this is central to their long-term survival; companies as different from Facebook as GE and Ford are learning that they will need to become as agile and nimble as their web-native counterparts.
Integrating development and operations isn’t the only way to shorten product cycles. In his talk at Google I/O, Braden Kowitz talked about shortening the design cycle: rather than building big, complete products that take a lot of time and money, start with something very simple and test it, then iterate quickly. This approach lets you generate and test lots of ideas, and quickly throw away the ones that aren’t working. Rather than designing an Edsel, only to fail when the product is released, the shorter cycles that come from integrating product design with product development let you build iteratively, getting immediate feedback on what works and what doesn’t. To work like this, you need to break down the silos that separate engineers and designers; you need to integrate designers into the product team as early as possible, rather than at the last minute.
If you have a good memory, you know that I’ve written about 3D printers. Technically, I grew up with the laser printer; my first computer industry job (part-time while getting an English PhD) was with Imagen, a startup that built the first laser printer that cost under $20,000, then the first that cost under $10,000, then under $7,000, and died a slow death after Apple produced the first that cost under $5,000. Now a laser printer costs a few hundred. And I’ve been cheering as 3D printers followed the same price curve.
But even as I’ve been cheering, I’ve had this nagging doubt in the back of my head. So I can 3D-print my own chess set. Cool. So what? Sure, you can do great things with 3D printers (enclosures for projects; every DIY-bio lab I’ve visited has one stashed somewhere). But while they’re an important step in bringing 21st-century tooling to the home hacker, they’re still fairly limited.
Last night, the other shoe dropped. Otherfab, a project of Saul Griffith’s Otherlab, has a new Kickstarter project for Othermill: a home computer-controlled milling machine. A milling machine is a large, versatile beast that uses a high-speed cutting bit to sculpt material (often metal) into the desired shape. Instead of adding layers of plastic or some other material, like a 3D printer, a milling machine cuts material away. If you’ve ever visited a machine shop, you know that milling machines are where the magic happens, particularly state-of-the-art computer-controlled mills. They’re big, they’re expensive, and they can do just about anything. Putting one in the home shop — that’s revolutionary.
I was thrilled to receive an invitation to a new meetup: the NYC Data Skeptics Meetup. If you’re in the New York area, and you’re interested in seeing data used honestly, stop by!
That announcement pushed me to write another post about data skepticism. The past few days, I’ve seen a resurgence of the slogan that correlation is as good as causation, if you have enough data. And I’m worried. (And I’m not vain enough to think it’s a response to my first post about skepticism; it’s more likely an effect of Cukier’s book.) There’s a fundamental difference between correlation and causation. Correlation is a two-headed arrow: you can’t tell in which direction it flows. Causation is a single-headed arrow: A causes B, not vice versa, at least in a universe that’s subject to entropy.
Let’s do some thought experiments — unfortunately, totally devoid of data. But I don’t think we need data to get to the core of the problem. Think of the classic false correlation (also used, when teaching logic, as an example of a false syllogism): there’s a strong correlation between people who eat pickles and people who die. Well, yeah. We laugh. But let’s take this a step further: correlation is a double-headed arrow. So not only does this poor logic imply that we can reduce the death rate by preventing people from eating pickles, it also implies that we can harm the chemical companies that produce vinegar by preventing people from dying.
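The double-headed arrow isn’t a metaphor: the correlation coefficient is symmetric in its arguments, so the numbers alone cannot say whether pickles cause deaths or deaths cause pickles. A toy sketch, with numbers invented for illustration:

```python
# Correlation is symmetric: the data alone can't tell you which
# way the arrow points. All numbers below are made up.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

pickles_eaten = [1, 2, 3, 4, 5]  # per capita, hypothetical
deaths = [2, 4, 5, 4, 6]         # hypothetical

# Identical in both directions; nothing says which causes which.
print(pearson(pickles_eaten, deaths) == pearson(deaths, pickles_eaten))  # True
```

Causal direction has to come from somewhere outside the correlation itself: an experiment, an intervention, or a mechanism.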