AI’s dueling definitions

Why my understanding of AI is different from yours.


SoftBank’s Pepper, a humanoid robot that takes its surroundings into consideration.

Editor’s note: this post is part of our Intelligence Matters investigation.

Let me start with a secret: I feel self-conscious when I use the terms “AI” and “artificial intelligence.” Sometimes, I’m downright embarrassed by them.

Before I get into why, though, answer this question: what pops into your head when you hear the phrase artificial intelligence?

For the layperson, AI might still conjure HAL’s unblinking red eye, and all the misfortune that ensued when he became so tragically confused. Others jump to the replicants of Blade Runner or more recent movie robots. Those who have been around the field for some time, though, might instead remember the “old days” of AI — whether with nostalgia or a shudder — when intelligence was thought to primarily involve logical reasoning, and truly intelligent machines seemed just a summer’s work away. And for those steeped in today’s big-data-obsessed tech industry, “AI” can seem like nothing more than a high-falutin’ synonym for the machine-learning and predictive-analytics algorithms that are already hard at work optimizing and personalizing the ads we see and the offers we get — it’s the term that gets trotted out when we want to put a high sheen on things. Read more…

Comments: 6
Four short links: 16 June 2014

Four short links: 16 June 2014

Decision Trees, Decision Modifications, Mobile Patents, Web Client

  1. Quick DT — open source (Java) decision tree learner.
  2. Revealing Hidden Changes to Supreme Court OpinionsWHEREAS, It is now well-documented that the Supreme Court of the United States makes changes to its opinions after the opinion is published; and WHEREAS, Only “Four legal publishers are granted access to “change pages” that show all revisions. Those documents are not made public, and the court refused to provide copies to The New York Times”; and WHEREAS, git makes it easy to identify when changes have been made; RESOLVED, I shall apply a cron job to at least identify when the actual PDF has changed so everyone can see which documents have changed.
  3. Microsoft’s “Killer” Android Patents Revealed (Ars Technica) — Chinese Government required them disclosed as part of MSFT-Nokia merger. The patent lists are strategically significant, because Microsoft has managed to build a huge patent-licensing business by taxing Android phones without revealing what kind of legal leverage they really have over those phones.
  4. HTTPiea command line HTTP client, a user-friendly HTTP client.

Streamlining feature engineering

Researchers and startups are building tools that enable feature discovery.

Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables are known as features in machine-learning parlance. For many0 data applications, feature engineering and feature selection are just as (if not more important) than choice of algorithm:

Good features allow a simple model to beat a complex model.
(to paraphrase Alon Halevy, Peter Norvig, and Fernando Pereira)

The terminology can be a bit confusing, but to put things in context one can simplify the data science pipeline to highlight the importance of features:

Feature engineering and discovery pipeline

Feature Engineering or the Creation of New Features
A simple example to keep in mind is text mining. One starts with raw text (documents) and extracted features could be individual words or phrases. In this setting, a feature could indicate the frequency of a specific word or phrase. Features1 are then used to classify and cluster documents, or extract topics associated with the raw text. The process usually involves the creation2 of new features (feature engineering) and identifying the most essential ones (feature selection).

Read more…


From the network interface to the database

All systems are distributed systems, and we’re starting to see how they fit into Velocity's themes.


From the beginning, the Velocity Conference has focused on web performance and operations — specifically, web operations. This focus has been fairly narrow: browser performance dominated the discussion of “web performance,” and interactions between developers and IT staff dominated operations.

These limits weren’t bad. Perceived performance really is dominated by the browser — how fast you can get resources (HTML, images, CSS files, JavaScript libraries) over the network to the browser, and how fast the browser can execute those resources. How long before a user stops waiting for your page to load and clicks away? How do you make a page useable as quickly as possible, even before all the resources have loaded? Those discussions were groundbreaking and surprising: users are incredibly sensitive to page speed.

That’s not to say that Velocity hasn’t looked at the rest of the application stack; there’s been an occasional glance in the direction of the database and an even more occasional glance at the middleware. But the database and middleware have, at least historically, played a bit part. And while the focus of Velocity has been front-end tuning, speakers like Baron Schwartz haven’t let us ignore the database entirely. Read more…

Comment: 1

Your money or your life

Microsoft, Google and pushing business models too far.

Photo by Didier, used under a Creative Commons license.I know it’s hard to run a large company. I know that organizations can get too deep into their own visions to imagine conflicting values.

I realized yesterday, though, that:

  • Microsoft ruined their brand for me by holding too tightly to things that they considered theirs. (Software.)
  • Google is ruining their brand for me by holding too tightly to things that I consider mine. (Identity, everything they can possibly learn about me.)

It’s a weird difference, but the Google version makes me much sadder about the world. As I’d tell a mugger, “You can have my wallet, just don’t take me.”

Photo by Didier, used under a Creative Commons license.


Untapped opportunities in AI

Some of AI's viable approaches lie outside the organizational boundaries of Google and other large Internet companies.

Editor’s note: this post is part of an ongoing series exploring developments in artificial intelligence.

Here’s a simple recipe for solving crazy-hard problems with machine intelligence. First, collect huge amounts of training data — probably more than anyone thought sensible or even possible a decade ago. Second, massage and preprocess that data so the key relationships it contains are easily accessible (the jargon here is “feature engineering”). Finally, feed the result into ludicrously high-performance, parallelized implementations of pretty standard machine-learning methods like logistic regression, deep neural networks, and k-means clustering (don’t worry if those names don’t mean anything to you — the point is that they’re widely available in high-quality open source packages).

Google pioneered this formula, applying it to ad placement, machine translation, spam filtering, YouTube recommendations, and even the self-driving car — creating billions of dollars of value in the process. The surprising thing is that Google isn’t made of magic. Instead, mirroring Bruce Scheneier’s surprised conclusion about the NSA in the wake of the Snowden revelations, “its tools are no different from what we have in our world; it’s just better funded.” Read more…