- Quick DT — open source (Java) decision tree learner.
- Revealing Hidden Changes to Supreme Court Opinions — WHEREAS, It is now well-documented that the Supreme Court of the United States makes changes to its opinions after the opinion is published; and WHEREAS, Only “Four legal publishers are granted access to “change pages” that show all revisions. Those documents are not made public, and the court refused to provide copies to The New York Times”; and WHEREAS, git makes it easy to identify when changes have been made; RESOLVED, I shall apply a cron job to at least identify when the actual PDF has changed so everyone can see which documents have changed.
- Microsoft’s “Killer” Android Patents Revealed (Ars Technica) — Chinese Government required them disclosed as part of MSFT-Nokia merger. The patent lists are strategically significant, because Microsoft has managed to build a huge patent-licensing business by taxing Android phones without revealing what kind of legal leverage they really have over those phones.
- HTTPie — a command line HTTP client, a user-friendly HTTP client.
Researchers and startups are building tools that enable feature discovery.
Why do data scientists spend so much time on data wrangling and data preparation? In many cases it’s because they want access to the best variables with which to build their models. These variables are known as features in machine-learning parlance. For many0 data applications, feature engineering and feature selection are just as (if not more important) than choice of algorithm:
Good features allow a simple model to beat a complex model.
(to paraphrase Alon Halevy, Peter Norvig, and Fernando Pereira)
The terminology can be a bit confusing, but to put things in context one can simplify the data science pipeline to highlight the importance of features:
Feature Engineering or the Creation of New Features
A simple example to keep in mind is text mining. One starts with raw text (documents) and extracted features could be individual words or phrases. In this setting, a feature could indicate the frequency of a specific word or phrase. Features1 are then used to classify and cluster documents, or extract topics associated with the raw text. The process usually involves the creation2 of new features (feature engineering) and identifying the most essential ones (feature selection).
All systems are distributed systems, and we’re starting to see how they fit into Velocity's themes.
From the beginning, the Velocity Conference has focused on web performance and operations — specifically, web operations. This focus has been fairly narrow: browser performance dominated the discussion of “web performance,” and interactions between developers and IT staff dominated operations.
That’s not to say that Velocity hasn’t looked at the rest of the application stack; there’s been an occasional glance in the direction of the database and an even more occasional glance at the middleware. But the database and middleware have, at least historically, played a bit part. And while the focus of Velocity has been front-end tuning, speakers like Baron Schwartz haven’t let us ignore the database entirely. Read more…
Microsoft, Google and pushing business models too far.
I realized yesterday, though, that:
- Microsoft ruined their brand for me by holding too tightly to things that they considered theirs. (Software.)
- Google is ruining their brand for me by holding too tightly to things that I consider mine. (Identity, everything they can possibly learn about me.)
It’s a weird difference, but the Google version makes me much sadder about the world. As I’d tell a mugger, “You can have my wallet, just don’t take me.”
Some of AI's viable approaches lie outside the organizational boundaries of Google and other large Internet companies.
Editor’s note: this post is part of an ongoing series exploring developments in artificial intelligence.
Here’s a simple recipe for solving crazy-hard problems with machine intelligence. First, collect huge amounts of training data — probably more than anyone thought sensible or even possible a decade ago. Second, massage and preprocess that data so the key relationships it contains are easily accessible (the jargon here is “feature engineering”). Finally, feed the result into ludicrously high-performance, parallelized implementations of pretty standard machine-learning methods like logistic regression, deep neural networks, and k-means clustering (don’t worry if those names don’t mean anything to you — the point is that they’re widely available in high-quality open source packages).
Google pioneered this formula, applying it to ad placement, machine translation, spam filtering, YouTube recommendations, and even the self-driving car — creating billions of dollars of value in the process. The surprising thing is that Google isn’t made of magic. Instead, mirroring Bruce Scheneier’s surprised conclusion about the NSA in the wake of the Snowden revelations, “its tools are no different from what we have in our world; it’s just better funded.” Read more…
The Tango smartphone will help SPHERES navigate space station modules.
I work in the Intelligent Robotics Group (IRG) at NASA Ames Research Center, and when we got the chance to collaborate with our next-door neighbor Google on their new Project Tango, we knew exactly what to do: we’re sending the Project Tango smartphone to the International Space Station, where it will set our robots free.
Smart SPHERES with a space-ready Project Tango phone. Photo courtesy of NASA.