Strata Week: Replaced by robots

Robo-journalism, digital fingerprinting, decentralized soldier networks, and drag-and-drop data retrieval.

Forget very small shell scripts: these days, there are robots looking to replace us.

Stories from statistics

The only thing better than seeing all the stats on your favorite sports team is reading a good article about their most recent game. An article not only tells you a story, but also gives you a sense of connection to another fan (or at least follower) of your team: the writer. These days, however, that writer could be a robot.

A company called StatSheet has been analyzing data on college and pro sports since 2007, but this month launched a network of almost 350 websites dedicated to individual Division I basketball teams that will feature “automated content.”

In the company’s words: “The StatSheet Network provides maximum coverage of every team, regardless of the size of the team’s fan base or surrounding population. And because our technology platform generates content automatically in real-time, you will be able to get that coverage without the delays and inefficiencies imposed by traditional media companies.”

According to the New York Times, the “story-writing software does not perform linguistic analysis; it just uses template sentences and a database of phrases that numbers about 5,000 for now.” Still, even these basics can lead to results that sound somewhat authentic, despite simple sentence structures.
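
To make the template-and-phrase idea concrete, here is a minimal sketch in Python. It is not StatSheet's software: the phrase lists, the template sentence, the 15-point "blowout" cutoff, and the function name `write_recap` are all assumptions made up for illustration.

```python
import random

# Hypothetical phrase database, keyed by how lopsided the game was.
PHRASES = {
    "blowout": ["cruised past", "rolled over", "overwhelmed"],
    "close": ["edged", "held off", "slipped past"],
}

# Hypothetical template sentence with slots for the box-score numbers.
TEMPLATE = "{winner} {verb} {loser}, {high}-{low}."


def write_recap(game, rng=None):
    """Turn a dict of two team names and their points into a one-sentence recap."""
    rng = rng or random.Random()
    # Sort the two (team, points) pairs so the winner comes first.
    (winner, high), (loser, low) = sorted(
        game.items(), key=lambda item: item[1], reverse=True
    )
    # Assumed cutoff: a margin of 15 or more counts as a blowout.
    bucket = "blowout" if high - low >= 15 else "close"
    verb = rng.choice(PHRASES[bucket])
    return TEMPLATE.format(winner=winner, verb=verb, loser=loser, high=high, low=low)


# Made-up score, for illustration only.
print(write_recap({"Home U": 78, "Visitor State": 59}))
```

No linguistic analysis happens anywhere in this sketch. The numbers pick a phrase bucket, a phrase is drawn at random, and the template does the rest, which is roughly the approach the Times describes.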

One of the StatSheet Network’s Big Ten school sites.

One can easily imagine applications for such robo-writing in financial reporting or advertising, or in other venues that draw heavily on quantifiable data.

Not your mother’s targeted advertising

Speaking of advertising, targeting may get a wee bit more personal — as in, uniquely personal. That’s the idea behind BlueCava, Inc., a company working to digitally “fingerprint” each of the world’s devices: not just computers, but cell phones, gaming consoles, and potentially even cars. They’re currently at about 200 million devices registered and counting; they expect to reach 1 billion by the end of next year, according to the Wall Street Journal.

We’re not just talking cookies here. BlueCava looks at all the different information each device provides, such as software and fonts installed, timestamps, user agents, screen size, and browser plugins. It then assigns a device ID token, and can track the online behavior of that device.
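
A rough sketch of how a set of device attributes could be boiled down to an ID token follows. This is not BlueCava's method, which the reporting does not detail. The attribute names, the sample values, and the choice of a truncated SHA-256 hash are assumptions. A production system would also need to tolerate attributes that change over time, which this exact-match hash does not.

```python
import hashlib
import json


def normalize(value):
    """Sort list-like attributes so their order does not change the token."""
    if isinstance(value, (list, tuple, set)):
        return sorted(str(item) for item in value)
    return value


def device_token(attributes):
    """Hash a dict of device attributes into a short, stable ID token."""
    canonical = json.dumps(
        {name: normalize(value) for name, value in attributes.items()},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes of the kind listed above.
device = {
    "user_agent": "ExampleBrowser/1.0",
    "screen_size": "1280x800",
    "fonts": ["Arial", "Courier New"],
    "plugins": ["ExamplePlugin"],
}
print(device_token(device))
```

The same attributes always produce the same token, so no cookie has to be stored on the device for it to be recognized again.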

BlueCava deploys that information for two purposes: to combat fraud (e.g., the same device being used to log into many accounts in order to use many credit cards), and to help marketers discover consumer behavior. That kind of uniquely targeted device data is a goldmine to advertisers — especially when combined with a bit of extra information such as user demographics or estimated income.
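
The fraud case can be sketched in a few lines: group logins by device ID and flag any device tied to an unusual number of accounts. The function name, the threshold of three accounts, and the sample log are hypothetical, not details from BlueCava.

```python
from collections import defaultdict


def flag_shared_devices(logins, max_accounts=3):
    """Return devices seen logging into more than max_accounts distinct accounts."""
    accounts_by_device = defaultdict(set)
    for device_id, account in logins:
        accounts_by_device[device_id].add(account)
    return {
        device_id: sorted(accounts)
        for device_id, accounts in accounts_by_device.items()
        if len(accounts) > max_accounts
    }


# Made-up login log of (device ID, account) pairs.
logins = [
    ("device-1", "alice"),
    ("device-2", "bob"),
    ("device-2", "carol"),
    ("device-2", "dave"),
    ("device-2", "erin"),
]
print(flag_shared_devices(logins))
```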

Income, you ask? Well, yes. “BlueCava says the information it collects about devices can’t be traced back to individuals and that it will offer people a way to opt out of being tracked,” the WSJ reported. But imagine that a device’s user logs into a website or downloads a phone app with BlueCava’s technology embedded. Whatever name or email address that person uses to log in can be matched against offline databases such as property deeds, vehicle registrations, and other public records (including income estimates). The potential marketing power of online and offline data aggregated into a single, evolving user profile is enormous.

As are, clearly, the privacy concerns. The FTC this week released a preliminary staff report proposing “a framework to balance the privacy interests of consumers with innovation that relies on consumer information to develop beneficial new products and services.” The full report is also expected to contain recommendations for a “Do Not Track” mechanism.

Public comments on the FTC’s report will be accepted until Jan. 31, 2011.


You can call it Al

You know we can’t mention the federal government in a post about robots without arriving at the military. A British defense contractor called BAE Systems has joined forces with academia (Imperial College London, University of Southampton, University of Bristol, and Oxford University) to develop the Autonomous Learning Agents for Decentralised Data and Information Networks: ALADDIN.

Simply put, ALADDIN is somewhat like the Borg: it’s a collective of robot soldiers that collect and share data before arriving at a joint decision about how to proceed. Whether that yields better decisions in chaotic situations than those stemming from a single leader is the research question at issue in the project.
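
For a feel of how a group can settle on a shared answer with no leader, here is a toy sketch of one generic technique, neighbor averaging. It is not ALADDIN's algorithm, which the project descriptions here do not spell out. The agent names, the network layout, and the numeric estimates are assumptions.

```python
def gossip_round(estimates, neighbors):
    """Each agent replaces its estimate with the average of its own and its neighbors'."""
    updated = {}
    for agent, value in estimates.items():
        group = [value] + [estimates[other] for other in neighbors[agent]]
        updated[agent] = sum(group) / len(group)
    return updated


def reach_consensus(estimates, neighbors, tolerance=1e-6, max_rounds=1000):
    """Repeat gossip rounds until all estimates agree to within tolerance."""
    for _ in range(max_rounds):
        if max(estimates.values()) - min(estimates.values()) <= tolerance:
            break
        estimates = gossip_round(estimates, neighbors)
    return estimates


# Hypothetical three-agent chain: a talks to b, b talks to both, c talks to b.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
# Made-up local estimates, e.g. each agent's guess at how severe a hazard is.
estimates = {"a": 0.2, "b": 0.5, "c": 0.9}
print(reach_consensus(estimates, neighbors))
```

No agent ever sees the whole picture or issues orders. Each one only talks to its neighbors, yet after enough rounds they all hold nearly the same value.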

While the test scenarios are currently focused on disaster relief, The Economist points out the clear implications for other chaotic situations, such as warfare. ALADDIN’s strengths seem to include multi-agent coordination, situational awareness, and resource allocation. And it exercises them without human emotion.

Whether that’s a good thing, and whether the algorithms can be fine-tuned well enough to deploy in pursuit of life-taking rather than life-saving, remains to be seen.

A stitch in data saves … ?

If automated information retrieval floats your boat, be sure to check out Needlebase, a tool for harvesting, merging, and exploring data that has just been made available to the public.

This brief video tutorial offers a great introduction:

The genius of Needle as a tool is that it provides a platform from which to browse the web, allowing you to “train” it as you go. Build a template for your dataset, show Needle which fields on a web page contain the information you want, and then let it guess a few additional example pages. After you confirm that it’s got the hang of things, watch it import all the data you want from a collection of pages. You can also import from local CSV files or other types of data stores.
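
The "show it which fields you want" step can be approximated by hand. The sketch below is not how Needle works internally. It assumes, purely for illustration, that each wanted field sits in an element with a known CSS class, and the template, class names, and sample page are made up.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose CSS class appears in the template."""

    def __init__(self, template):
        super().__init__()
        self.template = template  # maps a CSS class to a dataset field name
        self.record = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for css_class in classes:
            if css_class in self.template:
                self._current = self.template[css_class]

    def handle_data(self, data):
        # Store the first non-empty text found inside a matching element.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def extract(page_html, template):
    parser = FieldExtractor(template)
    parser.feed(page_html)
    return parser.record


# Hypothetical page and template.
page = '<div><h1 class="title">Widget</h1><span class="price">$5</span></div>'
print(extract(page, {"title": "name", "price": "cost"}))
```

Run the same template over every page in a collection and you get one record per page, ready for import.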

Then, when it’s time to de-dup or merge fields, a simple drag-and-drop interface makes things a snap. Needle can also help visualize or publish your data.
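
Done by hand, the de-dup and merge step looks something like this sketch. The matching rule (same key once case and extra spaces are ignored), the function name, and the sample records are assumptions, not a description of Needle's logic.

```python
def merge_duplicates(records, key_field):
    """Merge records whose key field matches after case and whitespace cleanup."""
    merged = {}
    for record in records:
        key = " ".join(str(record.get(key_field, "")).lower().split())
        target = merged.setdefault(key, {})
        for field, value in record.items():
            # Keep the first non-empty value seen for each field.
            if value not in (None, "") and field not in target:
                target[field] = value
    return list(merged.values())


# Made-up records describing the same company twice.
records = [
    {"name": "Acme Corp", "city": "Boston"},
    {"name": " acme  corp ", "phone": "555-0100"},
]
print(merge_duplicates(records, "name"))
```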

You can read about a recent use-case at ReadWriteWeb. Marshall Kirkpatrick has a longer write-up here.

Send us news

Email us news, tips and interesting tidbits at strataweek@oreilly.com.
