<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>O&#039;Reilly Radar &#187; Alistair Croll</title>
	<atom:link href="http://radar.oreilly.com/alistairc/feed" rel="self" type="application/rss+xml" />
	<link>http://radar.oreilly.com</link>
	<description>Insight, analysis, and research about emerging technologies</description>
	<lastBuildDate>Fri, 17 May 2013 16:29:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Stacks get hacked: The inevitable rise of data warfare</title>
		<link>http://radar.oreilly.com/2013/01/stacks-get-hacked-the-inevitable-rise-of-data-warfare.html</link>
		<comments>http://radar.oreilly.com/2013/01/stacks-get-hacked-the-inevitable-rise-of-data-warfare.html#comments</comments>
		<pubDate>Sat, 19 Jan 2013 03:39:21 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[data warfare]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[stack]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://radar.oreilly.com/?p=55330</guid>
		<description><![CDATA[First, technology is good. Then it gets bad. Then it gets stable. This has been going on for a long time, likely since the invention of fire, knives, or the printed word. But I want to focus specifically on computing &#8230; ]]></description>
				<content:encoded><![CDATA[<p>First, technology is good. Then it gets bad. Then it gets stable.</p>
<p>This has been going on for a long time, likely since the invention of fire, knives, or the printed word. But I want to focus specifically on computing technology. The human race is busy colonizing a second, online world and sticking prosthetic brains &mdash; today, we call them smartphones &mdash; in front of our eyes and ears. And the stacks of technology on which they rely are vulnerable.</p>
<p>When we first created automatic phone switches, hackers quickly learned how to blow a Cap&#8217;n Crunch whistle to get free calls from pay phones. When consumers got modems, attackers soon figured out how to rapidly redial to get more than their fair share of time on a BBS, or to program scripts that could brute-force their way into others&#8217; accounts. Eventually, we got better passwords and we fixed the pay phones and switches.</p>
<p>We moved up the networking stack, above the physical and link layers. We tasted TCP/IP, and found it good. Millions of us installed Trumpet Winsock on consumer machines. We were idealists rushing onto the wild open web and proclaiming it a new utopia. Then, because of the way the TCP handshake worked, hackers figured out how to knock people offline with denial-of-service attacks such as SYN floods. Escalation, and router hardening, ensued.</p>
<p>We built HTTP, and SQL, and more. At first, they were open, innocent, and helped us make huge advances in programming. Then attackers found ways to exploit their weaknesses with cross-site scripting and buffer overruns. They hacked armies of machines to do their bidding, flooding target networks and taking sites offline. Technologies like MP3s gave us an explosion in music, new business models, and abundant crowd-sourced audiobooks &mdash; even as they leveled a music industry with fresh forms of piracy for which we hadn&#8217;t even invented laws. <span id="more-55330"></span></p>
<p>Here&#8217;s a more specific example of unintended consequences. <a href="http://en.wikipedia.org/wiki/Paul_Mockapetris">Paul Mockapetris</a> is one of the creators of today&#8217;s Internet. He created DNS and implemented SMTP, fundamental technologies on which all of us rely. But he&#8217;s also single-handedly responsible for all the spam in the world.</p>
<p>That might be a bit of an overstatement, though I tease him about it from time to time. But there&#8217;s a nugget of truth to it: DNS was a simplified version of more robust directories like those in X.25. Paul didn&#8217;t need all that overhead, because he was just trying to solve the problem of remembering all those Internet addresses by hand, not trying to create a hardened, authenticated, resilient address protocol. He also implemented SMTP, the Simple Mail Transfer Protocol. It was a whittled-down version of MTP &mdash; hence the &#8220;S&#8221; &mdash; and it didn&#8217;t include end-to-end authentication.</p>
<p>These two things &mdash; SMTP and DNS &mdash; make spam possible. If either of them had some kind of end-to-end authentication, it would be far harder for spammers to send unwanted messages and get away with it. Today, they&#8217;re so entrenched that attempts to revise email protocols in order to add authentication have consistently failed. We&#8217;re willing to live with the glut of spam that clogs our servers because of the tremendous value of email.</p>
<p>We owe much of the Internet&#8217;s growth to simplicity and openness. Because of how Paul built DNS and SMTP, there&#8217;s no need to go through a complex bureaucracy to start something, or to jump through technical hoops to send an email to someone you met at a bar. We can invite a friend to a new application without strictures and approvals. The Internet has flourished precisely <em>because</em> it was built on a foundation of loose consensus and working code. It&#8217;s also flourished in spite of it.</p>
<p>These protocols, from the lowly physical connections and links of Ethernet and PPP all the way up through TCP sessions and HTTP transactions, are arranged in a stack: independent layers of a delicious networking cake. By dividing the way the Internet works into discrete layers, innovation can happen at one layer (copper to fiber; Token Ring to Ethernet; UDP to TCP; Flash to DHTML; and so on) independent of the other layers. We didn&#8217;t need to rewrite the Internet to build YouTube.</p>
<p>Paul, and the other framers of the web, didn&#8217;t know we&#8217;d use it to navigate, or stream music &mdash; but they left it open so we could. But where the implications of BBS hacking or phone phreaking were limited to a small number of homebrew hackers, the implications for the web were far bigger, because by now, everyone relied on it.</p>
<p>Anyway, on to big data.</p>
<p>Geeks often talk about &#8220;layer 8.&#8221; When an IT operator sighs resignedly that it&#8217;s a layer 8 problem, she means it&#8217;s a human&#8217;s fault. It&#8217;s where humanity&#8217;s rubber meets technology&#8217;s road. And big data is interesting precisely <em>because</em> it&#8217;s the layer 8 protocol. It&#8217;s got great power, demands great responsibility, and portends great risk unless we do it right. And just like the layers beneath it, it&#8217;s going to get good, then bad, then stable.</p>
<p>Other layers of the protocol stack have come under assault by spammers, hackers, and activists. There&#8217;s no reason to think layer 8 won&#8217;t as well. And just as hackers find a clever exploit to intercept and spike an SSL session, or trick an app server into running arbitrary code, so they&#8217;ll find an exploit for big data.</p>
<p>The term &#8220;data warfare&#8221; might seem a bit hyperbolic, so I&#8217;ll try to provide a few concrete examples. I&#8217;m hoping for plenty more in the <a href="http://oreillynet.com/pub/e/2557?intcmp=il-strata-webcast-rise-of-data-warfare">Strata Online Conference we&#8217;re running next week</a>, which has a stellar lineup of people who have spent time thinking about how to do naughty things with information at scale.</p>
<h2>Injecting noise</h2>
<p>Analytics applications rely on tags embedded in URLs to understand the nature of the traffic they receive. URLs contain parameters that identify the campaign, the medium, and other information on the source of visitors. For example, the URL <a href="http://www.solveforinteresting.com/?utm_campaign=datawar">http://www.solveforinteresting.com?utm_campaign=datawar</a> tells Google Analytics to assign visits from that link to the campaign &#8220;datawar.&#8221; There&#8217;s seldom any verification of this information &mdash; with many analytics packages it&#8217;s included in plain text. Let&#8217;s say, as a joke, you decide you&#8217;d like your name to be the most prolific traffic source on a friend&#8217;s blog. All you need is a few willing participants, and you can simply visit the blog from many browsers and machines using your name as the campaign tag. You&#8217;ll be the top campaign traffic source.</p>
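<p>To make the weakness concrete, here is a minimal sketch of how a naive tally of campaign tags behaves. It is not how Google Analytics or any real package works internally; the helper names (<code>campaign_of</code>, <code>tally_campaigns</code>), the visit counts, and the campaign name &#8220;yourname&#8221; are all hypothetical. The only point is that the tag is read straight from the URL with nothing to verify who wrote it.</p>

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def campaign_of(url):
    """Return the utm_campaign value found in a URL, or "(none)" if absent."""
    query = parse_qs(urlparse(url).query)
    return query.get("utm_campaign", ["(none)"])[0]


def tally_campaigns(urls):
    """Count visits per campaign tag, trusting whatever the URL claims."""
    return Counter(campaign_of(url) for url in urls)


# Hypothetical visit log: three genuine campaign visits, then five visits
# from pranksters who simply typed their own campaign tag into the URL.
genuine = ["http://www.solveforinteresting.com/?utm_campaign=datawar"] * 3
injected = ["http://www.solveforinteresting.com/?utm_campaign=yourname"] * 5

report = tally_campaigns(genuine + injected)
top_campaign = report.most_common(1)[0][0]
print(report)
print("Top campaign:", top_campaign)
```

<p>Because the tally trusts the tag, the five injected visits outrank the three genuine ones. A tally keyed on outcomes such as purchases would be harder to skew this way.</p>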
<p>This seems innocent enough, until you realize that you can take a similar approach to misleading your competitor. You can make them think a less-effective campaign is outperforming a successful one. You can trick them into thinking Twitter is a better medium than Google+, when in fact the reverse is true, which causes them to pay for customer acquisition in less-effective channels.</p>
<p>The reality isn&#8217;t this simple &mdash; smart businesses track campaigns by outcomes such as purchases rather than by raw visitors. But the point is clear: open-ended data schemes like tagging work because they&#8217;re extensible and simple, but that also makes them vulnerable. The practice of &#8220;<a href="http://mashable.com/2012/04/19/google-bombs/">googlebombing</a>&#8221; is a good example. Linking a word or definition to a particular target (such as sending searches for &#8220;miserable failure&#8221; to a biography on the White House website) simply exploits the openness of Google&#8217;s underlying algorithms.</p>
<p>But even if you think you have a reliable data source, you may be wrong. Consider that a few years ago, only 324 Athenians reported having swimming pools on their tax returns. This seemed low to some civil servants in Greece, so they decided to check. Using Google Maps, <a href="http://www.nytimes.com/2010/05/02/world/europe/02evasion.html?hp=&amp;pagewanted=all">they counted 16,974 of them</a> &mdash; despite <a href="http://www.telegraph.co.uk/news/worldnews/europe/greece/7664764/Revolution-from-Greeces-ruins-as-crisis-deepens.html">efforts by citizens to camouflage their pools</a> under green tarpaulins.</p>
<p>Whether the data is injected, or simply collected unreliably, data&#8217;s first weakness is its source. Collection is seldom authenticated. There&#8217;s a reason prosecutors insist on chain of evidence; but big data and analytics, like DNS and SMTP, are usually built for simplicity and ubiquity rather than for resiliency or auditability.</p>
<h2>Mistraining the algorithms</h2>
<p>Most of us get attacks almost daily, in the form of spam and phishing. Most of these attacks are blocked by heuristics and algorithms.</p>
<p>Spammers are in a constant arms race with these algorithms. Each message that&#8217;s flagged as spam is an input into anti-spam algorithms &mdash; so if a word like &#8220;Viagra&#8221; appears in a message you consider spam, then the algorithm is slightly more likely to consider that word &#8220;spammy&#8221; in future.</p>
<p>If you run a blog, you probably see plenty of comment spam filled with nonsense words &mdash; these are attempts to mistrain the machine-learning algorithms that block spammy content by teaching them innocuous words, undermining their effectiveness. You&#8217;re actually watching a fight between spammers and blockers, played out comment by comment, on millions of websites around the world.</p>
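<p>A toy word-counting filter shows how this kind of mistraining could play out. This is a deliberately simplified sketch, not the algorithm any real anti-spam product uses: the <code>WordScoreFilter</code> class, its scoring rule (the smoothed share of spam among messages containing a word), and the sample messages are all assumptions made for illustration.</p>

```python
from collections import Counter


class WordScoreFilter:
    """Toy filter: a word is 'spammy' if it mostly appears in flagged messages."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def train(self, text, is_spam):
        # Count each distinct word once per message.
        target = self.spam_counts if is_spam else self.ham_counts
        target.update(set(text.lower().split()))

    def word_spamminess(self, word):
        # Add-one smoothing so unseen words score a neutral 0.5.
        spam = self.spam_counts[word] + 1
        ham = self.ham_counts[word] + 1
        return spam / (spam + ham)

    def score(self, text):
        # Average spamminess of the words in the message.
        words = text.lower().split()
        if not words:
            return 0.5
        return sum(self.word_spamminess(w) for w in words) / len(words)


spam_filter = WordScoreFilter()
spam_filter.train("buy viagra now", is_spam=True)
spam_filter.train("lunch meeting tomorrow", is_spam=False)
before = spam_filter.score("meeting tomorrow")

# Hypothetical poisoning: spam padded with innocuous words gets flagged,
# which teaches the filter that those innocuous words are spammy.
for _ in range(5):
    spam_filter.train("viagra meeting tomorrow lunch", is_spam=True)
after = spam_filter.score("meeting tomorrow")

print("Score of a legitimate message before poisoning:", round(before, 2))
print("Score of the same message after poisoning:", round(after, 2))
```

<p>In this sketch the legitimate message looks more like spam after the padded messages are flagged, so the filter either blocks real comments or has to be loosened. Either outcome undermines it.</p>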
<h2>Making other attacks more effective</h2>
<p>Anti-spam heuristics happen behind the scenes, and they work pretty well. Despite this, some spam does get through. But when it does, we seldom click on it, because it&#8217;s easy to spot. It&#8217;s poorly worded; it comes from an unfamiliar source; it doesn&#8217;t render properly in our mail client.</p>
<p><em>What if that weren&#8217;t the case?</em></p>
<p>A motivated attacker can target an individual. If they&#8217;re willing to invest time researching their target, they can gain trust or impersonate a friend. The discovery of several nation-state-level viruses aimed at governments and rich targets shows that a concerted, hand-crafted phishing attack can work. In the hands of an attacker, tools like Facebook&#8217;s Graph Search or Peekyou are a treasure trove of facts that can be used to craft a targeted attack.</p>
<p>The reason spam is still easy to spot is that it&#8217;s traditionally been hard to automate this work. People don&#8217;t dig through your trash unless you&#8217;re under investigation.</p>
<p>Today, however, consumers have access to &#8220;big data&#8221; tools that spy agencies could only dream of a few short years ago, which means attackers do, too, and the effectiveness of phishing, identity theft, and other information crimes will soar once bad actors learn how to harness these tools.</p>
<p>But digging through virtual trash and data exhaust is what machines do best. Big data lets personal attacks work at scale. If smart data scientists with decent grammar tried to maximize spam effectiveness, we&#8217;d lose quickly. To them, phishing is just another optimization problem.</p>
<h2>Trolling to polarize</h2>
<p>Data warfare doesn&#8217;t have to be as obvious as injecting falsehoods, mistraining machine-learning algorithms, or leveraging abundant personal data to trick people. It can be far more subtle. Let&#8217;s say, for example, you wanted to polarize a political discussion such as gun control in order to reduce the reasoned discourse of compromise and justify your hard-line stance. <em>All you need to do is get angry.</em></p>
<p><a href="http://www.motherjones.com/environment/2013/01/you-idiot-course-trolls-comments-make-you-believe-science-less">A recent study</a> showed that the tone of comments on a blog post had a tangible impact on how readers responded to the post. When comments used reasonable language, readers&#8217; views were more moderate. But when those comments were aggressive, readers hardened their positions. Those who agreed with the post did so more strongly, and those who disagreed objected more fiercely. The chance for compromise vanished.</p>
<p>Similar approaches can sway sentiment analysis tools that try to gauge public opinion on social platforms. Once reported, these sentiments often form actual opinion, because humans like to side with the majority. Data becomes reality. There are plenty of other examples of &#8220;adversarial&#8221; data escalation. Consider the <a href="http://www.extremetech.com/extreme/143382-programmer-creates-800000-books-algorithmically-starts-selling-them-on-amazon">programmer who created 800,000 books and injected them into Amazon&#8217;s catalog</a>. Thanks to the frictionless nature of ebooks and the ease of generating them, he&#8217;s saturated the catalog (hat tip to <a href="http://strata.oreilly.com/edd">Edd</a> for this one).</p>
<h2>The year of data warfare</h2>
<p>Data warfare is real. In some cases, such as spam, it&#8217;s been around for decades. In other cases, like tampering with a competitor&#8217;s data, it&#8217;s been possible, but too expensive, until cloud computing and new algorithms made it cheap and easy. And in many new instances, it&#8217;s possible precisely because of our growing dependence on information to lead our daily lives.</p>
<p>Just as the inexorable cycle of good, bad, and stable has happened at every layer, so it will happen with big data. But unlike attacks on lower levels of the stack, this time it won&#8217;t just be spam in an inbox. It&#8217;ll be both our online and offline lives. Attackers can corrupt information, blind an algorithm, inject falsehood, changing outcomes in subtle, insidious ways that undermine a competitor or flip an election. Attacks on data become attacks on people.</p>
<p>If I have to pick a few hot topics for 2013, data warfare is one of them. I&#8217;m looking forward to next week&#8217;s <a href="http://oreillynet.com/pub/e/2557?intcmp=il-strata-webcast-rise-of-data-warfare">online event</a>, because I&#8217;m convinced that this arms race will affect all of us in the coming years, and it&#8217;ll be a long time before the armistice or détente.</p>
<p><em>This post originally appeared on <a href="http://strata.oreilly.com/2013/01/data-warefare.html">Strata</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2013/01/stacks-get-hacked-the-inevitable-rise-of-data-warfare.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Thin walls and traffic cameras</title>
		<link>http://radar.oreilly.com/2012/10/thin-walls-traffic-cameras.html</link>
		<comments>http://radar.oreilly.com/2012/10/thin-walls-traffic-cameras.html#comments</comments>
		<pubDate>Sat, 20 Oct 2012 14:30:52 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[balances]]></category>
		<category><![CDATA[checks]]></category>
		<category><![CDATA[context]]></category>
		<category><![CDATA[forgiveness]]></category>
		<category><![CDATA[gossip]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[thin walls]]></category>
		<category><![CDATA[traffic cameras]]></category>
		<category><![CDATA[transparency]]></category>

		<guid isPermaLink="false">http://radar.oreilly.com/?p=53548</guid>
		<description><![CDATA[A couple of years ago, I spoke with a European Union diplomat who shall remain nameless about the governing body&#8217;s attitude toward privacy. &#8220;Do you know why the French hate traffic cameras?&#8221; he asked me. &#8220;It&#8217;s because it makes it &#8230; ]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/thisisbossi/3361608344/"><img src="http://s.radar.oreilly.com/wp-files/2/2012/10/1012-traffic-camera.jpg" alt="2008 06 11 - 3313b - Silver Spring - 16th St Circle Traffic Camera by thisisbossi, on Flickr" width="340" height="258" style="float: right;margin: 5px 0 10px 15px" /></a>A couple of years ago, I spoke with a European Union diplomat who shall remain nameless about the governing body&#8217;s attitude toward privacy.</p>
<p>&#8220;Do you know why the French hate traffic cameras?&#8221; he asked me. <em>&#8220;It&#8217;s because it makes it hard for them to cheat on their spouses.&#8221;</em></p>
<p>He contended that while it was possible for a couple to overlook subtle signs of infidelity &mdash; a brush of lipstick on a collar, a stray hair, or the smell of a man&#8217;s cologne &mdash; the hard proof of a speeding ticket given on the way to an afternoon tryst couldn&#8217;t be ignored.</p>
<p>Humans live in these grey areas. A 65 mph speed limit is really a suggestion; it&#8217;s up to the officers to enforce that limit. That allows for context: a reckless teen might get pulled over for going 70, but a careful driver can go 75 without incident.</p>
<p>But a computer that&#8217;s programmed to issue tickets to speeders doesn&#8217;t have that ambiguity. And its accusations are hard to ignore because they&#8217;re factual, rooted in hard data and numbers.</p>
<h2>Did big data kill privacy?</h2>
<p>With the rise of a data-driven society, it&#8217;s tempting to pronounce privacy dead. Each time we connect to a new service or network, we&#8217;re agreeing to leave a digital breadcrumb trail behind us. And increasingly, not connecting makes us social pariahs, leaving others to wonder what we have to hide.</p>
<p>But maybe privacy is a fiction. For millennia &mdash; before the rise of city-states &mdash; we lived in villages. Gossip, hearsay, and whisperings heard through thin-walled huts were the norm.</p>
<p>Shared moral values and social pressure helped groups to compete better against other groups, helping to evolve the societies and religions that dominate the world today. Humans thrive in part because of our <a href="http://www.ikebarberlearningcentre.ubc.ca/jonathan-haidt-the-groupish-gene-hive-psychology-and-the-origins-of-morality-and-religion">groupish</a> nature &mdash; which is why moral psychologist Jonathan Haidt says we&#8217;re 90% chimp and 10% bee. We might have evolved as selfish individuals, but we conquered the Earth as selfish teams.</p>
<p>In other words, being private is relatively new, perhaps only transient, and gossip helped us get here.<span id="more-53548"></span></p>
<h2>Prediction isn&#8217;t invasion</h2>
<p>Much of what we see as technology&#8217;s invasion of privacy is really just prediction. As we connect the world&#8217;s databases &mdash; tying together smartphones, loyalty programs, medical records, and the other constellations in the galaxy of our online lives &mdash; we&#8217;re doing something that looks a lot like invading privacy. But it&#8217;s not.</p>
<p>Big data doesn&#8217;t peer into your browser history or look through your bedside table to figure out what porn you like; rather, it <a href="http://dl.acm.org/citation.cfm?id=2063618">infers your taste</a> in smut from the kind of music you like. Big data doesn&#8217;t administer a pregnancy test; instead, it <a href="http://afr.com/p/technology/big_data_creeps_out_online_customers_2CJAxqYONJO2wzf1Yos7YL">guesses you&#8217;re pregnant</a> because of what you buy. Many of big data&#8217;s predictions are a boon, helping us to fight disease, devote resources to the right problems, and pinpoint ways to help the disadvantaged.</p>
<p><em>Is prediction an invasion of privacy?</em> Not really. Companies will compete based on their ability to guess what&#8217;s going to happen. We&#8217;re simply <a href="http://solveforinteresting.com/the-selfish-economics-of-big-data/">taking the inefficiency out of the way we&#8217;ve dealt with risk in the past</a>. Algorithms can be wrong. Prediction is only a problem when we cross the moral Rubicon of prejudice: treating you differently because of those predictions, changing the starting conditions for unfair reasons.</p>
<p>Unfortunately, big data&#8217;s predictions are often frighteningly accurate, so the temptation to treat them as fact is almost overwhelming. Policing looks like thoughtcrime. And tomorrow, a just society is a skeptical one.</p>
<h2>We&#8217;re leakier than we know</h2>
<p><a href="http://www.flickr.com/photos/vrogy/511644410/" title="The cup that can only be half-full. by vrogy, on Flickr"><img src="http://s.radar.oreilly.com/wp-files/2/2012/10/1012-leaky-cup.jpg" alt="Picture by Michael Vroegop (vrogy) on Flickr" width="340" height="366" style="float: right;margin: 5px 0 10px 15px" /></a>Long before the Internet, we left a breadcrumb trail of personal details behind us: call history, credit-card receipts, car mileage, bank records, music purchases, library check-outs, and so on.</p>
<p>But until big data, baking the breadcrumbs back into a loaf was hard. Paper records were messy, and physical copies were hard to collect. Unless you were being pursued by an army of investigators, the patterns of your life remained hidden in plain sight. We weren&#8217;t really private &mdash; we just <em>felt</em> like we were, and it was too hard for others to prove otherwise without a lot of work.</p>
<p>No more. Big data represents a radical drop in the cost of tying together vast amounts of disparate data quickly. Digital records are clean, easy to analyze, and trivial to copy. That means the illusion of personal privacy is vanishing &mdash; but we should remember that it&#8217;s always been an illusion.</p>
<p>Our digital lives make this even more true. We&#8217;re probably not aware of what&#8217;s being collected as we surf the web &mdash; but it&#8217;s pretty easy to tell where someone&#8217;s been through browser trickery, cross-site advertising, and the like. So <a href="http://www.forbes.com/sites/kashmirhill/2012/10/16/the-obama-and-romney-campaigns-know-if-youve-visited-porn-sites-why-do-not-track-matters/" target="_blank">when a politician calls for your vote</a>, they may know more about you than you want. But let&#8217;s not confound promiscuous surfing behavior &mdash; leaving more breadcrumbs &mdash; with an improved ability to bake those crumbs back into a loaf.</p>
<p>Big data didn&#8217;t force us to overshare; it&#8217;s just better at noticing when we do and deriving meaning from it. And because of this, it&#8217;s back to thin-walled huts and gossip. Only this time, because it&#8217;s digital and machine-driven, there are a couple of important twists to consider.</p>
<h2>This ain&#8217;t your ancestors&#8217; privacy</h2>
<p>There are two key differences, however, between our ancestors&#8217; gossip-filled, thin-walled villages and today&#8217;s global digital village.</p>
<p><strong>First</strong>, consider the two-way flow of gossip. A thousand years ago, word-of-mouth worked both ways. Someone who told tales too often risked ostracism. We could confront our accusers. Social mores were a careful balance of shame and approval, with checks and balances.</p>
<p>That balance is gone. We can&#8217;t confront our digital accusers. If we&#8217;re denied a loan, we lack the tools to understand why. Often, we aren&#8217;t even aware that we&#8217;ve been painted with a digital scarlet letter. As one Oxford professor put it, &#8220;nobody knows the offer they didn&#8217;t receive.&#8221;</p>
<p>Big data is whispering things about us &mdash; both inferred predictions and assembled truths &mdash; and we don&#8217;t even know it.</p>
<p><strong>Second</strong>, everyone knew gossip was imperfect. We&#8217;ve all played &#8220;broken telephone&#8221; and seen how easily many mouths distort a message. We&#8217;re skeptical of a single truth. We&#8217;ve learned to forgive, to question.</p>
<p>The same studies that show groups should ostracize those who don&#8217;t chip in also suggest that the best strategy of all is to forgive occasionally &mdash; just in case the initial failure was an honest mistake. In other words, when dealing with whispered truths, we lived life in a grey area.</p>
<p>Unfortunately, digital accusations &mdash; like those made by traffic cameras &mdash; leave little room for mercy and tolerance because they lack that grey area in which much of human interaction thrives. If we&#8217;re going to build data-driven systems, then those systems need grey areas.</p>
<h2>New rules for the new transparency</h2>
<p>In the timeline of human history, privacy is relatively recent. It may even be that privacy was an anomaly, that our social natures rely on leakage to thrive, and that we&#8217;re nearing the end of a transient time where the walls between us gave us the illusion of secrecy.</p>
<p>But now that technology is tearing down those walls, we need checks and balances to ensure that we don&#8217;t let predictions become prejudices. Even when those predictions are based in fact, we must build both context and mercy into the data-driven decisions that govern our quantified future.</p>
<p><em>This post originally appeared on <a href="http://solveforinteresting.com/thin-walls-and-traffic-cameras/">Solve for Interesting</a>. This version has been lightly edited.</em></p>
<p><em>Photos: Traffic camera, <a href="http://www.flickr.com/photos/thisisbossi/3361608344/" title="2008 06 11 - 3313b - Silver Spring - 16th St Circle Traffic Camera by thisisbossi, on Flickr">Silver Spring &#8211; 16th St Circle Traffic Camera by thisisbossi, on Flickr</a>; leaky cup, <a href="http://www.flickr.com/photos/vrogy/511644410/" title="The cup that can only be half-full. by vrogy, on Flickr">The cup that can only be half-full. by vrogy, on Flickr</a>.</em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2012/10/new-ethics-for-a-new-world.html">New ethics for a new world</a></li>
<li> <a href="http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html">Big data is our generation’s civil rights issue, and we don’t know it</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/10/thin-walls-traffic-cameras.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New ethics for a new world</title>
		<link>http://radar.oreilly.com/2012/10/new-ethics-for-a-new-world.html</link>
		<comments>http://radar.oreilly.com/2012/10/new-ethics-for-a-new-world.html#comments</comments>
		<pubDate>Wed, 17 Oct 2012 13:00:17 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data driven]]></category>
		<category><![CDATA[data ethics]]></category>
		<category><![CDATA[data products]]></category>
		<category><![CDATA[digital]]></category>
		<category><![CDATA[physical]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[shift]]></category>
		<category><![CDATA[tribes]]></category>

		<guid isPermaLink="false">http://radar.oreilly.com/?p=53455</guid>
		<description><![CDATA[Since the first of our ancestors chipped stone into a weapon, technology has divided us. Seldom more than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of &#8230; ]]></description>
				<content:encoded><![CDATA[<p>Since the first of our ancestors chipped stone into a weapon, technology has divided us. Seldom more than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of prejudice masked as science.</p>
<p>On its surface, a data-driven society is more transparent, and makes better use of its resources. By connecting human knowledge, and mining it for insights, we can pinpoint problems before they become disasters, warding off disease and shining the harsh light of data on injustice and corruption. Data is making cities smarter, watering the grass roots, and improving the way we teach.</p>
<p>But for every accolade, there&#8217;s a cautionary tale. It&#8217;s easy to forget that data is merely a tool, and in the wrong hands, that tool can do powerful wrong. Data erodes our privacy. It predicts us, often with unerring accuracy &mdash; and treating those predictions as fact is a new, insidious form of prejudice. And it can collect the chaff of our digital lives, harvesting a picture of us we may not want others to know.</p>
<p>The big data movement isn&#8217;t just about knowing more things. It&#8217;s about a fundamental shift from scarcity to abundance. Most markets are defined by scarcity &mdash; the price of diamonds, or oil, or music. But when things become so cheap they&#8217;re nearly free, a funny thing happens.</p>
<p>Consider the advent of steam power. Economist Stanley Jevons, in what&#8217;s known as <a href="http://en.wikipedia.org/wiki/Jevons_paradox">Jevons&#8217; Paradox</a>, observed that as the efficiency of steam engines increased, coal consumption went up. That&#8217;s not what was supposed to happen. Jevons realized that abundance creates new ways of using something. As steam became cheap, we found new ways of using it, which created demand.</p>
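<p>A little arithmetic shows why efficiency can raise consumption. The sketch below assumes a constant-elasticity demand curve for useful work, which is a textbook simplification and not something Jevons&#8217; own data is being claimed to fit. The function name <code>coal_use</code> and every number in it are hypothetical.</p>

```python
def coal_use(efficiency, elasticity, base_work=100.0):
    """Coal burned when engines deliver `efficiency` units of work per unit of coal.

    Assumes demand for useful work follows a constant-elasticity curve:
    the cost of work falls as 1 / efficiency, and the quantity of work
    demanded scales as cost ** (-elasticity).
    """
    relative_cost = 1.0 / efficiency
    work_demanded = base_work * relative_cost ** (-elasticity)
    return work_demanded / efficiency


baseline = coal_use(efficiency=1.0, elasticity=1.5)
elastic = coal_use(efficiency=2.0, elasticity=1.5)    # demand responds strongly
inelastic = coal_use(efficiency=2.0, elasticity=0.5)  # demand responds weakly

print("Baseline coal use:", round(baseline, 1))
print("Doubled efficiency, elastic demand:", round(elastic, 1))
print("Doubled efficiency, inelastic demand:", round(inelastic, 1))
```

<p>Under these assumptions, coal use rises with efficiency whenever the elasticity is above 1 and falls when it is below 1. The paradox describes the first case: cheaper work unlocks more new uses than the efficiency gain saves.</p>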
<p>The same thing is happening with data. A report that took a month to run is now just a few taps on a tablet. An unthinkably complex analysis of competitors is now a Google search. And the global distribution of multimedia content that once required a broadcast license is now an upload.<span id="more-53455"></span></p>
<p>Big data is about reducing the cost of analyzing our world. The resulting abundance is triggering entirely new ways of using that data. Visualizations, interfaces, and ubiquitous data collection are increasingly important, because they feed the machine &mdash; and the machine is hungry.</p>
<p>The results are controversial. Journalists rely on global access to data, but also bring a new skepticism to their work, because facts are easy to manufacture. There&#8217;s <a href="http://www.npr.org/blogs/thetwo-way/2012/06/04/154309254/its-not-your-imagination-americans-are-more-polarized-says-pew">good evidence</a> that we&#8217;ve never been as polarized, politically, as we are today &mdash; and data may be to blame. You can find evidence to support any conspiracy, expose any gaffe, or refute any position you dislike, but separating truth from mere data is a growing problem.</p>
<p>Perhaps the biggest threat that a data-driven world presents is an ethical one. Our social safety net is woven on uncertainty. We have welfare, insurance, and other institutions precisely because we can&#8217;t tell what&#8217;s going to happen &mdash; so we amortize that risk across shared resources. The better we are at predicting the future, the less we&#8217;ll be willing to share our fates with others. And the more those predictions look like facts, the more justice looks like thoughtcrime.</p>
<p>The human race underwent a huge shift when we banded together into tribes, forming culture and morals to tie us to one another. As groups, we achieved great heights, building nations, conquering challenges, and exploring the unknown. If you were one of those tribesmen, it&#8217;s unlikely you knew what was happening &mdash; it&#8217;s only in hindsight that the shift from individual to group was radical.</p>
<p>We&#8217;re in the middle of another, perhaps bigger, shift, one that&#8217;s taking us from physical beings to digital/physical hybrids. We&#8217;re colonizing an online world, and just as our ancestors had to create new social covenants and moral guidelines to work as groups, so we have to craft new ethics, rights and laws.</p>
<p>Those fighting for social change have their work cut out for them, because they&#8217;re not just trying to find justice &mdash; they&#8217;re helping to rewrite the ethical and moral guidelines for a nascent, always-on, data-driven species.</p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html">Big data is our generation&#8217;s civil rights issue, and we don&#8217;t know it</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/10/new-ethics-for-a-new-world.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Follow up on big data and civil rights</title>
		<link>http://radar.oreilly.com/2012/08/follow-up-on-big-data-and-civil-rights.html</link>
		<comments>http://radar.oreilly.com/2012/08/follow-up-on-big-data-and-civil-rights.html#comments</comments>
		<pubDate>Wed, 29 Aug 2012 13:00:11 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@editpick]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[civil rights]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data rights]]></category>
		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://radar.oreilly.com/?p=51344</guid>
		<description><![CDATA[A few weeks ago, I wrote a post about big data and civil rights, which seems to have hit a nerve. It was posted on Solve for Interesting and here on Radar, and then folks like Boing Boing picked it &#8230; ]]></description>
				<content:encoded><![CDATA[<p>A few weeks ago, I wrote a post about big data and civil rights, which seems to have hit a nerve. It was posted on <a href="http://solveforinteresting.com/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it/">Solve for Interesting</a> and here on <a href="http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html">Radar</a>, and then folks like <a href="http://boingboing.net/2012/08/14/civil-rights-implications-of-b.html">Boing Boing</a> picked it up.</p>
<p>I haven&#8217;t had this kind of response to a post before (well, I&#8217;ve had responses, such as the <a href="http://gigaom.com/2008/04/06/10-ways-the-internet-will-die/#comments">comments to this piece for GigaOm</a> four years ago, but they haven&#8217;t been nearly as thoughtful).</p>
<p>Some of the best posts have really added to the conversation. Here&#8217;s a list of those I suggest for further reading and discussion:</p>
<h2>Nobody notices offers they don&#8217;t get</h2>
<p><a href="http://blog.practicalethics.ox.ac.uk/2012/08/asking-the-right-questions-big-data-and-civil-rights/">On Oxford&#8217;s <em>Practical Ethics</em> blog,</a> Anders Sandberg argues that transparency and reciprocal knowledge about how data is being used will be essential. Anders captured the core of my concerns in a single paragraph, saying what I wanted to far better than I could:</p>
<blockquote><p>&#8230; nobody notices offers they do not get. And if these absent opportunities start following certain social patterns (for example not offering them to certain races, genders or sexual preferences) they can have a deep civil rights effect</p></blockquote>
<p>To me, this is a key issue, and it responds eloquently to some of the comments on the original post. Harry Chamberlain commented:</p>
<blockquote><p>However, what would you say to the criticism that you are seeing lions in the darkness? In other words, the risk of abuse certainly exists, but until we see a clear case of big data enabling and fueling discrimination, how do we know there is a real threat worth fighting?</p></blockquote>
<p><span id="more-51344"></span>I think that this is precisely the point: you can&#8217;t see the lions in the darkness, because you&#8217;re not aware of the ways in which you&#8217;re being disadvantaged.  If whites get an offer of 20% off, but minorities don&#8217;t, that&#8217;s basically a 20% price hike on minorities &mdash; but it&#8217;s just marketing, so apparently it&#8217;s okay.</p>
<h2>Context is everything</h2>
<p><a href="http://www.flickr.com/photos/mararie/301867205/" title="crystal ball ii by mararie, on Flickr"><img src="http://s.radar.oreilly.com/wp-files/2/2012/08/0812-crystal-ball.jpg" border="0" alt="crystal ball ii by mararie, on Flickr" width="370" style="float: right; margin: 5px 0 15px 10px;" /></a>Mary Ludloff of Patternbuilders asks, &#8220;When does someone else&#8217;s problem become ours?&#8221; Mary is a <a href="http://strataconf.com/stratany2012/public/schedule/speaker/103865">presenter</a> at Strata, and an expert on digital privacy. She has a very pragmatic take on things. One point Mary makes is that all this analysis is about prediction &mdash; we&#8217;re taking a ton of data and making a prediction about you:</p>
<blockquote><p>The issue with data, particularly personal data, is this: context is everything. And if you are not able to personally question me, you are guessing the context.</p></blockquote>
<p>If we (mistakenly) predict something, and act on it, we may have wronged someone. Mary makes clear that this is thoughtcrime &mdash; arresting someone because their behavior <em>looked</em> like that of a terrorist, or pedophile, or thief. Firing someone because their email patterns <em>suggested</em> they weren&#8217;t going to make their sales quota. That&#8217;s the injustice.</p>
<p>This is actually about <a href="http://en.wikipedia.org/wiki/Negative_right">negative rights</a>, which Wikipedia describes as:</p>
<blockquote><p>Rights considered <em>negative rights</em> may include <a title="Civil and political rights" href="http://en.wikipedia.org/wiki/Civil_and_political_rights">civil and political rights</a> such as <a title="Freedom of speech" href="http://en.wikipedia.org/wiki/Freedom_of_speech">freedom of speech</a>, <a title="Property (ownership right)" href="http://en.wikipedia.org/wiki/Property_(ownership_right)">private property</a>, freedom from <a title="Violent crime" href="http://en.wikipedia.org/wiki/Violent_crime">violent crime</a>, <a title="Freedom of worship" href="http://en.wikipedia.org/wiki/Freedom_of_worship">freedom of worship</a>, <em><a title="Habeas corpus" href="http://en.wikipedia.org/wiki/Habeas_corpus">habeas corpus</a></em>, a <a title="Fair trial" href="http://en.wikipedia.org/wiki/Fair_trial">fair trial</a>, freedom from <a title="Slavery" href="http://en.wikipedia.org/wiki/Slavery">slavery</a>.</p></blockquote>
<p>Most philosophers agree that negative rights outweigh positive ones (e.g., I have a right to fresh air more than you have a right to smoke around me). So our negative right (to be left unaffected by your predictions) outweighs your positive one. As analytics comes <a href="http://www.dailymail.co.uk/sciencetech/article-2190531/Mobile-phone-companies-predict-future-movements-users-building-profile-lifestyle.html">closer and closer to predicting actual behavior</a>, we need to remember the lesson of negative rights.</p>
<h2>Big data is the new printing press</h2>
<p>Lori Witzel <a href="http://hauntedbymarketing.posterous.com/big-data-analytics-a-destabilizing-force-not">compares the advent of big data to the creation of the printing press,</a> pointing out &mdash; somewhat optimistically &mdash; that once books were plentiful, it was hard to control the spread of information. She has a good point &mdash; we&#8217;re looking at things from this side of the big data singularity:</p>
<blockquote><p>And as the cost of Big Data and Big Data Analytics drops, I predict we&#8217;ll see a similar dispersion of technology, and similar destabilizations to societies where these technologies are deployed.</p></blockquote>
<p>There&#8217;s a chance that we&#8217;ll democratize access to information so much that it&#8217;ll be the corporations, not the consumers, that are forced to change.</p>
<h2>While you slept last night</h2>
<p>TIBCO&#8217;s Chris Taylor, standing in for Kashmir Hill at Forbes, <a href="http://www.forbes.com/sites/kashmirhill/2012/08/28/while-you-slept-last-night-big-data-privacy-and-the-public-square/2/">paints a dystopian picture of video-as-data,</a> and just how much tracking we&#8217;ll face in the future:</p>
<blockquote><p>This makes laughable the idea of an implanted chip as the way to monitor a population. We&#8217;ve implanted that chip in our phones, and in video, and in nearly every way we interact with the world. Even paranoids are right sometimes.</p></blockquote>
<p>I had a wide-ranging chat with Chris last week. We&#8217;re sure to spend more time on this in the future.</p>
<h2>The veil of ignorance</h2>
<p>The idea for the <a href="http://solveforinteresting.com/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it/">original post</a> came from a conversation I had with some civil rights activists in Atlanta a few months ago, who hadn&#8217;t thought about the subject. They (or their parents) walked with Martin Luther King, Jr. But to them big data was &#8220;just tech.&#8221; That bothered me, because unless we think of these issues in the context of society and philosophy, bad things will happen to good people.</p>
<p>Perhaps the best tool for thinking about these ethical issues is the <a href="http://en.wikipedia.org/wiki/Veil_of_ignorance">Veil of Ignorance</a>. It&#8217;s a philosophical exercise for deciding social issues that goes like this:</p>
<ol>
<li>Imagine you don&#8217;t know where you will be in the society you&#8217;re creating. You could be a criminal, a monarch, a merchant, a pauper, an invalid.</li>
<li>Now design the best society you can.</li>
</ol>
<p>Simple, right? When we&#8217;re looking at legislation for big data, this is a good place to start. We should set privacy, transparency, and use policies without knowing whether we&#8217;re ruling or oppressed, straight or gay, rich or poor.</p>
<p><em>This post originally appeared on <a href="http://solveforinteresting.com/followup-on-big-data-and-civil-rights/">Solve for Interesting</a>. This version has been lightly edited. Photo: <a href="http://www.flickr.com/photos/mararie/301867205/" title="crystal ball ii by mararie, on Flickr">crystal ball ii by mararie, on Flickr</a></em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html">Big data is our generation&#8217;s civil rights issue, and we don&#8217;t know it</a></li>
<li> <a href="http://radar.oreilly.com/2011/08/theres-no-such-thing-as-big-da.html">There&#8217;s no such thing as big data</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/08/follow-up-on-big-data-and-civil-rights.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Three kinds of big data</title>
		<link>http://radar.oreilly.com/2012/08/three-kinds-of-big-data.html</link>
		<comments>http://radar.oreilly.com/2012/08/three-kinds-of-big-data.html#comments</comments>
		<pubDate>Tue, 21 Aug 2012 14:00:56 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@editpick]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[hype cycle]]></category>
		<category><![CDATA[Marketing]]></category>
		<category><![CDATA[society]]></category>

		<guid isPermaLink="false">http://radar.oreilly.com/?p=50965</guid>
		<description><![CDATA[In the past couple of years, marketers and pundits have spent a lot of time labeling everything &#8220;big data.&#8221; The reasoning goes something like this: Everything is on the Internet. The Internet has a lot of data. Therefore, everything is big &#8230; ]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/mtl_shag/344700422/"><img class="alignright size-full wp-image-50979" src="http://s.radar.oreilly.com/wp-files/2/2012/08/castorpolluxcolumns.jpeg" alt="Photo of the columns of Castor and Pollux by OliverN5 on Flickr" width="180" height="240" /></a>In the past couple of years, marketers and pundits have spent a lot of time labeling everything &#8220;big data.&#8221; The reasoning goes something like this:</p>
<ul>
<li>Everything is on the Internet.</li>
<li>The Internet has a lot of data.</li>
<li>Therefore, everything is big data.</li>
</ul>
<p>When you have a hammer, everything looks like a nail. When you have a Hadoop deployment, everything looks like big data. And if you&#8217;re trying to cloak your company in the mantle of a burgeoning industry, big data will do just fine. But seeing big data everywhere is a sure way to hasten the inevitable fall from the peak of inflated expectations to the <a href="//en.wikipedia.org/wiki/Hype_cycle">trough of disillusionment.</a></p>
<p>We saw this with cloud computing. From early idealists saying everything would live in a magical, limitless, free data center to today&#8217;s pragmatism about virtualization and infrastructure, we soon took off our rose-colored glasses and put on welding goggles so we could actually build stuff.</p>
<p><strong>So where will big data go to grow up?</strong></p>
<p>Once we get over ourselves and start rolling up our sleeves, I think big data will fall into three major buckets: Enterprise BI, Civil Engineering, and Customer Relationship Optimization. This is where we&#8217;ll see most IT spending, most government oversight, and most early adoption in the next few years. <span id="more-50965"></span></p>
<h2>Enterprise BI 2.0</h2>
<p>For decades, analysts have relied on business intelligence (BI) products like <a href="//en.wikipedia.org/wiki/Oracle_Hyperion">Hyperion</a>, <a href="//www.microstrategy.com/">MicroStrategy</a> and <a href="//www-01.ibm.com/software/analytics/cognos/">Cognos</a> to crunch large amounts of information and generate reports. Data warehouses and BI tools are great at answering the same question (such as &#8220;what were Mary&#8217;s sales this quarter?&#8221;) over and over again. But they&#8217;ve been less good at the exploratory, what-if, unpredictable questions that matter for planning and decision-making, because that kind of fast exploration of unstructured data is traditionally hard to do and therefore expensive.</p>
<p>Most &#8220;legacy&#8221; BI tools are constrained in two ways:</p>
<ul>
<li><strong>First,</strong> they&#8217;ve been schema-then-capture tools in which the analyst decides what to collect, then later captures that data for analysis.</li>
<li><strong>Second,</strong> they&#8217;ve typically focused on reporting what <a href="//twitter.com/avinash">Avinash Kaushik</a> (channeling Donald Rumsfeld) refers to as &#8220;known unknowns&#8221;—things we know we don&#8217;t know, and generate reports for.</li>
</ul>
<p>These tools are used for reporting and operational purposes, and are usually focused on controlling costs, executing against an existing plan, and reporting on how things are going.</p>
<p>As my Strata co-chair <a href="http://twitter.com/edd">Edd Dumbill</a> pointed out when I asked for thoughts on this piece:</p>
<blockquote><p>&#8220;The predominant functional application of big data technologies today is in ETL (Extract, Transform, and Load). I&#8217;ve heard the figure that it&#8217;s about 80% of Hadoop applications. Just the real grunt work of log file or sensor processing before loading into an analytic database like Vertica.&#8221;</p></blockquote>
<p>The availability of cheap, fast computers and storage, as well as open source tools, has made it okay to capture first and ask questions later. That changes how we use data because it lets analysts speculate beyond the initial question that triggered the collection of data.</p>
<p>What&#8217;s more, the speed with which we can get results, sometimes as fast as a human can ask the questions, makes data easier to explore interactively. This combination of interactivity and speculation takes BI into the realm of &#8220;unknown unknowns,&#8221; the insights that can produce a competitive advantage or an out-of-the-box differentiator.</p>
<p>Cloud computing underwent a transition from promise to compromise. First, big public clouds wooed green-field startups. Then, a few years later, incumbent IT vendors introduced private cloud offerings. These private clouds included only a fraction of the benefits of their public cousins, but were nevertheless a sufficient blend of smoke, mirrors, and features to delay the inevitable move to public resources by a few years and appease the business. For better or worse, that&#8217;s where most IT cloud budgets are being spent today, according to <a href="//www.idc.com/getdoc.jsp?containerId=prUS23097611">IDC</a>, <a href="//www.gartner.com/it/page.jsp?id=1239813">Gartner</a>, and others. Big data adoption will undergo a similar cycle.</p>
<p>In the next few years, then, look for acquisitions and product introductions—and not a little vaporware—as BI vendors that enterprises trust bring them &#8220;big data lite&#8221;: enough innovation and disruption to satisfy the CEO&#8217;s golf buddies, but not so much that enterprise IT&#8217;s jobs are threatened. This, after all, is how change comes to big organizations.</p>
<p>Ultimately, we&#8217;ll see traditional &#8220;known unknowns&#8221; BI reporting living alongside big-data-powered data import and cleanup, and fast, exploratory &#8220;unknown unknowns&#8221; interactivity.</p>
<h2>Civil Engineering</h2>
<p>The second use of big data is in society and government. Already, data mining can be used to predict disease outbreaks, understand traffic patterns, and improve education.</p>
<p>Cities are facing budget crunches, infrastructure problems, and an influx of rural citizens. Solving these problems is urgent, and cities are perfect labs for big data initiatives. Take a metropolis like New York: it has hackathons, open feeds of public data, and a population that generates a flood of information as it shops, commutes, gets sick, eats, and just goes about its daily life.</p>
<p style="text-align: center"><a href="http://www.datagotham.com/" rel="attachment wp-att-50981"><img class="aligncenter  wp-image-50981" src="http://s.radar.oreilly.com/wp-files/2/2012/08/datagotham-Sponsors1-620x229.jpg" alt="Datagotham is just one example of a city's efforts to hack itself" width="620" /></a></p>
<p>I think municipal data is one of the big three for several reasons: it&#8217;s a <strong>good tie breaker for partisanship</strong>, we have <strong>new interfaces everyone can understand</strong>, and we finally have a <strong>mostly-connected citizenry</strong>.</p>
<p>In an era of <a href="//articles.latimes.com/2012/jun/04/news/la-pn-pew-partisan-divide-poll-20120604">partisan bickering</a>, hard numbers can settle the debate. So, they&#8217;re not just good government; they&#8217;re good politics. Expect to see big data applied to social issues, helping us to make funding more effective and scarce government resources more efficient (perhaps to the chagrin of some public servants and lobbyists). As this works in the world&#8217;s biggest cities, it&#8217;ll spread to smaller ones, to states, and to municipalities.</p>
<p>Making data accessible to citizens is possible, too: Siri and Google Now show the potential for personalized agents; Narrative Science takes complex data and turns it into words the masses can consume easily; Watson and Wolfram Alpha can give smart answers, either through curated reasoning or making smart guesses.</p>
<p>For the first time, we have a connected citizenry armed (for the most part) with smartphones. <a href="http://blog.nielsen.com/nielsenwire/consumer/smartphones-to-overtake-feature-phones-in-u-s-by-2011/">Nielsen estimated that smartphones would overtake feature phones in 2011</a>, and that concentration is high in urban cores. The App Store is full of apps for bus schedules, commuters, local events, and other tools that can quickly become how governments connect with their citizens and manage their bureaucracies.</p>
<p>The consequence of all this, of course, is more data. Once governments go digital, their interactions with citizens can be easily instrumented and analyzed for waste or efficiency. That&#8217;s sure to provoke resistance from those who don&#8217;t like the scrutiny or accountability, but it&#8217;s a side effect of digitization: every industry that goes digital gets analyzed and optimized, whether it likes it or not.</p>
<h2>Customer Relationship Optimization</h2>
<p>The final home of applied big data is marketing. More specifically, it&#8217;s improving the relationship with consumers so companies can, as <a href="http://en.wikipedia.org/wiki/Sergio_Zyman">Sergio Zyman</a> once said, sell them more stuff, more often, for more money, more efficiently.</p>
<p>The biggest data systems today are focused on web analytics, ad optimization, and the like. Many of today&#8217;s most popular architectures were weaned on ads and marketing, and have their ancestry in direct marketing plans. They&#8217;re just more focused than the <a href="//en.wikipedia.org/wiki/Claritas_Prizm">comparatively blunt instruments</a> with which direct marketers used to work.</p>
<p><a href="http://www.flickr.com/photos/fdctsevilla/4306301206/"><img class="alignright size-full wp-image-50983" src="http://s.radar.oreilly.com/wp-files/2/2012/08/funnel-tamis.jpeg" alt="Tamis a Lait by El Bibliomata on Flickr" width="240" height="214" /></a>The number of contact points in a company has multiplied significantly. Where once there was a phone number and a mailing address, today there are web pages, social media accounts, and more. Tracking users across all these channels, and turning every click, like, share, friend, or retweet into the start of a long funnel that leads, inexorably, to revenue, is a big challenge. It&#8217;s also one that a company like Salesforce understands, given its investments in chat, social media monitoring, co-browsing, and more.</p>
<p>This is what&#8217;s lately been referred to as the &#8220;360-degree customer view&#8221; (though it&#8217;s <a href="http://radar.oreilly.com/2011/08/theres-no-such-thing-as-big-da.html">not clear that companies will actually act on customer data</a> if they have it, or whether <a href="http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html">doing so will become a compliance minefield</a>). Big data is already intricately linked to online marketing, but it will branch out in two ways.</p>
<p><strong>First,</strong> it&#8217;ll go from online to offline. Near-field-equipped smartphones with ambient check-in are a marketer&#8217;s wet dream, and they&#8217;re coming to pockets everywhere. It&#8217;ll be possible to track queue lengths, store traffic, and more, giving retailers fresh insights into their brick-and-mortar sales. Ultimately, companies will bring the optimization that online retail has enjoyed to an offline world as <a href="http://www.navizon.com/product-navizon-indoor-triangulation-system">consumers become trackable</a>.</p>
<p><strong>Second,</strong> it&#8217;ll go from Wall Street (or maybe that&#8217;s Madison Avenue and Middlefield Road) to Main Street. Tools will get easier to use, and while small businesses might not have a BI platform, they&#8217;ll have a tablet or a smartphone that they can bring to their places of business. Mobile payment players like <a href="https://squareup.com/">Square</a> are already making them reconsider the checkout process. Adding portable customer intelligence to the tool suite of local companies will broaden how we use marketing tools.</p>
<h2>Headlong into the trough</h2>
<p>That&#8217;s my bet for the next three years, given the molasses of market confusion, vendor promises, and unrealistic expectations we&#8217;re about to contend with. Will big data change the world? Absolutely. Will it be able to defy the usual cycle of earnest adoption, crushing disappointment, and eventual rebirth all technologies must travel? Certainly not.</p>
<p><strong>Related:</strong></p>
<ul>
<li><a href="http://radar.oreilly.com/2012/01/what-is-big-data.html">What is big data?</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/08/three-kinds-of-big-data.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Big data is our generation&#8217;s civil rights issue, and we don&#8217;t know it</title>
		<link>http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html</link>
		<comments>http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html#comments</comments>
		<pubDate>Thu, 02 Aug 2012 13:00:19 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[@editpick]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[civil rights]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data meaning]]></category>
		<category><![CDATA[data use]]></category>
		<category><![CDATA[personalization]]></category>
		<category><![CDATA[schema]]></category>

		<guid isPermaLink="false">http://radar.oreilly.com/?p=50087</guid>
		<description><![CDATA[Data doesn&#8217;t invade people&#8217;s lives. Lack of control over how it&#8217;s used does. What&#8217;s really driving so-called big data isn&#8217;t the volume of information. It turns out big data doesn&#8217;t have to be all that big. Rather, it&#8217;s about a &#8230; ]]></description>
				<content:encoded><![CDATA[<p>Data doesn&#8217;t invade people&#8217;s lives. <em>Lack of control over how it&#8217;s used does.</em></p>
<p>What&#8217;s really driving so-called big data isn&#8217;t the volume of information. It turns out big data doesn&#8217;t have to be all that big. Rather, it&#8217;s about a reconsideration of the fundamental economics of analyzing data.</p>
<p>For decades, there&#8217;s been a fundamental tension between three attributes of databases. You can have the data fast; you can have it big; or you can have it varied. The catch is, you can&#8217;t have all three at once.</p>
<p><a href="http://solveforinteresting.com/wp-content/uploads/2012/07/big-data-triangle.png"><img src="http://s.radar.oreilly.com/wp-files/2/2012/08/0812-1-big-data-triangle.png" border="0" alt="The big data trifecta" width="620" /></a></p>
<p>I&#8217;d first heard this as the &#8220;three V&#8217;s of data&#8221;: Volume, Variety, and Velocity. Traditionally, getting two was easy but getting three was very, very, very expensive.</p>
<p>The advent of clouds, platforms like Hadoop, and the inexorable march of Moore&#8217;s Law mean that now, analyzing data is trivially inexpensive. And when things become so cheap that they&#8217;re practically free, big changes happen &mdash; just look at the advent of steam power, or the copying of digital music, or the rise of home printing. Abundance replaces scarcity, and we invent new business models.</p>
<p>In the old, data-is-scarce model, companies had to decide what to collect first, and then collect it. A traditional enterprise data warehouse might have tracked sales of widgets by color, region, and size. This act of deciding what to store and how to store it is called designing the schema, and in many ways, it&#8217;s the moment where someone decides what the data is about. It&#8217;s the instant of context.</p>
<p>That needs repeating:</p>
<p><strong>You decide what data is <em>about</em> the moment you define its schema.</strong></p>
<p><span id="more-50087"></span></p>
<p>With the new, data-is-abundant model, we collect first and ask questions later. The schema comes <em>after</em> the collection. Indeed, big data success stories like Splunk, Palantir, and others are prized because of their ability to make sense of content well after it&#8217;s been collected &mdash; sometimes called a schema-less query. This means we collect information long before we decide what it&#8217;s for.</p>
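<p>To make the contrast concrete, here is a minimal sketch of the two models. It is an illustration only: the widget fields echo the warehouse example above, while the zip code, the function names, and the sample events are all hypothetical and not drawn from any real system.</p>

```python
import json

# Schema-first: the fields are fixed before any data arrives.
# (Hypothetical widget-sales schema, echoing the warehouse example.)
SALES_SCHEMA = ("color", "region", "size")


def store_with_schema(event):
    """Keep only the fields the schema anticipated; anything else is dropped."""
    return {field: event.get(field) for field in SALES_SCHEMA}


# Collect-first: keep the raw event and impose structure at query time.
def store_raw(event):
    """Serialize the whole event untouched."""
    return json.dumps(event, sort_keys=True)


def query_raw(raw_events, field):
    """Apply a 'schema' after the fact by pulling out whatever field we now care about."""
    return [json.loads(raw).get(field) for raw in raw_events]


# Hypothetical events; the zip code was never part of the original question.
events = [
    {"color": "red", "region": "east", "size": "L", "zip": "19104"},
    {"color": "blue", "region": "west", "size": "M", "zip": "94025"},
]

warehouse = [store_with_schema(e) for e in events]
raw_store = [store_raw(e) for e in events]

print(warehouse[0])                  # no zip: the schema decided it did not matter
print(query_raw(raw_store, "zip"))   # the zip is still there to be asked about
```

<p>In the first store, the decision about what the data is for was made once, up front, and the zip code is gone. In the second, nothing was decided at collection time, so anyone with access can later ask a question that nobody had in mind when the data was gathered.</p>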
<p>And this is a dangerous thing.</p>
<p>When bank managers tried to restrict loans to residents of certain areas (known as redlining), Congress stepped in to stop it (with the Fair Housing Act of 1968). They were able to legislate against discrimination, making it illegal to change loan policy based on someone&#8217;s race.</p>
<p><a href="http://en.wikipedia.org/wiki/Redlining"><img src="http://s.radar.oreilly.com/wp-files/2/2012/08/0812-2-Home_Owners_Loan_Corporation_Philadelphia_redlining_map.jpg" width="620" border="0" style="margin-bottom:15px;" alt="Home Owners' Loan Corporation map showing redlining of hazardous districts in 1936" /></a><br /><em>Home Owners&#8217; Loan Corporation map showing <a href="http://en.wikipedia.org/wiki/Redlining">redlining</a> of &#8220;hazardous&#8221; districts in 1936.</em></p>
<hr />
<p>&#8220;Personalization&#8221; is another word for discrimination. We&#8217;re not discriminating if we tailor things to you based on what we know about you &mdash; right? That&#8217;s just better service.</p>
<p>In <a href="http://abcnews.go.com/GMA/GetsAnswers/story?id=6747461&#038;page=1#.UBmL1qllo9V">one case</a>, American Express used purchase history to adjust credit limits based on where a customer shopped, despite his excellent credit history:</p>
<blockquote><p>Johnson says his jaw dropped when he read one of the reasons American Express gave for lowering his credit limit: &#8220;Other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express.&#8221;</p>
</blockquote>
<p><a href="http://blog.okcupid.com/index.php/the-real-stuff-white-people-like/"><img src="http://s.radar.oreilly.com/wp-files/2/2012/08/0812-3-The-REAL-Stuff-White-People-Like.png" border="0" alt="Some of the things white men liked in 2010, according to OKCupid" width="265" style="float: right; margin: 5px 0 10px 15px;" /></a>We&#8217;re seeing the start of this slippery slope everywhere from <a href="http://www.nytimes.com/2012/02/05/opinion/sunday/facebook-is-using-you.html">tailored credit-card limits</a> like this one to <a href="http://www.progressive.com/auto/snapshot.aspx">car insurance based on driver profiles</a>. In this regard, big data is a civil rights issue, but it&#8217;s one that society in general is ill-equipped to deal with.</p>
<p>We&#8217;re great at using taste to predict things about people. OkCupid&#8217;s <a href="http://blog.okcupid.com/index.php/the-real-stuff-white-people-like/">2010 blog post &#8220;The Real Stuff White People Like&#8221;</a> showed just how easily we can use information to guess at race. It&#8217;s a real eye-opener (and the guys who wrote it didn&#8217;t include everything they learned, because some of it was a bit too controversial). They simply looked at the words one group used that others rarely used. The result was a list of &#8220;trigger&#8221; words for a particular race or gender.</p>
<p><em>Now run this backwards</em>. If I know you like these things, or see you mention them in blog posts, on Facebook, or in tweets, then there&#8217;s a good chance I know your gender and your race, and maybe even your religion and your sexual orientation. And that I can personalize my marketing efforts towards you.</p>
<p>That makes it a civil rights issue.</p>
<p>If I collect information on the music you listen to, you might assume I will use that data in order to suggest new songs, or share it with your friends. But instead, I could use it to guess at your racial background. And then I could use that data to deny you a loan.</p>
<p>Want another example? Check out <a href="http://solveforinteresting.com/private-data-in-public-ways/">Private Data In Public Ways</a>, something I wrote a few months ago after seeing a talk at Big Data London, which discusses how publicly available last name information can be used to generate racial boundary maps:</p>
<p><a href="http://solveforinteresting.com/private-data-in-public-ways/"><img src="http://s.radar.oreilly.com/wp-files/2/2012/08/0812-5-wpid-Photo-2012-04-26-1117-AM.png" border="0" alt="Screen from the Mapping London project" width="620" style="margin-bottom: 15px;" /></a><br /><em>Screen from the <a href="http://names.mappinglondon.co.uk">Mapping London project</a>.</em></p>
<hr />
<p><a href="http://www.youtube.com/watch?feature=player_embedded&amp;v=_jtAnlejBs4">This TED talk by Malte Spitz</a> does a great job of explaining the challenges of tracking citizens today, and he speculates about whether the Berlin Wall would ever have come down if the Stasi had access to phone records in the way today&#8217;s governments do.</p>
<p>So how do we regulate the way data is used?</p>
<p>The only way to deal with this properly is to somehow link <em>what the data is</em> with <em>how it can be used</em>. I might, for example, say that my musical tastes should be used for song recommendation, but not for banking decisions.</p>
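<p>To make that idea concrete, here is a minimal sketch of data that carries its own permitted uses. Everything in it is an assumption made for illustration: the names <code>TaggedRecord</code>, <code>PurposeError</code>, and <code>read_for</code>, and the purpose labels, are hypothetical and do not describe any existing system.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedRecord:
    """A piece of personal data bundled with the purposes its owner allows."""
    owner: str
    kind: str                      # what the data is, e.g. "music_tastes"
    value: object
    allowed_purposes: frozenset = field(default_factory=frozenset)


class PurposeError(PermissionError):
    """Raised when data is requested for a purpose the owner did not allow."""


def read_for(record: TaggedRecord, purpose: str):
    """Return the value only if the stated purpose is on the allowed list."""
    if purpose not in record.allowed_purposes:
        raise PurposeError(f"{record.kind} may not be used for {purpose}")
    return record.value


# Hypothetical example: music tastes cleared for recommendations only.
tastes = TaggedRecord(
    owner="alice",
    kind="music_tastes",
    value=["jazz", "punk"],
    allowed_purposes=frozenset({"song_recommendation"}),
)
```

<p>Calling <code>read_for(tastes, "song_recommendation")</code> returns the list, while <code>read_for(tastes, "loan_decision")</code> raises <code>PurposeError</code>. The check only holds if every consumer of the data goes through it, which is exactly the hard part.</p>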
<p>Tying data to permissions can be done through encryption, which is slow, riddled with DRM, burdensome, hard to implement, and bad for innovation. Or it can be done through legislation, which has about as much chance of success as regulating spam: it feels great, but it&#8217;s damned hard to enforce.</p>
<p>There are brilliant examples of how a quantified society can improve the way we live, love, work, and play. Big data helps <a href="http://www.google.org/flutrends/about/how.html">detect disease</a> outbreaks, <a href="http://schoolofone.org/news/">improve how students learn</a>, reveal <a href="http://enikrising.blogspot.ca/2010/08/ultimate-political-networks-graph.html">political partisanship</a>, and <a href="http://www.bbc.co.uk/news/uk-england-london-13389363">save hundreds of millions of dollars for commuters</a> &mdash; to pick just four examples. These are benefits we simply can&#8217;t ignore as we try to survive on a planet bursting with people and shaken by climate and energy crises.</p>
<p>But governments need to balance reliance on data with checks and balances about how this reliance erodes privacy and creates civil and moral issues we haven&#8217;t thought through. It&#8217;s something that most of the electorate isn&#8217;t thinking about, and yet it affects every purchase they make.</p>
<p>This should be fun.</p>
<p><em>This post originally appeared on <a href="http://solveforinteresting.com/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it/">Solve for Interesting</a>.  This version has been lightly edited.</em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://solveforinteresting.com/private-data-in-public-ways/">Private Data In Public Ways</a></li>
<li> <a href="http://radar.oreilly.com/2011/09/cooking-the-data.html">Cooking the data</a></li>
<li> <a href="http://radar.oreilly.com/2011/08/theres-no-such-thing-as-big-da.html">There&#8217;s no such thing as big data</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/08/big-data-is-our-generations-civil-rights-issue-and-we-dont-know-it.html/feed</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Survey results: How businesses are adopting and dealing with data</title>
		<link>http://radar.oreilly.com/2012/01/enterprise-big-data-survey-results.html</link>
		<comments>http://radar.oreilly.com/2012/01/enterprise-big-data-survey-results.html#comments</comments>
		<pubDate>Mon, 23 Jan 2012 18:30:00 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@editpick]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data company]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[strata]]></category>
		<category><![CDATA[strata online conference]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2012/01/enterprise-big-data-survey-results.html</guid>
		<description><![CDATA[Feedback from a recent Strata Online Conference suggests there&apos;s a large demand for clear information on what big data is and how it will change business. ]]></description>
				<content:encoded><![CDATA[<p>On December 7, 2011, we held our <a href="http://strataconf.com/strata-dec2011">fifth Strata Online Conference</a>. This series of free web events brings together analysts, innovators and researchers from a variety of fields. Each conference, we look at a particular facet of the move to big data &mdash; from personal analytics, to disruptive startups, to enterprise adoption.</p>
<p>This time, we focused on how businesses are going to embrace big data, and where the challenges lie. It was a perfect opportunity to survey the attendees and get a glimpse into enterprise adoption of big data. Out of the roughly 350 attendees, approximately 100 agreed to give us their feedback on a number of questions we asked. Here are the results.</p>
<h2>Some basic facts</h2>
<p>While the attendees worked for a mix of commercial, educational, government, and non-profit organizations, the vast majority (82%) worked for a commercial, for-profit company.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/1-OLC5-what-kind-org.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/1-OLC5-what-kind-580.jpg" border="0" alt="What kind of organization do you work for?" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/1-OLC5-what-kind-org.jpg">Click to enlarge</a>.</p>
<p>Most of the attendees&#8217; organizations were also fairly large: more than half of them had more than 500 co-workers, and 22% of them had more than 10,000.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/2-OLC5-how-big.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/2-OLC5-how-big-580.jpg" border="0" alt="How big is your organization?" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/2-OLC5-how-big.jpg">Click to enlarge</a>.</p>
<p>We used this demographic information to segment and better analyze the other three questions we asked.</p>
<h2>Big data adoption and challenges</h2>
<p>We then asked attendees about their journey to big data. Fewer than 20% of them already have a big data solution in place &mdash; which we clarified to mean some kind of massive-scale, sharded, NoSQL, parallel data query system that may employ interactivity and machine-assisted data exploration. More than a quarter said they have no plans at this time.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/3-OLC5-how-soon.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/3-OLC5-how-soon-580.jpg" border="0" alt="How soon do you expect to implement a big data solution?" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/3-OLC5-how-soon.jpg">Click to enlarge</a>.</p>
<p>While it&#8217;s relatively early days for adoption, more than 60% of attendees said they were in the process of gathering information on big data and what it meant to them. This is a skewed result at best: we&#8217;re of course selecting an audience that wants to be an audience. Nevertheless, the volume of attendees and their feedback suggests that deployment is ramping up: if you&#8217;re a big data vendor, this is the time to be fighting for mindshare.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/4-OLC5-biggest-challenge.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/4-OLC5-biggest-challenge-580.jpg" border="0" alt="What's the biggest challenge you see with big data?" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/4-OLC5-biggest-challenge.jpg">Click to enlarge</a>.</p>
<p>When it comes to actually deploying big data, companies have plenty of challenges. The big ones seem to be:</p>
<ul>
<li> Data privacy and governance.</li>
<li> Defining what big data actually is.</li>
<li> Integrating big data with legacy systems.</li>
<li> A lack of big data skills.</li>
<li> The cost of tools.</li>
</ul>
<h2>Analyzing a bit further</h2>
<p>These results might be informative, but what we really want to know is how they correlate. After all, Strata is a data conference: we&#8217;d be remiss if we didn&#8217;t crunch things a bit!</p>
<p>First, we wondered whether there&#8217;s a relationship between the size of a company and the kinds of problems it&#8217;s experiencing with big data.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/5-OLC5-obstacles.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/5-OLC5-obstacles-580.jpg" border="0" alt="Obstacles by company size" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/5-OLC5-obstacles.jpg">Click to enlarge</a>.</p>
<p>Our results suggest that governance and skill shortages are problems for larger companies, and that smaller businesses worry much less about data privacy and integrating legacy systems. Cost concerns come largely from mid-sized businesses.</p>
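<p>The segmentation behind a chart like this is a simple cross-tabulation. The sketch below shows the mechanics only: the rows in <code>RESPONSES</code> are placeholders I made up, not the survey&#8217;s data, and the helper names <code>crosstab</code> and <code>shares</code> are hypothetical.</p>

```python
from collections import Counter, defaultdict

# Placeholder responses, NOT the survey's actual data.
RESPONSES = [
    ("1-100", "cost of tools"),
    ("1-100", "defining big data"),
    ("101-500", "cost of tools"),
    ("10,000+", "privacy and governance"),
    ("10,000+", "lack of skills"),
    ("10,000+", "privacy and governance"),
]


def crosstab(pairs):
    """Count answers within each segment: {segment: Counter of answers}."""
    table = defaultdict(Counter)
    for segment_name, answer in pairs:
        table[segment_name][answer] += 1
    return table


def shares(counter):
    """Turn raw counts into fractions so segments of different sizes compare fairly."""
    total = sum(counter.values())
    return {answer: n / total for answer, n in counter.items()}


table = crosstab(RESPONSES)
large_company_shares = shares(table["10,000+"])
```

<p>Converting counts to shares matters here because the segments are very different sizes, and a raw count would mostly reflect how many respondents each segment had.</p>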
<p>Then we wondered whether adoption is tied to company size.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/6-OLC5-adoption.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/6-OLC5-adoption-580.jpg" border="0" alt="Big data adoption progress by company size" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/6-OLC5-adoption.jpg">Click to enlarge</a>.</p>
<p>Among our attendees, smaller firms were ahead of the game: none of the companies with more than 500 employees said they had big data in place today.</p>
<p>We also found that educational, government, and NGO respondents didn&#8217;t list cost as a top concern, suggesting that they may have a tolerance for open-source or home-grown approaches.</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/7-OLC5-obst-by-comp.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/7-OLC5-obst-by-comp-580.jpg" border="0" alt="Obstacles by company type" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/7-OLC5-obst-by-comp.jpg">Click to enlarge</a>.</p>
<p>Of course, the number of responses from these segments is too small to be statistically meaningful, but the pattern warrants further study, particularly for commercial offerings trying to sell outside the for-profit world.</p>
<p>Finally, we wondered whether the things a company worries about change as it goes from &#8220;just browsing&#8221; to &#8220;trying to build.&#8221;</p>
<p class="image-box-580">
<a href="http://s.radar.oreilly.com/2012/01/23/8-OLC5-obst-by-type.jpg"><img src="http://s.radar.oreilly.com/2012/01/23/8-OLC5-obst-by-type-580.jpg" border="0" alt="Obstacles by time to implement" width="580" style="margin-bottom: 15px" /></a><br /><a href="http://s.radar.oreilly.com/2012/01/23/8-OLC5-obst-by-type.jpg">Click to enlarge</a>.</p>
<p>Concerns do seem to shift over the course of adoption and maturity. Early on, companies struggle to define what big data is and worry about staffing. As they get closer to implementation, their attention shifts to legacy system integration. Once they have a system, talent shortages and a variety of other, more specific concerns emerge.</p>
<p>This wasn&#8217;t a hard-core study: respondents weren&#8217;t randomly selected, and the number of responses within some segments is too small to be statistically meaningful. Still, this feedback does suggest that there&#8217;s a large demand for clear information on what big data is and how it&#8217;ll change business, and that as enterprises move to adopt these technologies they&#8217;ll face integration headaches and staffing issues.</p>
<p>The next <a href="http://strataconf.com/strata-jan2012?cmp=il-radar-webcast-strata-olc5-survey-business-data">free Strata Online Conference will be held on January 25</a>. We&#8217;ll be taking a look at what&#8217;s in store for the upcoming <a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-strata-olc5-survey-business-data">Strata Conference</a> (Feb 28-March 1 in Santa Clara, Calif).</p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2012/01/what-is-big-data.html">What is big data?</a></li>
<li> <a href="http://radar.oreilly.com/2011/08/theres-no-such-thing-as-big-da.html">There&#8217;s no such thing as big data</a></li>
<li> <a href="http://radar.oreilly.com/2011/09/building-data-science-teams.html">Building data science teams</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/01/enterprise-big-data-survey-results.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The feedback economy</title>
		<link>http://radar.oreilly.com/2012/01/the-feedback-economy.html</link>
		<comments>http://radar.oreilly.com/2012/01/the-feedback-economy.html#comments</comments>
		<pubDate>Wed, 04 Jan 2012 14:00:00 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@editpick]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[@top]]></category>
		<category><![CDATA[data company]]></category>
		<category><![CDATA[data decisions]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[feedback]]></category>
		<category><![CDATA[loop]]></category>
		<category><![CDATA[ooda]]></category>
		<category><![CDATA[operations]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2012/01/the-feedback-economy.html</guid>
		<description><![CDATA[We&apos;re moving beyond an information economy. The efficiencies and optimizations that come from constant and iterative feedback will soon become the norm for businesses and governments.  ]]></description>
				<content:encoded><![CDATA[<p>Military strategist <a href="http://en.wikipedia.org/wiki/John_Boyd_(military_strategist)#The_OODA_Loop">John Boyd</a> spent a lot of time understanding how to win battles. Building on his experience as a fighter pilot, he broke down the process of observing and reacting into something called an Observe, Orient, Decide, and Act (<a href="http://en.wikipedia.org/wiki/OODA_loop">OODA</a>) loop. Combat, he realized, consisted of observing your circumstances, orienting yourself to your enemy&#8217;s way of thinking and your environment, deciding on a course of action, and then acting on it.</p>
<p class="image-box-580"><a href="http://s.radar.oreilly.com/2012/01/03/0112-ooda-lg.png"><img src="http://s.radar.oreilly.com/2012/01/03/0112-ooda-580.png" border="0" alt="OODA chart" width="580" style="margin-bottom: 15px" /></a><br /><em>The Observe, Orient, Decide, and Act (OODA) loop. <a href="http://s.radar.oreilly.com/2012/01/03/0112-ooda-lg.png">Click to enlarge</a>.</em></p>
<p>The most important part of this loop isn&#8217;t included in the OODA acronym, however. <strong>It&#8217;s the fact that it&#8217;s a loop</strong>. The results of earlier actions feed back into later, hopefully wiser, ones. Over time, the fighter &#8220;gets inside&#8221; their opponent&#8217;s loop, outsmarting and outmaneuvering them. The system learns.</p>
<p>Boyd&#8217;s genius was to realize that winning requires two things: being able to collect and analyze information better, and being able to act on that information faster, incorporating what&#8217;s learned into the next iteration. Today, what Boyd learned in a cockpit applies to nearly everything we do.</p>
<h2 id="data-obese">Data-obese, digital-fast</h2>
<p>In our always-on lives we&#8217;re flooded with cheap, abundant information. We need to capture and analyze it well, separating digital wheat from digital chaff, identifying meaningful undercurrents while ignoring meaningless social flotsam. Clay Johnson <a href="http://radar.oreilly.com/2011/11/information-overload-overconsumption-diet.html">argues</a> that we need to go on <a href="http://www.informationdiet.com/">an information diet</a>, and makes a good case for conscious consumption. In an era of information obesity, we need to eat better. There&#8217;s a reason they call it a feed, after all.</p>
<p>It&#8217;s not just an overabundance of data that makes Boyd&#8217;s insights vital. In the last 20 years, much of human interaction has shifted from atoms to bits. When interactions become digital, they become instantaneous, interactive, and easily copied. It&#8217;s as easy to tell the world as to tell a friend, and a day&#8217;s shopping is reduced to a few clicks.</p>
<p>The move from atoms to bits reduces the coefficient of friction of entire industries to zero. Teenagers shun e-mail as too slow, opting for instant messages. The digitization of our world means that trips around the OODA loop happen faster than ever, and continue to accelerate.</p>
<p>We&#8217;re drowning in data. Bits are faster than atoms. Our jungle-surplus wetware can&#8217;t keep up. At least, not without Boyd&#8217;s help.<br />
In a society where every person, tethered to their smartphone, is both a sensor and an end node, we need better ways to observe and orient, whether we&#8217;re at home or at work, solving the world&#8217;s problems or planning a play date. And we need to be constantly deciding, acting, and experimenting, feeding what we learn back into future behavior.</p>
<p>We&#8217;re entering a feedback economy.</p>
<h2 id="supply-chain">The big data supply chain</h2>
<p>Consider how a company collects, analyzes, and acts on data.</p>
<p class="image-box-580"><a href="http://s.radar.oreilly.com/2012/01/03/0112-big-data-supply-chain-lg.png"><img src="http://s.radar.oreilly.com/2012/01/03/0112-big-data-supply-chain-580.png" border="0" alt="The big data supply chain" style="margin-bottom: 15px" /></a><br /><em>The big data supply chain. <a href="http://s.radar.oreilly.com/2012/01/03/0112-big-data-supply-chain-lg.png">Click to enlarge</a>.</em></p>
<p>Let&#8217;s look at these components in order.</p>
<h3>Data collection</h3>
<p>The first step in a data supply chain is to get the data in the first place.</p>
<p>Information comes in from a variety of sources, both public and private. We&#8217;re a promiscuous society online, and with the advent of low-cost data marketplaces, it&#8217;s possible to get nearly any nugget of data relatively affordably. From social network sentiment, to weather reports, to economic indicators, public information is grist for the big data mill. Alongside this, we have organization-specific data such as retail traffic, call center volumes, product recalls, or customer loyalty indicators.</p>
<p>The law is often a tighter constraint on collection than the practical difficulty of getting the data in the first place. Some data is heavily regulated: HIPAA governs healthcare, while PCI standards govern payment card data. In other cases, the act of combining data may be illegal because it generates personally identifiable information (PII). For example, courts have ruled differently on whether IP addresses are PII, and the California Supreme Court ruled that zip codes are. Navigating these regulations imposes some serious constraints on what can be collected and how it can be combined.</p>
<p>The era of ubiquitous computing means that everyone is a potential source of data, too. A modern smartphone can sense light, sound, motion, location, nearby networks and devices, and more, making it a perfect data collector. As consumers opt into loyalty programs and install applications, they become sensors that can feed the data supply chain.</p>
<p>In big data, the collection is often challenging because of the sheer volume of information, or the speed with which it arrives, both of which demand new approaches and architectures.</p>
<h3>Ingesting and cleaning</h3>
<p>Once the data is collected, it must be ingested. In traditional business intelligence (BI) parlance, this is known as Extract, Transform, and Load (ETL): the act of putting the right information into the correct tables of a database schema and manipulating certain fields to make them easier to work with.</p>
<p>One of the distinguishing characteristics of big data, however, is that the data is often unstructured. That means we don&#8217;t know the inherent schema of the information before we start to analyze it. We may still transform the information &mdash; replacing an IP address with the name of a city, for example, or anonymizing certain fields with a one-way hash function &mdash; but we may hold onto the original data and only define its structure as we analyze it.</p>
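<p>Here is a rough sketch of that pattern: transform a couple of fields on the way in, keep the original line, and impose structure only when a question comes up. The lookup table, the salt, the field names, and the function names are all assumptions for the sake of the example; a real pipeline would use a proper GeoIP database and a carefully managed secret.</p>

```python
import hashlib
import json

# Hypothetical lookup table; a real pipeline would use a GeoIP database.
CITY_BY_IP = {"203.0.113.7": "Montreal"}


def anonymize(value: str, salt: str = "example-salt") -> str:
    """One-way hash: the same input always maps to the same opaque token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def ingest(raw_line: str) -> dict:
    """Keep the raw line untouched and attach lightly transformed fields."""
    event = json.loads(raw_line)
    return {
        "raw": raw_line,  # original data, kept for later re-interpretation
        "city": CITY_BY_IP.get(event.get("ip"), "unknown"),
        "user_token": anonymize(event.get("email", "")),
    }


def purchases_by_city(records: list) -> dict:
    """A structure imposed at analysis time, not at collection time."""
    totals = {}
    for record in records:
        event = json.loads(record["raw"])
        if event.get("type") == "purchase":
            city = record["city"]
            totals[city] = totals.get(city, 0) + event.get("amount", 0)
    return totals


LINES = [
    '{"ip": "203.0.113.7", "email": "a@example.com", "type": "purchase", "amount": 12}',
    '{"ip": "198.51.100.9", "email": "b@example.com", "type": "purchase", "amount": 5}',
    '{"ip": "203.0.113.7", "email": "a@example.com", "type": "page_view"}',
]
records = [ingest(line) for line in LINES]
```

<p>Note the trade-off: because the raw line is retained, the e-mail address is still stored somewhere. The hash only protects the derived view that analysts work with.</p>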
<h3>Hardware</h3>
<p>The information we&#8217;ve ingested needs to be analyzed by people and machines. That means hardware, in the form of computing, storage, and networks. Big data doesn&#8217;t change this, but it does change how it&#8217;s used. Virtualization, for example, allows operators to spin up many machines temporarily, then destroy them once the processing is over.</p>
<p>Cloud computing is also a boon to big data. Paying by consumption destroys the barriers to entry that would prohibit many organizations from playing with large datasets, because there&#8217;s no up-front investment. In many ways, big data gives clouds something to do.</p>
<h3>Platforms</h3>
<p>Where big data is new is in the platforms and frameworks we create to crunch large amounts of information quickly. One way to speed up data analysis is to break the data into chunks that can be analyzed in parallel. Another is to build a pipeline of processing steps, each optimized for a particular task.</p>
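<p>The chunk-and-merge idea can be sketched in a few lines. This is a toy, not how any particular platform works: the function names are mine, and I use threads on one machine purely so the example is self-contained, where a real system would spread the chunks across processes or servers.</p>

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(items: list, size: int) -> list:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines: list) -> Counter:
    """The per-chunk step: count words in one slice of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines: list, chunk_size: int = 2, workers: int = 4) -> Counter:
    """Analyze chunks independently, then merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunked(lines, chunk_size)):
            total += partial
    return total


SAMPLE = ["big data", "fast data", "big fast data", "data"]
word_counts = parallel_word_count(SAMPLE)
```

<p>The important property is that each chunk can be counted without knowing about the others, so the merged answer is the same as counting everything in one pass.</p>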
<p>Big data is often about fast results, rather than simply <a href="http://www.wired.com/cloudline/2011/11/big-data-fast-slow/">crunching a large amount of information</a>. That&#8217;s important for two reasons:</p>
<ol>
<li> Much of the big data work going on today is related to user interfaces and the web. Suggesting what books someone will enjoy, or delivering search results, or finding the best flight, requires an answer in the time it takes a page to load. The only way to accomplish this is to spread out the task, which is one of the reasons why Google has <a href="http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/">nearly a million servers</a>.</li>
<li> We analyze unstructured data iteratively. As we first explore a dataset, we don&#8217;t know which dimensions matter. What if we segment by age? Filter by country? Sort by purchase price? Split the results by gender? This kind of &#8220;what if&#8221; analysis is exploratory in nature, and analysts are only as productive as their ability to explore freely. Big data may be big. But if it&#8217;s not fast, it&#8217;s unintelligible.</li>
</ol>
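<p>Those &#8220;what if&#8221; questions are cheap to ask when segmenting and filtering are just functions you can swap in and out. A small sketch, with invented sample rows and hypothetical helper names (<code>segment</code>, <code>age_band</code>):</p>

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows, for illustration only.
ORDERS = [
    {"age": 24, "country": "CA", "gender": "F", "price": 30.0},
    {"age": 37, "country": "US", "gender": "M", "price": 80.0},
    {"age": 41, "country": "CA", "gender": "M", "price": 50.0},
    {"age": 29, "country": "CA", "gender": "F", "price": 70.0},
]


def segment(rows, key, where=None):
    """Average price per segment; `key` and `where` are plain functions."""
    groups = defaultdict(list)
    for row in rows:
        if where is None or where(row):
            groups[key(row)].append(row["price"])
    return {name: mean(prices) for name, prices in groups.items()}


def age_band(row):
    """Bucket ages into decades: 24 becomes "20s"."""
    return f"{row['age'] // 10 * 10}s"


# What if we segment by age?
by_age = segment(ORDERS, age_band)

# What if we filter by country, then split by gender?
by_gender_in_ca = segment(
    ORDERS,
    key=lambda row: row["gender"],
    where=lambda row: row["country"] == "CA",
)
```

<p>Each new question is one more call. On four rows that is instant; the point of a fast platform is to keep it feeling that way on billions.</p>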
<p>Much of the hype around big data companies today is a result of the retooling of enterprise BI. For decades, companies have relied on structured relational databases and data warehouses. Many of those systems can&#8217;t handle the exploration, lack of structure, speed, and massive sizes of big data applications.</p>
<h3>Machine learning</h3>
<p>One way to think about big data is that it&#8217;s &#8220;more data than you can go through by hand.&#8221; For much of the data we want to analyze today, we need a machine&#8217;s help.</p>
<p>Part of that help happens at ingestion. For example, natural language processing tries to read unstructured text and deduce what it means: Was this Twitter user happy or sad? Is this call center recording good, or was the customer angry?</p>
<p>Machine learning is important elsewhere in the data supply chain. When we analyze information, we&#8217;re trying to find signal within the noise, to discern patterns. Humans can&#8217;t find signal well by themselves. Just as astronomers use algorithms to scan the night&#8217;s sky for signals, then verify any promising anomalies themselves, so too can data analysts use machines to find interesting dimensions, groupings, or patterns within the data. Machines can work at a lower signal-to-noise ratio than people.</p>
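<p>The division of labor, where the machine flags and the human verifies, can be illustrated with a deliberately simple stand-in for machine learning: flag any value that sits far from the mean. The threshold, the function name, and the call volumes below are assumptions chosen for the example.</p>

```python
from statistics import mean, stdev


def flag_anomalies(values: list, threshold: float = 2.0) -> list:
    """Return the indexes of values more than `threshold` standard
    deviations away from the mean, as candidates for a human to review."""
    if len(values) >= 2:
        centre = mean(values)
        spread = stdev(values)
        if spread > 0:
            return [
                index
                for index, value in enumerate(values)
                if abs(value - centre) / spread > threshold
            ]
    return []


# Invented daily call-center volumes with one unusual day.
CALL_VOLUMES = [100, 102, 98, 101, 99, 100, 180, 97]
suspects = flag_anomalies(CALL_VOLUMES)
```

<p>Here only the seventh day (index 6) is flagged. The code says nothing about why that day was unusual, and that is the part left to the analyst.</p>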
<h3>Human exploration</h3>
<p>While machine learning is an important tool for the data analyst, there&#8217;s no substitute for human eyes and ears. Displaying the data in human-readable form is hard work, stretching the limits of multi-dimensional visualization. While most analysts work with spreadsheets or simple query languages today, that&#8217;s changing.</p>
<p><a href="http://strataconf.com/strata2011/public/schedule/speaker/102881">Creve Maples</a>, an early advocate of better computer interaction, designs systems that take dozens of independent data sources and display them in navigable 3D environments, complete with sound and other cues. Maples&#8217; studies show that when we feed an analyst data in this way, they can often find answers in minutes instead of months.</p>
<p>This kind of interactivity requires the speed and parallelism explained above, as well as new interfaces and multi-sensory environments that allow an analyst to work alongside the machine, immersed in the data.</p>
<h3>Storage</h3>
<p>Big data takes a lot of storage. In addition to the actual information in its raw form, there&#8217;s the transformed information; the virtual machines used to crunch it; the schemas and tables resulting from analysis; and the many formats that legacy tools require so they can work alongside new technology. Often, storage is a combination of cloud and on-premise storage, using traditional flat-file and relational databases alongside more recent, post-SQL storage systems.</p>
<p>During and after analysis, the big data supply chain needs a warehouse. Comparing year-on-year progress or changes over time means we have to keep copies of everything, along with the algorithms and queries with which we analyzed it.</p>
<h3>Sharing and acting</h3>
<p>All of this analysis isn&#8217;t much good if we can&#8217;t act on it. As with collection, this isn&#8217;t simply a technical matter &mdash; it involves legislation, organizational politics, and a willingness to experiment. The data might be shared openly with the world, or closely guarded.</p>
<p>The best companies tie big data results into everything from hiring and firing decisions, to strategic planning, to market positioning. While it&#8217;s easy to buy into big data technology, it&#8217;s far harder to shift an organization&#8217;s culture. In many ways, big data adoption isn&#8217;t a hardware retirement issue, it&#8217;s an employee retirement one.</p>
<p>We&#8217;ve seen similar resistance to change each time there&#8217;s a big change in information technology. Mainframes, client-server computing, packet-based networks, and the web all had their detractors. A NASA study into the failure of Ada, the first object-oriented language, concluded that proponents had over-promised, and there was a lack of a supporting ecosystem to help the new language flourish. Big data, and its close cousin, cloud computing, are likely to encounter similar obstacles.</p>
<p>A big data mindset is one of experimentation, of taking measured risks and assessing their impact quickly. It&#8217;s similar to the <a href="http://theleanstartup.com/">Lean Startup</a> movement, which advocates fast, iterative learning and tight links to customers. But while a small startup can be lean because it&#8217;s nascent and close to its market, a big organization needs big data and an OODA loop to react well and iterate fast.</p>
<p>The big data supply chain is the organizational OODA loop. It&#8217;s the big business answer to the lean startup.</p>
<h3>Measuring and collecting feedback</h3>
<p>Just as John Boyd&#8217;s OODA loop is mostly about the loop, so big data is mostly about feedback. Simply analyzing information isn&#8217;t particularly useful. To work, the organization has to choose a course of action from the results, then observe what happens and use that information to collect new data or analyze things in a different way. It&#8217;s a process of continuous optimization that affects every facet of a business. </p>
<h2 id="replacing">Replacing everything with data</h2>
<p>Software is <a href="http://online.wsj.com/article/SB10001424053111903480904576512250915629460.html">eating the world</a>. Verticals like publishing, music, real estate and banking once had strong barriers to entry. Now they&#8217;ve been entirely disrupted by the elimination of middlemen. The last film projector <a href="http://www.salon.com/2011/10/13/r_i_p_the_movie_camera_1888_2011/">rolled off the line in 2011</a>: movies are now digital from camera to projector. The Post Office stumbles because nobody writes letters, even as Federal Express becomes the planet&#8217;s supply chain.</p>
<p>Companies that get themselves on a feedback footing will dominate their industries, building better things faster for less money. Those that don&#8217;t are already the walking dead, and will soon be little more than case studies and colorful anecdotes. Big data, new interfaces, and ubiquitous computing are tectonic shifts in the way we live and work.</p>
<h2 id="feedback-economy">A feedback economy</h2>
<p>Big data, continuous optimization, and replacing everything with data pave the way for something far larger, and far more important, than simple business efficiency. They usher in a new era for humanity, with all its warts and glory. They herald the arrival of the feedback economy.</p>
<p>The efficiencies and optimizations that come from constant, iterative feedback will soon become the norm for businesses and governments. We&#8217;re moving beyond an information economy. Information on its own isn&#8217;t an advantage, anyway. Instead, this is the era of the feedback economy, and Boyd is, in many ways, the first feedback economist.</p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/11/big-data-business-enterprise.html">Big data goes to work</a></li>
<li> <a href="http://radar.oreilly.com/2011/08/building-data-startups.html">Building data startups: Fast, big, and focused</a></li>
<li> <a href="http://shop.oreilly.com/product/0636920022640.do">Big Data Now</a> (free anthology)</li>
<li> <a href="http://radar.oreilly.com/2011/11/information-overload-overconsumption-diet.html">Don&#8217;t blame the information for your bad habits</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/01/the-feedback-economy.html/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Cooking the data</title>
		<link>http://radar.oreilly.com/2011/09/cooking-the-data.html</link>
		<comments>http://radar.oreilly.com/2011/09/cooking-the-data.html#comments</comments>
		<pubDate>Tue, 20 Sep 2011 22:30:00 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@editpick]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[law]]></category>
		<category><![CDATA[transparency]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/09/cooking-the-data.html</guid>
		<description><![CDATA[Open data and transparency aren&apos;t enough: we need True Data, not Big Data, as well as regulators and lawmakers willing to act on it. ]]></description>
				<content:encoded><![CDATA[<p>At this week&#8217;s <a href="http://strataconf.com">Strata Conference</a> in New York, there&#8217;s a lot of discussion about data transparency. As masses of easily available, quickly analyzed data transform businesses, that data can also change how we regulate and legislate the world.</p>
<p>Data transparency holds promise. It should, in theory, weed out corruption and level the playing field. Rather than regulating what a company can do, for example, we can regulate what it must share with the world &#8212; and then let the world deal with the consequences, whether by boycott, activism, or class-action lawsuit. It&#8217;s something the Leading Edge Forum&#8217;s <a href="http://strataconf.com/jumpstart2011/public/schedule/speaker/66229">Michael Nelson</a> described as a form of digital libertarianism: pacts of transparency between businesses and consumers, or between governments and citizens. He calls it &#8220;Mutually Assured Disclosure.&#8221;</p>
<p>It&#8217;s certainly encouraging to think that corruption and shenanigans wither under the harsh light of data. With information out in the open, it should be easy for interested parties to review the numbers &#8212; using cheap clouds and intuitive visualizations &#8212; and spot the cheaters.</p>
<h2>Does data really blow its own whistle?</h2>
<p><img src="http://s.radar.oreilly.com/2011/09/21/0911-greece-goog-map.jpg" alt="Google Maps, Greece" width="300" style="float: right;margin: 3px 0 10px 10px" />The first problem open data advocates run into is that of getting real information. Look at Greece: 324 Athenians reported having swimming pools on their taxes. When the government used Google Maps to try and count how many there really were, <a href="http://www.nytimes.com/2010/05/02/world/europe/02evasion.html?hp=&amp;pagewanted=all">they found 16,974 of them</a> &#8212; despite <a href="http://www.telegraph.co.uk/news/worldnews/europe/greece/7664764/Revolution-from-Greeces-ruins-as-crisis-deepens.html">efforts by citizens to camouflage their pools</a> under green tarpaulins. So even if activists <em>can</em> use widely available data to create change, that data may be wrong.</p>
<p>One way around this is to get your own data. The barriers to data collection have vanished with the advent of social networks, ubiquitous computing, and other innovations. Just as Greek tax officials can use Google Earth to understand tax evasion, so organizations like <a href="http://asthmapolis.com/">Asthmapolis</a> can crowdsource data &#8212; in this case, by attaching GPS receivers to asthma inhalers &#8212; and use the information to shape public policy.</p>
<h2>Can we tell when the data is wrong?</h2>
<p>Once the data is in hand, it needs to be properly analyzed. That&#8217;s not as easy as it sounds.</p>
<p>With software development, it&#8217;s easy to see the results. If the coder&#8217;s work isn&#8217;t effective, the finished product is buggy, unusable, slow, and incompatible. On the other hand, a lazy data scientist produces wrong results that may not be obvious to anyone. Detecting fraud or error in data sets can be tough. At Strata Summit, LinkedIn&#8217;s <a href="http://strataconf.com/summit2011/public/schedule/speaker/109152">Monica Rogati</a> highlighted a number of common errors that analysts make when interpreting and reporting their research; as more and more people start to work with numbers, more and more make mistakes. Statistics is often counter-intuitive. (Want a good example? Try the <a href="http://www.nytimes.com/2008/04/08/science/08monty.html">Monty Hall problem</a>.)</p>
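<p>For readers who would rather test the Monty Hall result than take it on faith, here is a minimal simulation sketch in Python. The function names, the trial count, and the fixed seed are choices made for this illustration only. It assumes the standard version of the puzzle, in which the host always opens a door that hides no car and is not the contestant&#8217;s pick.</p>

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is still closed and not the original pick.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=42):
    """Estimate the share of rounds won when always staying or always switching."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print("stay:  ", round(win_rate(switch=False), 3))
print("switch:", round(win_rate(switch=True), 3))
```

<p>Staying should win about a third of the time and switching about two thirds, which is exactly the answer most people&#8217;s intuition rejects.</p>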
<p>Will we know if we&#8217;ve got bad data, whether from malice, omission, or accident? It&#8217;s possible to detect fraud in some cases. Modeling datasets often reveals problems with the data, and statisticians have tricks that can help. <a href="http://www.guardian.co.uk/commentisfree/2011/sep/16/bad-science-dodgy-stats">Benford&#8217;s Law</a>, for example, says that &#8220;for naturally occurring data, you get more ones than twos, more twos than threes, and so on, all the way down to nine.&#8221; Point the law at certain kinds of datasets, and you know how likely it is that the contents are a lie.</p>
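<p>As a rough illustration of how such a check might work, the Python sketch below compares the leading digits of a dataset with the frequencies Benford&#8217;s Law predicts, log10(1 + 1/d) for each digit d, using a chi-squared statistic. The function names, the sample data, and the 5% threshold of 15.507 (eight degrees of freedom) are assumptions made for this example, not part of any tool mentioned here.</p>

```python
import math
from collections import Counter

# Assumption: 5% critical value for chi-squared with 8 degrees of freedom.
CHI2_CRITICAL = 15.507

def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation always starts with the first significant digit.
    return int(f"{abs(x):.15e}"[0])

def benford_statistic(values):
    """Chi-squared distance between observed leading digits and Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Hypothetical sample data: powers of two are known to follow Benford's Law;
# the integers 100..999 have evenly spread leading digits and do not.
natural = [2 ** n for n in range(1, 200)]
suspicious = list(range(100, 1000))

for name, data in [("powers of two", natural), ("100..999", suspicious)]:
    stat = benford_statistic(data)
    verdict = "plausible" if stat <= CHI2_CRITICAL else "worth a closer look"
    print(f"{name}: chi-squared = {stat:.1f} ({verdict})")
```

<p>A large statistic is a reason to look closer, not proof of fraud: plenty of honest datasets, such as numbers confined to a narrow range, were never going to follow Benford&#8217;s Law in the first place.</p>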
<h2>Will we act on it?</h2>
<p>Open data is no good unless it leads to action. Most proponents of transparency believe that change logically flows from proof. In government, at least, current public policy suggests otherwise. On critical global issues like climate science and evolution, despite overwhelming, peer-reviewed data, we&#8217;re still deadlocked on whether to <a href="http://en.wikipedia.org/wiki/Teach_the_Controversy">teach creationism</a>, or <a href="http://twitter.com/#!/ChuckGrassley/status/116115784140455936">whether climate change is real</a>. Don&#8217;t like the numbers you&#8217;re getting? <a href="http://news.sciencemag.org/scienceinsider/2011/02/scientists-criticize-house-vote-.html">Call them corrupt.</a> Threaten to take away funding. If the infographic is the new stump speech, then questioning the data is the new rebuttal.</p>
<p>Simply having transparency doesn&#8217;t lead to change. Without effective checks and balances, and without real punishments, shining the harsh light of data won&#8217;t do anything. This makes class action lawyers and hacktivists unlikely allies: Lawsuits, social media campaigns, and boycotts are often the only way to induce change from otherwise unregulated industries.</p>
<p>Data transparency is an arms race. In a world of full disclosure, cooking the data is the new cooking the books. How many of today&#8217;s data scientists will become tomorrow&#8217;s forensic accountants, locked in a war with the fraudulent and the ignorant? Open data and transparency aren&#8217;t enough: we need True Data, not Big Data, as well as regulators and lawmakers willing to act on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/09/cooking-the-data.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Meat to Math ratio</title>
		<link>http://radar.oreilly.com/2011/08/meat-to-math-ratio.html</link>
		<comments>http://radar.oreilly.com/2011/08/meat-to-math-ratio.html#comments</comments>
		<pubDate>Thu, 18 Aug 2011 13:00:00 +0000</pubDate>
		<dc:creator>Alistair Croll</dc:creator>
				<category><![CDATA[Web 2.0]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[ipo]]></category>
		<category><![CDATA[lean startup]]></category>
		<category><![CDATA[market cap]]></category>
		<category><![CDATA[Meat-to-math]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[valuation]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/08/meat-to-math-ratio.html</guid>
		<description><![CDATA[Successful companies find ways to augment their employees, allowing them to operate at scale with customers. Big data, machine learning, and an iterative, experimental mindset are essential &#8212; and increasingly, company valuations are tied to the efficiency with which firms put information to work. ]]></description>
				<content:encoded><![CDATA[<p>As we enter one of the biggest tech IPO seasons in recent history &#8212; LinkedIn, with Groupon, Pandora, Zillow, Dropbox, Zynga, and CafePress all lining up behind it &#8212; it&#8217;s hard to know what will fly and what will flounder.</p>
<p>One indicator of a company&#8217;s potential is how well it can scale its business independent of human intervention. This isn&#8217;t simply the ability to automate tasks or replace workers with machines; rather, it&#8217;s <em>the ability to augment people with data and processes.</em></p>
<p>Call it the Meat to Math ratio.</p>
<h2>First, a comparison</h2>
<p>To explain what I mean, I&#8217;ve done some back-of-the-napkin math on six companies. Four are public, and two have impending IPOs. Of the four public ones, two are disruptors and two are the established incumbents they&#8217;re beating to a fiscal pulp.</p>
<p>One common way to measure a company&#8217;s productivity is the revenue per employee.</p>
<p class="image-box-580"><img src="http://s.radar.oreilly.com/2011/08/17/0811-meatmath-chart1.png" border="0" alt="Revenue per employee across six companies" width="580" /></p>
<p>It&#8217;s not just about revenue per employee, though. As <a href="http://www.strassmann.com/pubs/cw/headcount.shtml">Paul Strassmann said</a> before the first dot-com bubble in 1998, we shouldn&#8217;t give industrial-age answers to information-age questions. Rather, it&#8217;s about how well a company can leverage its employees over the long term. Companies with a good meat-to-math ratio should be good at things like:</p>
<ul>
<li> Automating processes at scale.</li>
<li> Maintaining genuine interactions with their customers despite a high number of customers per employee.</li>
<li> Finding new businesses from their own data exhaust through introspection and experimentation.</li>
</ul>
<p>I want to look at each of these three in more detail.</p>
<h2>Turking, then automating</h2>
<p>I&#8217;ve spent the last year looking at a lot of new ventures, partly because of my involvement in a startup accelerator. <a href="http://www.yearonelabs.com">Our accelerator</a> uses <a href="http://theleanstartup.com/">lean startup</a> methodologies. These are techniques for pushing the uncertainty to the front of the company&#8217;s lifespan. Rather than getting your investment and business in place before launching, a lean model is all about doing the least amount of work to accurately predict whether a particular business will succeed. Then it&#8217;s about iterating quickly to a fit between a set of product features and a target market. It&#8217;s not a perfect science, but it&#8217;s a good way to avoid losing a lot of money on a bad idea.</p>
<p>One lean startup trick is doing things by hand rather than wasting time programming. Consider, for example, that you&#8217;re thinking of launching a search-by-email company. Rather than coding everything, you&#8217;d read users&#8217; emails, search for them, and respond in an email. You&#8217;d soon find out whether people wanted to search by email, without investing time in natural language parsing, email handling, and so on. You&#8217;d be &#8220;turking,&#8221; a term that refers to the <a href="http://en.wikipedia.org/wiki/The_Turk">Turk</a> (from which <a href="https://www.mturk.com/mturk/welcome">Amazon&#8217;s people-as-a-function-call service</a> gets its name).</p>
<p>Turking takes many forms. It might mean drawing rudimentary user interfaces, then watching someone &#8220;use&#8221; them with their finger (a process called paper prototyping). Or it might mean replacing some complex function with a human (what we jokingly refer to as a Flesh-Based API). Or maybe it&#8217;s creating landing pages for applications that don&#8217;t exist, to see who signs up. One of our incubated companies didn&#8217;t code for a month. Instead, they ran surveys and did customer development until they found something people cared about.</p>
<p><em>Early on, meat is cheaper than math.</em></p>
<p>But if a company can&#8217;t make a transition to math, it will have to turk at scale. Turking at scale is another way of describing the dirty business of managing people, with all of the chaos, uncertainty, and HR headaches it entails.</p>
<p>There&#8217;s an old adage among investors that a change in order of magnitude means a change in leadership. If a company goes from $10M a year to $100M a year in revenue, it&#8217;s time for a new leader. Similarly, if it goes from 100 to 1,000 customers, or from 10 to 100 employees, something has to change. Those customers and employees are meat, and meat is hard to scale.</p>
<p>Cloud computing and virtualization are a move from atoms to bits, replacing rack-and-stack with click-and-drag, and the resulting increases in IT productivity and server-to-administrator ratios are impressive. If you move to a virtualized, properly orchestrated IT architecture, you can survive an order-of-magnitude increase without throwing meat at the problem.</p>
<p>Put another way, meat is how you scale atoms &mdash; the kind that make up brick and mortar. Math is how you scale bits &mdash; the kind that make up big data businesses. Businesses that can scale bits are interesting, because bits don&#8217;t have the coefficient of friction that atoms do.</p>
<h2>Being genuine to the masses</h2>
<p>There&#8217;s a book called &#8220;<a href="http://books.google.com/books/about/The_clustering_of_America.html?id=UNADAQAAIAAJ">The Clustering Of America</a>.&#8221; At one time, it was the Bible of marketing. It broke down, zip-code by zip-code, the population of the United States. It clustered people into simple and almost laughably stereotypical groups like &#8220;Blue-blood Estates,&#8221; &#8220;Dodge Diplomats,&#8221; and &#8220;Towns &amp; Gowns.&#8221;</p>
<p>It was the perfect book for a &#8220;Mad Men&#8221; era, where a pithy slogan and the right timeslot could open a million wallets. In the golden age of broadcasting, obedient audiences sat down as one around the family TV to watch a show at a time of the network&#8217;s choosing.</p>
<p>Today, that world is a fading memory. DVRs, iTunes, and streaming have freed us from the tyranny of the o&#8217;clock. They&#8217;ve also made it easy for us to find our niche programming. <a href="http://www.brucespringsteen.net/songs/57Channels.html">Bruce Springsteen didn&#8217;t think big enough</a>: We have 57 million channels, and everything&#8217;s on.</p>
<p>Traditional marketers hate this. They&#8217;re hung over, recovering from a cheap cocktail of one-to-many, broadcast media purchases. They like buying things in big chunks, aimed at homogeneous clusters.</p>
<p>By contrast, modern marketing is about attention and engagement. It&#8217;s about doing something interesting, and getting tailored messages to micro-markets that expect personal attention and engagement with the companies they love. Every brand has its Little Monsters, but unlike <a href="http://www.popeater.com/2010/02/03/lady-gaga-new-tattoos/">Lady Gaga</a>, legacy businesses don&#8217;t know how to interact with them.</p>
<p>Former Coca-Cola CMO <a href="http://en.wikipedia.org/wiki/Sergio_Zyman">Sergio Zyman</a> describes marketing as &#8220;selling more stuff to more people more often for more money more efficiently.&#8221; In other words, selling at scale. If modern, post-broadcast marketing is about being genuine, then marketers face the challenge of being genuine (meat) at scale (math).</p>
<p>Companies that can work this out will win. But it will take big data systems, next-generation customer relationship management, and machine learning to help augment front-line employees.</p>
<h2>Mining your own exhaust</h2>
<p>Netflix and Amazon have something in common beyond their destruction of incumbents: the ability to create new businesses from whole cloth while still generating revenue. Netflix managed to become the dominant paid streaming platform, using mail-based distribution to bootstrap itself. Amazon created a cloud service from what it learned about running large-scale IT infrastructure; introduced a digital reader whose e-books now outsell paperbacks; and expanded from books into many other retail markets.</p>
<p>Blockbuster and Barnes &amp; Noble could have done these things, but they didn&#8217;t. Netflix and Amazon used their own data exhaust to innovate. They started new races in the middle of an existing one. As data-driven companies, they create volumes of data about their own operations, then recycle this into new insights and new businesses. Another reason for their agility is that meat has inertia: it&#8217;s hard and time-consuming to hire, fire, and retrain people; it&#8217;s easy to change an algorithm.</p>
<h2>Back to those companies</h2>
<p>So the meat-to-math ratio is vital for several things:</p>
<ul>
<li> Scaling the company without adding messy atoms and the related overhead.</li>
<li> Scaling marketing without becoming disconnected from markets or customers.</li>
<li> Iterating into new markets and new services adjacent to a core business.</li>
</ul>
<p>Comparing our six firms &#8212; four public, two soon to be &#8212; in this light, what does a good meat-to-math ratio mean for your business? Let&#8217;s look closely at the two impending IPOs: Groupon and Dropbox.</p>
<p>Groupon&#8217;s offering, initially valued at $30B, is taking a beating. It has 7,500 employees, and it&#8217;s adding them fast. Despite the hiring binge, Groupon is seeing a promising increase in word-of-mouth sales. That&#8217;s a sustainable model for scaling the business, since cost of customer acquisition is a key measure of whether a company can survive.</p>
<p>But Groupon has to sell to two groups: consumers looking for deals, and merchants willing to offer them. Humans have to call on small businesses directly. There are other reasons for Groupon&#8217;s troubles &mdash; the company lacks sustainable barriers to entry, as shown by competitors like Google and LivingSocial &mdash; but the root of the issue is this: Groupon is throwing meat at the growth problem, when it should be throwing math at it.</p>
<p>Dropbox, on the other hand, has 74 employees (a number that&#8217;s also growing fast, but is a hundred times smaller than Groupon&#8217;s). Its IPO is valued at $5B, and it reportedly accepted a lower valuation in order to go with the banker it wanted. Dropbox has customer acquisition built into its model, a viral-marketing scheme where users invite their friends in return for extra free storage.</p>
<p>Assuming that the market values math over meat, how would you expect these two companies to compare on valuation per capita?</p>
<p>The way we keep score, like it or not, is market capitalization or IPO valuation. Let&#8217;s compare these six companies&#8217; valuations per employee.</p>
<p class="image-box-580"><img src="http://s.radar.oreilly.com/2011/08/17/0811-meatmath-chart2.png" border="0" alt="Market cap per employee across six companies" width="580" /></p>
<p>Clearly, if you&#8217;re proving you can scale with math instead of meat, the market rewards you handsomely.</p>
<p>In a data-driven world, the true measure of any organization, from a regional government to a global conglomerate, is its meat-to-math ratio.  This sounds like a cold statement, saying machines are better than people. That&#8217;s not the point here: machines are better with people, and companies that can&#8217;t augment their employees with data and tools, that cling to antiquated ideas like broadcast, and that can&#8217;t turn their data exhaust into insight and innovation, are doomed.</p>
<h2>Showing my math</h2>
<p>The numbers I&#8217;ve used come from a variety of sources and time periods; they should be treated as illustrative, rather than hard data. In the interest of transparency, here&#8217;s how I got the data. If you have better numbers, I&#8217;d love to hear them.</p>
<p><strong>For Amazon:</strong> Amazon had $12.95B in Q410 revenues (This includes a variety of other revenues, most significantly non-media sales and computing services), and 33,700 employees, meaning a revenue per employee of $384,273. The company had a market cap of $91.8B in early August, 2011, and roughly 43,200 employees, for a value-per-employee of $2,125,694. Sources: <a href="http://www.google.com/finance?q=nasdaq:AMZN">Google Finance</a>; <a href="http://www.techflash.com/seattle/2011/07/amazon-head-count-swells.html">Techflash</a>, <a href="http://tech.blorge.com/Structure:%20/2011/01/28/amazon-achieves-record-sales-in-q4-2010-kindle-books-now-outselling-paperbacks/">Blorge</a>.</p>
<p><strong>For Barnes &amp; Noble:</strong> $1.91B in Q410 revenues, and 35,000 employees, meaning a revenue per employee of $54,571. The company had a market cap of $947M in early August, 2011, and roughly 30,000 employees, for a value-per-employee of $31,567. Sources: <a href="http://www.hoovers.com/company/Barnes__Noble_Inc/rjhryi-1-1njdap.html">Hoovers</a> says 30,000 in 2011, <a href="http://en.wikipedia.org/wiki/Barnes_%26_Noble">Wikipedia</a> says there were 40,000 employees in 2008, and <a href="http://zenobank.com/index.php?symbol=BKS&amp;page=quotesearch">Zenobank</a> says 35,000 today.</p>
<p><strong>For Netflix:</strong> $444M in Q409 revenues, and 1,000 employees, meaning a revenue-per-employee of $444,000. The company had a market cap of $12.8B in early August, 2011, and 1,000 employees, for a value-per-employee of $12.8M. Sources: <a href="http://www.homemediamagazine.com/netflix/netflix-tops-blockbuster-domestic-rental-revenue-first-time-18571">Home Media Magazine</a>; Netflix&#8217;s Adrian Cockcroft tells me there are roughly 1,000 salaried contractors, plus hourly workers at distribution centres. I&#8217;m being conservative and using that same number for 2009, when it was certainly smaller.</p>
<p><strong>For Blockbuster:</strong> $400M in Q409 revenues. The company peaked at 60,000 employees in 2009; I&#8217;ve assumed 55,000 by Q4, meaning a revenue-per-employee of $7,273. The company is not currently trading. Sources: <a href="http://www.homemediamagazine.com/netflix/netflix-tops-blockbuster-domestic-rental-revenue-first-time-18571">Home Media Magazine</a>, <a href="http://money.usnews.com/money/blogs/flowchart/2009/02/06/15-companies-that-might-not-survive-2009">USNews.</a></p>
<p><strong>For Groupon:</strong> Q211 revenues were $878M, with roughly 7,500 employees, for a revenue-per-employee of $117,067. The company&#8217;s IPO filing was initially valued at $30B, but will likely be significantly lower; nevertheless, I&#8217;m using the original valuation. That means a value-per-employee of $4M. Sources: Groupon S-1 filing; <a href="http://www.businessinsider.com/3000-people-in-29-countries">Business Insider</a> says Groupon had 3,000 employees in Q4 2010, and is adding headcount aggressively. <a href="http://www.sbnonline.com/2011/08/groupon-cuts-marketing-costs-hiring-costs-jump/">SB Online says</a> hiring costs have jumped. And <a href="http://www.quora.com/How-many-employees-does-Groupon-have?q=how+many+employees+does+groupon+have">the best guess on Quora</a> puts the count at 7,500 employees.</p>
<p><strong>For Dropbox:</strong> In Q211 Dropbox had $25M in revenues, and 74 employees, for a revenue per employee of $338K. The company is <a href="http://techcrunch.com/2011/08/13/dropbox-chooses-investor-group-valuation-set-at-5-billion/">planning a $5B IPO</a>, which means a value-per-employee ratio of $67.6M. Sources: There are 74 employees on <a href="https://www.dropbox.com/about">Dropbox&#8217;s website</a> (they list them all). TechCrunch suggests that the company chose a lower valuation than it could have in order to get the right investment bank. <a href="http://www.businessinsider.com/dropbox-revenue-2011-3">Business Insider estimates 2011 revenues</a> at $100M total; I used 25 percent of these.</p>
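<p>For anyone who wants to rerun or update the arithmetic, the whole calculation fits in a short Python sketch. The inputs below are the rough revenue, valuation, and headcount figures quoted in this section; the variable and function names are made up for the example, and Blockbuster has no valuation because it is not trading. The script only divides each stated total by the stated headcount, so treat any mismatch with a quoted figure as a prompt to recheck the inputs.</p>

```python
# Rough figures as quoted in the text, in US dollars. Revenue is for a single
# quarter; headcounts can differ between the revenue and valuation dates.
# None marks a figure that is not available (Blockbuster is not trading).
COMPANIES = [
    # (name, quarterly revenue, staff then, valuation, staff at valuation)
    ("Amazon", 12.95e9, 33_700, 91.8e9, 43_200),
    ("Barnes and Noble", 1.91e9, 35_000, 947e6, 30_000),
    ("Netflix", 444e6, 1_000, 12.8e9, 1_000),
    ("Blockbuster", 400e6, 55_000, None, None),
    ("Groupon", 878e6, 7_500, 30e9, 7_500),
    ("Dropbox", 25e6, 74, 5e9, 74),
]

def per_employee(total, staff):
    """Divide a dollar total by headcount, or return None if data is missing."""
    if total is None or staff is None:
        return None
    return total / staff

def summarize(companies):
    """Return {name: (revenue per employee, value per employee)}."""
    return {
        name: (per_employee(revenue, rev_staff), per_employee(value, val_staff))
        for name, revenue, rev_staff, value, val_staff in companies
    }

for name, (rev, val) in summarize(COMPANIES).items():
    val_text = "n/a" if val is None else f"${val:,.0f}"
    print(f"{name}: revenue/employee ${rev:,.0f}, value/employee {val_text}")
```

<p>Swapping in better numbers is a matter of editing one row of the list.</p>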
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/08/theres-no-such-thing-as-big-da.html">There&#8217;s no such thing as big data</a></li>
<li> <a href="http://radar.oreilly.com/2011/08/building-data-startups.html">Building data startups: Fast, big, and focused</a></li>
<li> <a href="http://radar.oreilly.com/2011/08/data-human-machine-analysis.html">Data and the human-machine connection</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/08/meat-to-math-ratio.html/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>