<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>O&#039;Reilly Radar &#187; Ciara Byrne</title>
	<atom:link href="http://radar.oreilly.com/ciarab/feed" rel="self" type="application/rss+xml" />
	<link>http://radar.oreilly.com</link>
	<description>Insight, analysis, and research about emerging technologies</description>
	<lastBuildDate>Fri, 17 May 2013 16:29:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>The hidden language and &quot;wonderful experience&quot; of product reviews</title>
		<link>http://radar.oreilly.com/2012/01/product-reviews-hidden-language.html</link>
		<comments>http://radar.oreilly.com/2012/01/product-reviews-hidden-language.html#comments</comments>
		<pubDate>Mon, 09 Jan 2012 14:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Web 2.0]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[comments]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[ecommerce]]></category>
		<category><![CDATA[feedback]]></category>
		<category><![CDATA[Panagiotis Ipeirotis]]></category>
		<category><![CDATA[product reviews]]></category>
		<category><![CDATA[reviews]]></category>
		<category><![CDATA[sales]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[text analysis]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2012/01/product-reviews-hidden-language.html</guid>
		<description><![CDATA[How much is an Amazon review &#8212; good or bad &#8212; worth? Computer scientist and NYU professor Panagiotis Ipeirotis analyzed the text in thousands of Amazon reviews to find out. ]]></description>
				<content:encoded><![CDATA[<p>How do reviews, both positive and negative, influence the price of a product on Amazon? What phrases used by reviewers make us more or less likely to complete a purchase? These are some of the questions that computer scientist <a href="http://pages.stern.nyu.edu/~panos/">Panagiotis Ipeirotis</a>, an associate professor at New York University&#8217;s <a href="http://www.stern.nyu.edu/">Stern School of Business</a>, set out to investigate by <a href="http://pages.stern.nyu.edu/~panos/publications/kdd2007.pdf">analyzing</a> the text in thousands of reviews on Amazon. Ipeirotis continues to <a href="http://www.behind-the-enemy-lines.com/2011/04/want-to-improve-sales-fix-grammar-and.html">research this space</a>.</p>
<p>Ipeirotis&#8217; findings are surprising: consumers will pay more for the same product if the seller&#8217;s reviews are good, certain types of negative reviews actually boost sales, and spelling plays an important role.</p>
<p>Our interview follows.</p>
<h2>How important are product reviews on Amazon? Can they give sellers more pricing power?</h2>
<p><a href="http://strataconf.com/jumpstart2011/public/schedule/speaker/4490"><img src="http://assets.en.oreilly.com/1/eventprovider/1/_@user_4490.jpg" border="0" width="75" alt="Panagiotis Ipeirotis" style="float: right;margin: 3px 0 10px 10px" /></a><strong>Panagiotis Ipeirotis:</strong> The reviews have a significant effect. When buying online, customers are not only purchasing the product, they&#8217;re also inherently buying the guarantee of a seamless transaction. Customers read the feedback left by other buyers to evaluate the reputation of the seller. Since customers are willing to pay more to buy from merchants with a better reputation &mdash; something we call the &#8220;reputation premium&#8221; &mdash; that feedback tends to have an effect on future prices that the merchant can charge.</p>
<h2>What are some of the most influential phrases?</h2>
<p><strong>Panagiotis Ipeirotis:</strong> &#8220;Never received&#8221; is a killer phrase in terms of reputation. It reduced the price a seller can charge by an average of $7.46 in the products examined. &#8220;Wonderful experience&#8221; is one of the most positive, increasing the price a seller can charge by $5.86 for the researched products. </p>
<h2>How can very positive reviews be bad for sales?</h2>
<p><strong>Panagiotis Ipeirotis:</strong> Extremely positive reviews that contain no concrete details tend to be perceived as non-objective &mdash; written by fanboys or spammers. We observed this mainly in the context of product reviews, where superlative phrases like &#8220;Best camera!&#8221; with no further details are actually seen negatively. </p>
<h2>Can a negative review ever be good for sales?</h2>
<p><strong>Panagiotis Ipeirotis:</strong> It can when the review is overly negative or criticizes aspects of the product that are not its primary purpose &mdash; the video quality in an <a href="http://en.wikipedia.org/wiki/Single-lens_reflex_camera">SLR camera</a>, for example. Or, when customers have unreasonable expectations: &#8220;Battery life lasts only for two days of shooting.&#8221; Readers interpret these types of negative comments as &#8220;This is good enough for me,&#8221; and it decreases their uncertainty about the product.</p>
<h2>What is the effect of badly written reviews on sales?</h2>
<p><strong>Panagiotis Ipeirotis:</strong> Reviews containing spelling and grammatical errors consistently result in suboptimal outcomes, like lower sales or lower response rates. That was a fascinating but, in retrospect, expected finding. This holds true in a wide variety of settings, from reviews of electronics to hotels. It&#8217;s even the case when examining email correspondence about a decision, such as whether or not to hire a contractor. </p>
<p>We don&#8217;t know the exact reason yet, but the effect is very systematic. There are several possible explanations:</p>
<ul>
<li> Readers think that the customers who buy this product are uneducated, so they don&#8217;t buy it.</li>
<li> Reviews that are badly written are considered unreliable and therefore increase the uncertainty about the product.</li>
<li> Badly written reviews are unsuccessful attempts to spam and are a signal that even the other good reviews may not be authentic.</li>
</ul>
<h2>What&#8217;s the relationship between the product attributes discussed in reviews and the attributes that lead to sales?</h2>
<p><strong>Panagiotis Ipeirotis:</strong> We observed that the aspects of a product that drive the online discussion are not necessarily the ones that define consumer decisions to buy it. For example, &#8220;zoom&#8221; tends to be discussed a lot for small point-and-shoot cameras. However, very few people are influenced by the zoom capabilities when it comes down to deciding which camera to buy. </p>
<p><em>This interview was edited and condensed.</em></p>
<div style="float: left;border-top: thin gray solid;border-bottom: thin gray solid;padding: 20px;margin: 20px 2px;clear: both"><p><a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-ipeirotis-interview"><img style="float: left;border: none;padding-right: 10px" src="http://s.radar.oreilly.com/2011-strata-ca-promo.png" /></a><a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-ipeirotis-interview"><strong>Strata 2012</strong></a> &mdash; The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.</p>
<p><a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-ipeirotis-interview"><strong>Save 20% on registration with the code RADAR20</strong></a></p></div>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://strataconf.com/jumpstart2011/public/schedule/detail/20959">&#8220;Big Data, Stupid Decisions: The Importance Of Measuring The Right Thing&#8221;</a> (Panagiotis Ipeirotis&#8217; Strata Jumpstart 2011 presentation)</li>
<li> <a href="http://radar.oreilly.com/2011/11/feedback-semantic-analysis-mechanical-turk.html">When good feedback leaves a bad impression</a></li>
<li> <a href="http://radar.oreilly.com/2011/12/visualization-amazon-book-recommendations.html">Visualization of the Week: Amazon book recommendations</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2012/01/product-reviews-hidden-language.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Big crime meets big data</title>
		<link>http://radar.oreilly.com/2011/12/marc-goodman-data-crime.html</link>
		<comments>http://radar.oreilly.com/2011/12/marc-goodman-data-crime.html#comments</comments>
		<pubDate>Mon, 19 Dec 2011 14:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[crime]]></category>
		<category><![CDATA[crimeware]]></category>
		<category><![CDATA[cyber crime]]></category>
		<category><![CDATA[Future Crimes Institute]]></category>
		<category><![CDATA[Marc Goodman]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[strata]]></category>
		<category><![CDATA[terrorism]]></category>
		<category><![CDATA[viruses]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/12/marc-goodman-data-crime.html</guid>
		<description><![CDATA[Marc Goodman, consultant and cyber crime expert, explains how criminals and terrorists can put data, automation, and scalability to effective use. ]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.futurecrimes.com/about/mg/">Marc Goodman</a> (<a href="http://twitter.com/futurecrimes">@futurecrimes</a>) is a former Los Angeles police officer who started that department&#8217;s first Internet crime unit in the mid-1990s. After two decades spent working with Interpol, the United Nations, and NATO, Goodman founded the <a href="http://www.futurecrimes.com/">Future Crimes Institute</a> to track how criminals use technology.</p>
<p>Malicious types of software, like viruses, worms, and trojans, are the main tools used to harvest personal data. Cyber criminals also use social engineering techniques, such as phishing emails populated with data gleaned from social networks, to trick people into providing further details. In the interview below, Goodman outlines some of the other ways organized criminals and terrorists are harnessing data for nefarious ends.</p>
<h2>What motivates data criminals?</h2>
<p><a href="http://www.futurecrimes.com/about/mg/"><img src="http://assets.en.oreilly.com/1/eventprovider/1/_@user_121598.jpg" border="0" alt="Marc Goodman" width="75" style="float: right; margin: 3px 0 10px 10px;" /></a><strong>Marc Goodman:</strong> Anything that would motivate someone to join a startup would motivate a criminal. They want money, shares in the business, a challenge. They don&#8217;t want a 9-to-5 environment. They also want the respect of their peers. They have an us-against-them attitude; they&#8217;re highly innovative and adaptive, and they never take the head-on approach. They always find clever and imaginative ways to go about something that a good person would never have considered.</p>
<h2>What type of personal data is most valuable to criminals?</h2>
<p><strong>Marc Goodman:</strong> The best value is a bank account takeover. A standard credit card might cost a criminal only $10, but for $700 they could buy details of a bank account with $50,000 in it, money that could be stolen in just one transaction.</p>
<p>European credit cards tend to cost more than American credit cards since Europeans are much better at guarding their data and have legislation in place, such as the European directive on data privacy, which prohibits the aggregation and long-term storage of personally identifiable information. There&#8217;s also a universal identifier for Americans &mdash; the social security number &mdash; but the same thing doesn&#8217;t exist from a pan-European perspective.</p>
<h2>How is data crime more scalable than traditional crime?</h2>
<p><strong>Marc Goodman:</strong> Data crime can be scripted and automated. If you were to take a gun or a knife and stand on a street corner, there are only so many people you can rob. You have to do the crime, run away from the scene, worry about the police, etc. You can&#8217;t walk into Wembley Stadium with a gun and say, &#8220;Everybody, put your hands up,&#8221; but you can do the equivalent from a cyber-crime perspective.</p>
<p>One of the reasons why cyber crime thrives is that it&#8217;s totally international whereas law enforcement is totally national. Now, the person attacking you can be sitting in New York or Tokyo or Botswana. The ability to conduct business without getting on a plane is an awesome advantage for international organized crime.</p>
<div style="float: left; border-top: thin gray solid; border-bottom: thin gray solid; padding: 20px; margin: 20px 2px; clear: both;"><p><a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-marc-goodman-interview"><img style="float: left; border: none; padding-right: 10px;" src="http://blogs.oreilly.com/wp/wp-content/uploads/2011/10/2011-strata-ca-promo1.png" /></a><a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-marc-goodman-interview"><strong>Strata 2012</strong></a> &mdash; The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.</p>
<p><a href="https://en.oreilly.com/strata2012/public/regwith/radar20?cmp=il-radar-st12-marc-goodman-interview"><strong>Save 20% on registration with the code RADAR20</strong></a></p></div>
<h2>How has cyber crime evolved?</h2>
<p><strong>Marc Goodman:</strong> In the 1970s, you had to be a clever hacker and create your own scripts. Now all of that stuff can be bought off the shelf. You can buy a package of <a href="http://www.theregister.co.uk/2011/05/10/zeus_crimeware_kit_leaked/">crimeware</a> and put in the email addresses or the domain that you want to attack via a nice user interface. It&#8217;s really plug-and-play criminality.</p>
<h2>You claim that the <a href="http://en.wikipedia.org/wiki/2008_Mumbai_attacks">2008 Mumbai attackers</a> used real-time data gathering from social networks and other media. How do terrorists use data?</h2>
<p><strong>Marc Goodman:</strong> Since the Internet arrived, terrorists have been advertising, doing PR, recruiting, and fundraising, all online. But this was the first time that we had seen terrorists use technology to the full extent that this group did during the incident. They had mobile phones and satellite phones. The terrorist war room they set up to monitor the media and feed back information in real time to the attackers was a really significant innovation.</p>
<p>They re-engineered the attack mid-incident to kill more people. They were constantly looking for new hostages. Organizations like the BBC and CNN were tweeting to ask people on the ground in Mumbai to contact a producer. People trapped in hotels called the TV stations. All of that information was being tracked by the terrorist war room. There was an Indian minister who was doing a live interview on the Indian Broadcast Network (IBN) while hiding in the kitchen of the ballroom of the Taj Mahal hotel. The war room picked this up and directed the attackers to that part of the hotel where they could find the minister.</p>
<h2>What can be done to combat cyber crime?</h2>
<p><strong>Marc Goodman:</strong> The terrorism problem is very different from the cyber crime problem. Most acts of terrorism are carried out in the real world whereas cyber crime offenses take place in virtual spaces. Governments are pretty good at tracking the terrorists in their own countries, and there is decent international cooperation on terrorism.</p>
<p>What is making things more difficult for governments is that, in the old days, if you tapped somebody&#8217;s home phone, you had a good picture of what was going on. Now you don&#8217;t know where to look. Are they communicating on Facebook, on Twitter, or having a meeting in World of Warcraft?</p>
<p>Law enforcement needs to develop better systems to deal with the cacophony of social media usage during a terrorist attack. The public is getting involved in ways that are, frankly, unhealthy. There was a hostage situation in the U.S. a couple of months ago where a man took a hostage and was sexually assaulting her. He had her trapped in a hotel room with guns and was posting live on Facebook and Twitter. Then the public started to interact with the hostage-taker, tweeting things like, &#8220;You wouldn&#8217;t kill her. You are not brave enough to do it.&#8221; In the past, police could close off several city blocks, put up yellow crime scene tape, close the airspace over the scene, and bring in a trained negotiator. How does law enforcement intervene when there can be a completely disintermediated conversation between the criminal or terrorist and the general public?</p>
<hr />
<p><em>Marc Goodman discussed <a href="http://strataconf.com/summit2011/public/schedule/speaker/121598">the business of illegal data</a> at Strata New York 2011. His full presentation is available in the following video:</em></p>
<p><iframe width="600" height="335" src="http://www.youtube.com/embed/6ueKilyThQg" frameborder="0" allowfullscreen></iframe></p>
<p><em>This interview was edited and condensed.</em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/09/crime-sourcing.html">From crowdsourcing to crime-sourcing: The rise of distributed criminality</a></li>
<li> <a href="http://radar.oreilly.com/2011/12/cloud-service-security-attack.html">Why cloud services are a tempting target for attackers</a></li>
<li> <a href="http://radar.oreilly.com/2011/02/cybersecurity-gov-hackers.html">Trend to watch: Formal relationships between governments and hackers</a></li>
<li> <a href="http://radar.oreilly.com/2010/02/cyber-warfare-dont-inflate-it.html">Cyber warfare: don&#8217;t inflate it, don&#8217;t underestimate it</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/12/marc-goodman-data-crime.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The smart grid data deluge</title>
		<link>http://radar.oreilly.com/2011/06/the-smart-grid-data-deluge.html</link>
		<comments>http://radar.oreilly.com/2011/06/the-smart-grid-data-deluge.html#comments</comments>
		<pubDate>Wed, 22 Jun 2011 13:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[@top]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[energy]]></category>
		<category><![CDATA[environment]]></category>
		<category><![CDATA[sensors]]></category>
		<category><![CDATA[smart meter]]></category>
		<category><![CDATA[utilities]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/06/the-smart-grid-data-deluge.html</guid>
		<description><![CDATA[The smart grid is an information revolution for utilities, and the first source of that information is smart meters. eMeter&#8217;s Aaron DeYonker discusses meter use and data applications in this interview. ]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/kkoshy/1671399978/" title="Seoul by night by Koshyk, on Flickr"><img src="http://s.radar.oreilly.com/2011/06/17/0611-smartgrid.jpg" border="0" width="300" alt="Seoul by night by Koshyk, on Flickr" style="float: right;margin: 3px 0 10px 10px" /></a>The smart grid is an information revolution for utilities, and the first source of that information is smart meters. A smart meter records consumption of electricity in intervals of an hour or less and communicates that information back to the utility.</p>
<p><a href="http://www.emeter.com/">eMeter</a> provides metering data management products to utilities. I talked to Aaron DeYonker, global head of product management at eMeter, about how data will transform the way utilities do business. Our interview follows.</p>
<hr />
<h2>How much new data do utilities have to process because of smart metering?</h2>
<p><strong>Aaron DeYonker:</strong> Currently, most utilities do a standard meter read once a month. With smart meters, utilities have to process data at 15-minute intervals. This is about a 3,000-fold increase in daily data processing for a utility, and it&#8217;s just the first wave of the data deluge. The second wave will include granular data from smart appliances, electric vehicles and other metering points throughout the grid. That will exponentially increase the amount of data being generated.</p>
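<p>The &#8220;3,000-fold&#8221; figure can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the 30-day billing month is an assumption, and the variable names are hypothetical rather than anything taken from eMeter.</p>

```python
# Back-of-the-envelope check: one read per month versus one read every 15 minutes.
MONTHLY_READS = 1          # traditional schedule: one meter read per month
INTERVAL_MINUTES = 15      # smart meter interval quoted in the interview
DAYS_PER_MONTH = 30        # assumption: a 30-day billing month

reads_per_day = 24 * 60 // INTERVAL_MINUTES        # 96 interval reads per day
reads_per_month = reads_per_day * DAYS_PER_MONTH   # 2880 interval reads per month
increase = reads_per_month // MONTHLY_READS        # how many times more reads

print(reads_per_day, reads_per_month, increase)    # prints: 96 2880 2880
```

<p>Under that assumption a single meter produces 2,880 reads a month instead of one, which is consistent with the roughly 3,000-fold increase quoted above.</p>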
<h2>What processes do utilities have to change to deal with the data?</h2>
<p><strong>Aaron DeYonker:</strong> Utilities tend to be risk averse and as a result may not have utilized technology for internal business processes as much as other industries like telco or banking. Coupled with growing energy demands, utilities need to change a number of business processes beyond billing to handle the amount and type of data they now manage. Outage management, customer service, load research and wholesale electricity market transactions &mdash; where utilities buy and sell electricity to each other &mdash; are just a few of the impacted areas.</p>
<p>In the case of power outage management, for example, the traditional business process relied on processing customer calls to detect outage hotspots. Now, smart meters can automatically report that they have lost power. That enables more proactive dispatching of resources to fix problems. However, the data needs to be collected, validated and processed quickly to make resolution times as fast as possible. So utilities now need to develop new business processes to enable this.</p>
<div style="float: left;border-top: thin gray solid;border-bottom: thin gray solid;padding: 20px;margin: 20px 2px"><p><a href="https://en.oreilly.com/stratany2011/public/regwith/stn11rad?cmp=il-radar-st11-smart-grid-meters"><img style="float: left;border: none;padding-right: 10px" src="http://s.radar.oreilly.com/strata-ny-stn11rad.png" /></a><a href="https://en.oreilly.com/stratany2011/public/regwith/stn11rad?cmp=il-radar-st11-smart-grid-meters"><strong>Strata Conference New York 2011</strong></a>, being held Sept. 22-23, covers the latest and best tools and technologies for data science &#8212; from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.</p>
<p><a href="https://en.oreilly.com/stratany2011/public/regwith/stn11rad?cmp=il-radar-st11-smart-grid-meters"><strong>Save 30% on registration with the code STN11RAD</strong></a></p></div>
<h2>Beyond billing, how can utilities put metering data to use?</h2>
<p><strong>Aaron DeYonker:</strong> The list is endless, but here are a few use cases: customer service, outage management, conservation and efficiency programs, <a href="http://en.wikipedia.org/wiki/Demand_response">demand response</a> (reducing consumption at peak demand times), demand forecasting, energy &#8220;theft&#8221; detection, &#8220;line-loss&#8221; detection, new product offerings (including new rates for electric vehicles and renewables), and peak-load reduction programs. </p>
<p>Let&#8217;s take customer service as a specific example. Many utility call centers deal with routine complaints about high bills. Customers can now get notifications that a spike in usage has been detected and the data might even be able to pinpoint the actual culprit within the home, such as an aggressively programmed air conditioner. Alerting the customer before the bill arrives will reduce call volumes and increase customer satisfaction.</p>
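<p>As a rough illustration of the usage-spike notification described above, the sketch below flags an interval reading that is far above the average of the readings just before it. The function name, the window size, the threshold factor, and the sample readings are all hypothetical; the interview does not describe how any utility actually detects spikes.</p>

```python
from statistics import mean


def detect_spikes(readings_kwh, window=4, factor=2.0):
    """Return indexes of readings that exceed `factor` times the mean
    of the `window` readings immediately before them.

    Hypothetical rule for illustration only, not a utility's real method.
    """
    spikes = []
    for i in range(window, len(readings_kwh)):
        baseline = mean(readings_kwh[i - window:i])
        if readings_kwh[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up 15-minute interval readings (kWh); index 5 is an obvious jump.
readings = [0.3, 0.3, 0.4, 0.3, 0.3, 1.5, 0.4, 0.3]
print(detect_spikes(readings))  # prints: [5]
```

<p>A trailing-window baseline keeps the rule simple enough to run as each reading arrives, which is the point of alerting the customer before the bill does.</p>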
<h2>What actionable metrics or alerts can you derive from meter data?</h2>
<p><strong>Aaron DeYonker:</strong> Having access to more detailed data opens the door to setting more accurate metrics, both for the utility and the consumer. With the right data, utilities can better diagnose outages as close to real time as possible and allocate resources as quickly and effectively as possible to resolve problems. The meters will send these events up, and the system needs to perform powerful and fast analysis to distinguish true problems from false alarms.</p>
<p>On the business side, many utilities look for ways to reduce the peak demand for energy, which typically occurs in the afternoon and evening during the summer. Programs that have cheaper night-time rates are now possible because metering data is gathered at regular intervals. Measuring the effect of these rates on customer behavior is critical in lobbying regulatory bodies to approve these programs for broader roll out.</p>
<p>In addition, consumers are empowered to regulate their own energy use. That helps reduce unnecessary energy usage and generation. In some jurisdictions, the metering data will in fact eliminate the need for new nuclear power plants based on the expected efficiencies derived from smarter pricing.</p>
<h2>How do you see utilities using metering data in the future?</h2>
<p><strong>Aaron DeYonker:</strong> At the very least, flattening out the daily demand curve will have incalculable benefits for our overall energy infrastructure as demand increases. The use of metering data opens up limitless potential for new products and services for utilities and their customers. Armed with a rich dataset on par with that of the financial services and telco industries, utility companies can model, forecast and prototype with much more power and precision. It&#8217;s impossible to envision the extent to which this data will be applied to future innovations, but it will be extensive and world changing.</p>
<p><em>This interview was edited and condensed.</em></p>
<p><em>Photo: <a href="http://www.flickr.com/photos/kkoshy/1671399978/" title="Seoul by night by Koshyk, on Flickr">Seoul by night by Koshyk, on Flickr</a></em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://venturebeat.com/2010/10/29/super-grid-introduction/">The super grid. Coming soon to a power outlet near you</a></li>
<li> <a href="http://radar.oreilly.com/2011/05/water-leaks-data-analysis-takadu.html">Plugging water leaks with data</a></li>
<li> <a href="http://radar.oreilly.com/2011/06/data-algorithm-icu-health.html">Algorithms are the new medical test</a></li>
<li> <a href="http://radar.oreilly.com/2011/05/sentiment-analysis-finance.html">Trading on sentiment</a></li>
<li> <a href="http://radar.oreilly.com/2011/05/machine-to-machine-m2m.html">With M2M, the machines do all the talking</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/06/the-smart-grid-data-deluge.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dating with data</title>
		<link>http://radar.oreilly.com/2011/06/dating-data-okcupid-oktrends.html</link>
		<comments>http://radar.oreilly.com/2011/06/dating-data-okcupid-oktrends.html#comments</comments>
		<pubDate>Wed, 08 Jun 2011 14:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[@top]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[dating]]></category>
		<category><![CDATA[OkCupid]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/06/dating-data-okcupid-oktrends.html</guid>
		<description><![CDATA[OkCupid relies on math, data, and revealed preferences (what you do, not what you say) to distinguish itself in the crowded dating world.  ]]></description>
				<content:encoded><![CDATA[<p><img src="http://s.radar.oreilly.com/2011/05/26/0511-okcupid-logo.png" border="0" alt="OkCupid logo" width="202" style="float: right;margin: 3px 0 10px 10px" /><a href="http://www.okcupid.com">OkCupid</a> is a free dating site with seven million users. The site&#8217;s blog, <a href="http://blog.okcupid.com/">OkTrends</a>, mines data from those users to tackle important subjects like &#8220;<a href="http://blog.okcupid.com/index.php/the-case-for-an-older-woman/">The case for an older woman</a>&#8221; and &#8220;<a href="http://blog.okcupid.com/index.php/the-real-stuff-white-people-like/">The REAL &#8216;stuff white people like&#8217;</a>.&#8221; </p>
<p>Beyond clever headlines, OkCupid also uses an unusual pedigree to separate itself from the dating site pack: The business was founded by four Harvard-educated mathematicians.</p>
<p>&#8220;It probably scared people when they first heard that four math majors were starting a dating site,&#8221; said CEO Sam Yagan during a recent interview. But the founders&#8217; backgrounds greatly influenced how they approached the problem of dating. </p>
<p>&#8220;A lot of other dating sites are based on psychology,&#8221; Yagan said. &#8220;The fundamental premise of a site like <a href="http://www.eharmony.com/">eHarmony</a> is that they know the answer. Our approach to dating isn&#8217;t that there&#8217;s some psychological theory that will be the answer to all your problems. We think that dating is a problem to be solved using data and analytics. There is no magic formula that can help everyone to find love. Instead, we bring value by building a decent-sized platform that allows people to provide information that helps us to customize a match algorithm to each person&#8217;s needs.&#8221; </p>
<p>OkCupid works by having users state basic preferences and answer questions like &#8220;Is it wrong to spank a child who&#8217;s been bad?&#8221; Users are matched based on the overlap of their answers and how important each question is to both users.</p>
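<p>One way to picture matching on &#8220;overlap of answers&#8221; weighted by importance is sketched below. This is an illustrative assumption, not OkCupid&#8217;s actual formula: the function names, the importance weights, the question labels, and the use of a geometric mean to combine the two directions are all hypothetical choices made for the example.</p>

```python
from math import sqrt


def satisfaction(prefs, other_answers):
    """Share of importance-weighted points that `other_answers` earns under `prefs`.

    `prefs` maps question -> (set of acceptable answers, importance weight).
    Only questions the other user has answered are counted.
    """
    earned = 0
    possible = 0
    for question, (acceptable, weight) in prefs.items():
        if question not in other_answers:
            continue
        possible += weight
        if other_answers[question] in acceptable:
            earned += weight
    return earned / possible if possible else 0.0


def match_score(prefs_a, answers_a, prefs_b, answers_b):
    """Combine both directions so a match must work for both users."""
    return sqrt(satisfaction(prefs_a, answers_b) * satisfaction(prefs_b, answers_a))


# Hypothetical users: q1 matters a lot to Alice, q2 matters more to Bob.
alice_prefs = {"q1": ({"no"}, 10), "q2": ({"no"}, 1)}
alice_answers = {"q1": "no", "q2": "no"}
bob_prefs = {"q1": ({"no", "yes"}, 1), "q2": ({"no"}, 5)}
bob_answers = {"q1": "no", "q2": "yes"}

print(round(match_score(alice_prefs, alice_answers, bob_prefs, bob_answers), 3))  # prints: 0.953
```

<p>Here Bob misses only the question Alice weights lowest, so her satisfaction is 10/11, while Alice satisfies everything Bob cares about. Multiplying the two directions means a pair scores well only when both sides are satisfied.</p>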
<p>Yagan said data was built into the business model from the beginning. &#8220;We knew from the time we started the company that the data we were generating would have three purposes: helping us match people up, attracting advertisers since that was the core of our revenue model, and that the data would also be interesting socially.&#8221;</p>
<p>In 2007, the company hired a PR firm to publicize some of its findings, such as the fact that when gas prices rise, users narrow the search radius for matches. &#8220;We called dozens of reporters and nobody cared,&#8221; Yagan said. So OkCupid fired the PR firm and started publishing its findings on the <a href="http://blog.okcupid.com/">OkTrends blog</a>. The blog has thus far doubled traffic to the site. </p>
<p>&#8220;The blog is partly an advice column, but instead of being written by a psychologist, the data writes itself,&#8221; Yagan said. &#8220;For example, we don&#8217;t tell you that you should or should not use a flash for your profile photo. We just tell you that <a href="http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/">if you use a flash you&#8217;ll look seven years older</a>.&#8221;</p>
<div style="float: left;border-top: thin gray solid;border-bottom: thin gray solid;padding: 20px;margin: 20px 2px"><p><a href="https://en.oreilly.com/web2011/public/regwith/radar?cmp=il-radar-wb11-okcupid-yagan-interview"><img style="float: left;border: none;padding-right: 10px" src="http://s.radar.oreilly.com/web2summit11-code-radar.png" /></a><a href="https://en.oreilly.com/web2011/public/regwith/radar?cmp=il-radar-wb11-okcupid-yagan-interview"><strong>Web 2.0 Summit</strong></a>, being held October 17-19 in San Francisco, will examine &#8220;The Data Frame&#8221; &mdash; focusing on the impact of data in today&#8217;s networked economy.</p>
<p><a href="https://en.oreilly.com/web2011/public/regwith/radar?cmp=il-radar-wb11-okcupid-yagan-interview"><strong>Save $300 on registration with the code RADAR</strong></a></p></div>
<p>I asked Yagan about the data on which OkTrends draws. &#8220;We have people&#8217;s registration data,&#8221; he said. &#8220;Then we have stated preferences; the answers that people give to the questions we ask them. We use that kind of data occasionally, but it&#8217;s not the core difference that we have. The core difference is in the category of revealed preferences. Imagine if you had a video camera in every bar and you could observe every interaction between two people and see the success rate of that interaction. We essentially have that video camera on our site.&#8221; </p>
<p>The reason revealed preferences are so important is that they track real-world behavior &mdash; what people really want rather than what they say they want. &#8220;When you get 12 messages and you only reply to three of them, you are voting with your time,&#8221; Yagan said. &#8220;Or when a guy is shorter than you, you don&#8217;t reply.&#8221; </p>
<p>Mobile adds a new revealed preferences dimension for OkCupid. &#8220;As our product gets more mobile and location-aware, we are more likely to be on that date with them,&#8221; Yagan said. &#8220;Then we can model the kinds of conversations on the site that lead to an in-person meeting.&#8221; OkCupid can currently track the five million messages sent every week on the site as well as other revealed preferences, like ratings of profiles.</p>
<p>According to Yagan, OkCupid doesn&#8217;t use sophisticated data mining or analytics tools: &#8220;Most of it can be done by querying the database and crunching numbers in Excel. The fact that we have four math majors and a full-time statistician means that we take that number crunching very seriously.&#8221;</p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/04/personal-data-utility-serendipity-expression.html">Personal data is the future, but does anybody care?</a></li>
<li> <a href="http://radar.oreilly.com/2011/05/spotrank-human-density-data.html">Want to know where to build a new store? Check your human density data</a></li>
<li> <a href="http://radar.oreilly.com/2011/05/data-scraping-infochimps.html">Scraping, cleaning, and selling big data</a></li>
<li> <a href="http://radar.oreilly.com/2010/09/a-new-twist-on-data-driven-sit.html">A new twist on &#8220;data-driven site&#8221;</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/06/dating-data-okcupid-oktrends.html/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Algorithms are the new medical tests</title>
		<link>http://radar.oreilly.com/2011/06/data-algorithm-icu-health.html</link>
		<comments>http://radar.oreilly.com/2011/06/data-algorithm-icu-health.html#comments</comments>
		<pubDate>Tue, 07 Jun 2011 14:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[ehrs]]></category>
		<category><![CDATA[health IT]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/06/data-algorithm-icu-health.html</guid>
		<description><![CDATA[Predictive Medical Technologies says its new system can use real-time, intensive care unit monitoring data to predict cardiac arrest and other events up to 24 hours ahead of time. CEO Bryan Hughes discusses the system and the application of diagnostic data in this interview. ]]></description>
				<content:encoded><![CDATA[<p><img src="http://s.radar.oreilly.com/2011/05/31/0611-ekg.jpg" border="0" alt="ekg by krzakptak, on Flickr" width="300" style="float: right;margin: 3px 0 10px 10px" /><a href="http://www.predictive-medical.com">Predictive Medical Technologies</a> claims that it can use real-time, intensive care unit (ICU) monitoring data to predict clinical events like cardiac arrest up to 24 hours ahead of time. Effectively, the startup&#8217;s algorithms are new types of medical tests that an ICU doctor can take into consideration when deciding on a course of treatment.</p>
<p>Predictive Medical Technologies is based in the University of Utah&#8217;s medical accelerator, which is attached to a hospital. The system will soon be tested on a trial basis with real patients and ICU physicians.</p>
<p>I recently talked to CEO Bryan Hughes about using data in diagnosis. Our interview follows.</p>
<hr />
<h2>What kinds of data are already available from hospital electronic medical records (EMR) and patient monitoring systems?</h2>
<p><strong>Bryan Hughes:</strong> We require that a hospital be at a certain technological level, in particular that the hospital has an EMR solution that is at minimum classified as <a href="http://www.himssanalytics.org/hc_providers/emr_adoption.asp">Stage 4</a>, or a <a href="http://en.wikipedia.org/wiki/Computerized_physician_order_entry">Computerized Physician Order Entry system</a>. Only about 100 hospitals in the U.S. are at this stage right now.</p>
<p>Once a hospital has achieved this stage, we can integrate with their computer systems and extract the raw data coming from the monitors, lab reports and even nursing notes. We can then perform real-time patient data mining and data analytics.</p>
<p>Our system works behind the scenes, constantly analyzing the raw patient data coming in from a variety of sources like chemistry panels, urinalysis, microbiology, respiratory and bedside monitors. We attempt to alert the doctor early to an adverse event such as cardiac arrest, or that a patient might be trending toward an arrhythmia or pneumonia.</p>
<h2>How does the system integrate into an ICU doctor&#8217;s existing routine?</h2>
<p><strong>Bryan Hughes:</strong> Depending on the technological development of a hospital, doctors either do their rounds in the ICU using a piece of paper or using a bedside computer terminal. Older systems might employ a COW (Computer On Wheels).</p>
<p>Hospitals that are still paper-based first have to get to the EMR stage. It is surprising that health care, the largest and quintessential information-based industry, has failed to harness modern information exchange for so long. The oral tradition and handwritten manuscripts remain prevalent throughout most of the sector.</p>
<p>For hospitals that have an EMR, there are still several fundamental problems. The single most daunting problem facing modern doctors is the overwhelming amount of data. Unfortunately, especially with the growing adoption of electronic medical records, this information is disparate and not immediately available. A clinician&#8217;s ability to practice medicine is rooted in the ability to make sound decisions based on reliable information.</p>
<p>Disparate information makes it hard to connect the dots. Massive amounts of disparate information turn the dots into a confusing sea of blobs. The dots must be connected in a manner that allows the doctor to make immediate and intelligent decisions.</p>
<p>We look at the current trends and progressions of disease states in the now, and then look at what may be happening in the next 24 hours. We then push this information to a mobile device such as an iPad, so the doctor can see the clinically relevant dots and make better decisions in a timely manner.</p>
<p>Eventually we hope to expand to the entire hospital. But for now, the ICU is a big enough problem and a great starting point.</p>
<h2>How do you use data to predict outcomes like cardiac arrest?</h2>
<p><strong>Bryan Hughes:</strong> We have two first-generation models: cardiac arrest and respiratory failure. We plan to apply our novel techniques to modeling sepsis, renal failure and re-intubation risk.</p>
<p>Without giving away too much of our secret sauce, we use non-hypothesis machine learning techniques, which have proven very promising so far. This approach allows us to eliminate any human &#8220;expert&#8221; bias from the models. The key then is to ensure that the data we use for development and training is clean. It is only now that medical data is in electronic and structured form that this is becoming readily available.</p>
<h2>What kinds of data mining techniques do you use in the product?</h2>
<p><strong>Bryan Hughes:</strong> We use a variety of techniques. Again, without giving too much away, our approach is to use transparent algorithms rather than a black box approach. We have a patent strategy that allows us to effectively place a white fence around our technology while allowing the academic and medical community to review our results.</p>
<h2>How do you judge the accuracy of the algorithms?</h2>
<p><strong>Bryan Hughes:</strong> To date, our results have been proven using retrospective models (historical ICU monitoring data and outcomes). Our next step is to deploy our technology into a validation trial &mdash; a validation trial produces evidence that a test or treatment produces a clinical benefit. That trial is about to start at the University of Utah Medical Center in Salt Lake City.</p>
<p>Once the integration is completed in the next several weeks, we will be running a <a href="http://www.mondofacto.com/facts/dictionary?prospective,+randomised,+double-blind+clinical+trial">double-blind, prospective study</a> with patient data. While this is only a validation trial, we are following the <a href="http://www.fda.gov/">FDA</a> guidance. Once the trial is up and running, we plan on expanding the validation trial to include several more hospitals. It will be at least 12 months before we start any formal FDA trial.</p>
<h2>How is the system updated over time?</h2>
<p><strong>Bryan Hughes:</strong> We have developed a unique architecture that allows the system to reduce the experiment-to-validation cycle to 8 to 10 months. Typically in the medical community, a hypothesis is developed, a model is built and then tested and, if valid, a paper is published for peer review. Once the model is accepted, it can have a life span of several years of adoption and application, which is bad because, as we know, information and knowledge change as we learn and understand more. Models need to be consistently re-evaluated and re-examined.</p>
<h2>Are any similar systems available?</h2>
<p><strong>Bryan Hughes:</strong> None in the ICU, or even dealing with patient care, that we have found to date. In other industries, predictive analysis and modeling are pretty commonplace. Even your spam filter employs many of the techniques that the most sophisticated risk analysis system might use.</p>
<p><em>Photo: <a href="http://www.flickr.com/photos/krzakptak/136347649/" title="ekg by krzakptak, on Flickr">ekg by krzakptak, on Flickr</a></em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/02/algorithm-healthcare-challenge.html">A new challenge looks for a smarter algorithm to improve healthcare</a></li>
<li> <a href="http://radar.oreilly.com/2011/05/3-ways-internet-shapes-healthcare-pew.html">3 ways the Internet is shaping healthcare</a></li>
<li> <a href="http://radar.oreilly.com/2011/02/watson-jeopardy-money-health.html">Watson&#8217;s marketable skills</a></li>
<li> <a href="http://radar.oreilly.com/2010/11/better-mobile-healthcare-decis.html">Open health data: Spurring better decisions and new businesses</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/06/data-algorithm-icu-health.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>With M2M, the machines do all the talking</title>
		<link>http://radar.oreilly.com/2011/05/machine-to-machine-m2m.html</link>
		<comments>http://radar.oreilly.com/2011/05/machine-to-machine-m2m.html#comments</comments>
		<pubDate>Fri, 20 May 2011 13:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[automotive]]></category>
		<category><![CDATA[healthcare]]></category>
		<category><![CDATA[Internet of Things]]></category>
		<category><![CDATA[M2M]]></category>
		<category><![CDATA[smartgrid]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/05/machine-to-machine-m2m.html</guid>
		<description><![CDATA[In machine-to-machine communications, devices and sensors connect with each other or a central server rather than with human beings. Two M2M experts discuss M2M&apos;s applications in this interview. ]]></description>
				<content:encoded><![CDATA[<p><img src="http://s.radar.oreilly.com/2011/05/16/0511-m2m.png" width="250" border="0" alt="M2M screenshot" style="float: right;margin: 3px 0 10px 10px" />The shift from transporting voice to delivering data has transformed the business of mobile carriers, but there&#8217;s yet another upheaval on the horizon: <a href="http://en.wikipedia.org/wiki/Machine-to-Machine">machine to machine communications (M2M)</a>.</p>
<p>In M2M, devices and sensors communicate with each other or a central server rather than with human beings. These devices often use an embedded SIM card for communication over the mobile network. Applications include automotive, smartgrid, healthcare and environmental usages.</p>
<p>M2M traffic differs from human-generated voice and data traffic. Mobile carriers are adapting by creating entirely new companies for M2M, such as <a href="http://www.telenorconnexion.com/">Telenor&#8217;s M2M carrier</a> Telenor Connexion, and m2o city, <a href="http://venturebeat.com/2011/03/28/orange-water-metering/">Orange&#8217;s joint venture with water giant Veolia</a>. I talked to Göran Brandt, head of business development at Telenor Connexion and Rodolphe Fruges, VP of M2M at Orange Business Services about the future of mobile and M2M. </p>
<h2>Why did Telenor start Telenor Connexion?</h2>
<p><strong>Göran Brandt</strong>: Telenor Connexion was founded in 2008. We knew from our experience with running business-critical applications on the normal mobile infrastructure that it was not good enough. A system originally built to serve voice services, mobile office applications, etc. is not ideal for M2M. Running M2M on such a system can lead to disturbances or downtime during normal mobile service windows, which are scheduled at night, for example, because voice customers are expected to be sleeping.</p>
<h2>Why did Orange launch a mobile service operator specifically for water metering data?</h2>
<p><strong>Rodolphe Fruges</strong>: Smart metering for utilities &mdash; water, electricity and gas &mdash; is a relatively new market where we see key advances in M2M taking place. To address this market, Orange has joined forces with Veolia Eau, a market leader in the water industry, to create m2o city, a joint venture dedicated to smart metering.</p>
<p>To be clear, m2o city is not a &#8220;mobile operator.&#8221; That would require a GSM license. Rather, it is a &#8220;service operator&#8221; that provides the low-energy radio network that carries water metering data on behalf of local water distribution companies.</p>
<h2>How is the data carried in M2M applications different from human-generated data?</h2>
<p><strong>Göran Brandt</strong>: In M2M applications the amount of data sent and received is normally small. A meter reading equals only a few hundred bytes of data. M2M devices can cause problems if working incorrectly. If hundreds of thousands or even millions of electricity meters act at exactly the same time (they normally have a very precise built-in clock), that would result in network congestion.</p>
<p><strong>Rodolphe Fruges:</strong> The difference is not the data itself but where the data originates, in this case self-contained mobile devices. These devices are generating a huge amount of data with more frequency. With m2o city, utility companies are dealing with 700 times more data than before. This is why service providers like Orange, who have the expertise and technical infrastructure to accommodate these data loads, are vital for these companies. </p>
<p>The data also varies between applications. For water metering, we are typically dealing with a very small data set, a few times a day, at regular hours. For security applications, the device can be silent for months before sending a large data payload, in this case video.</p>
<h2>What network management strategies, technologies or processes does M2M rely on?</h2>
<p><strong>Göran Brandt</strong>: For a normal mobile carrier the customer interface is usually the customer help desk. In M2M you need to let your customers into your technical systems, so they can, in real time, see the status of their SIM card population and answer questions like &#8220;What country is a specific SIM card in?&#8221; and &#8220;Is the SIM card connected to a mobile data network or not?&#8221; </p>
<p><strong>Rodolphe Fruges:</strong> Creating processes and mechanisms is really essential for smart metering. You have to manage millions of devices with very specialized SIM cards across varying environmental conditions. A phone user can easily call a help desk to report a problem with his or her phone, but the device itself cannot make this call. The challenge on our end is to create automatic mechanisms that can validate whether a device is working or identify the source of any potential problems.</p>
<p>Certain M2M applications, such as streaming security videos, generate a high volume of data comparable to the data streaming occurring on mobile phones, but you can also find M2M alert applications running at SMS data levels. What changes is the number of devices that are being managed rather than the volume of data.</p>
<p>A key issue with smart metering is meters situated in hard-to-reach areas, such as basements, where it may be hard to get a strong mobile signal. The solution is network intermediaries, which work like a meshed radio network that grabs the data from a meter and sends it to a data concentrator. This is exactly the type of network technology you will see deployed with m2o city. It&#8217;s also applicable in other M2M scenarios, like a connected automobile that is roaming across networks. We constantly adapt to the least-cost network when roaming, or alert our customer when an event, like a vehicle that might be stolen crossing a border, is triggered on the network.</p>
<h2>What are the biggest current M2M applications? What do you see developing in the near future?</h2>
<p><strong>Göran Brandt:</strong> Automotive (cars and trucks), energy (smart metering) and security (burglar alarms). The automotive industry is a sector where Telenor Connexion has extensive experience. We are working in close collaboration with both car manufacturers and telematics service providers to enable cost-efficient and reliable connectivity solutions to vehicles around the world.</p>
<p>There are massive smart metering deployments currently being planned. The energy sector is moving toward renewable energy sources and the implementation of smart grids. Intelligent meters are going to form the foundation of tomorrow&#8217;s smart grid infrastructure.</p>
<p>Healthcare monitoring via wireless networks is an emerging application area with huge potential. Tens of millions of patients in Europe alone could potentially benefit from some form of home healthcare monitoring solution if it were available. Examples of conditions suitable for remote monitoring are cardiac arrhythmia, hypertension, sleep apnea and diabetes.</p>
<h2>Are there obstacles preventing M2M applications from becoming mainstream?</h2>
<p><strong>Göran Brandt:</strong> Most M2M customers work in a multi-country or even global scenario (basically all countries where they sell their products), and they expect a single solution covering multiple networks in multiple countries. It&#8217;s a challenge to provide flat pricing in multi-country roaming situations. It&#8217;s equally challenging to offer M2M customers Service Level Agreements stating exactly what uptime and availability to expect, including roaming networks.</p>
<p><strong>Rodolphe Fruges:</strong> The range of M2M applications is enormous, but that has led to some fragmentation in the market. Verticals such as the automotive industry have been hampered by the number of competitive players trying to outdo one another. A lack of standardization has also been a deterrent, impeding newer and more reliable M2M solutions.</p>
<p><em>Photo: <a href="http://www.telenorconnexion.com/">Screenshot from Telenor Connexion website</a></em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/05/water-leaks-data-analysis-takadu.html">Plugging water leaks with data</a></li>
<li> <a href="http://radar.oreilly.com/2011/04/machine-learning-alasdair-allan.html">The quiet rise of machine learning</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/05/machine-to-machine-m2m.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trading on sentiment</title>
		<link>http://radar.oreilly.com/2011/05/sentiment-analysis-finance.html</link>
		<comments>http://radar.oreilly.com/2011/05/sentiment-analysis-finance.html#comments</comments>
		<pubDate>Wed, 04 May 2011 14:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[finance]]></category>
		<category><![CDATA[semantic analysis]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[text analysis]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/05/sentiment-analysis-finance.html</guid>
		<description><![CDATA[Sorting through thousands of news stories and categorizing information based on mood and tone creates useful data points for financial systems. ]]></description>
				<content:encoded><![CDATA[<p><img alt="Numbers on board" src="http://s.radar.oreilly.com/2011/05/03/numbers-on-board.jpg" width="300" style="float: right;margin: 3px 0 10px 10px" />Computers don&#8217;t get emotionally invested in financial trades, but they do take feelings seriously. </p>
<p>Case in point: The financial trading dashboard managed by Thomson Reuters uses sentiment analysis data from <a href="http://www.lexalytics.com/">Lexalytics</a> to track news on 20,000 stocks and thousands of commodities. The Lexalytics system parses text from multiple sources, looking for keywords, tone, relevance and freshness. The resulting textual analysis (the meaning of the text) and sentiment analysis (the emotions in the text) are then incorporated into widely used algorithmic trading systems.</p>
<p>Mark Thompson, CEO of <a href="http://www.mknly.com/">McKinley Software</a> (the parent company of Lexalytics), told me more about this emotion-to-data conversion. &#8220;Our financial engine is something we developed over an 8-year period, and the main partner for that is <a href="http://thomsonreuters.com/">Thomson Reuters</a>,&#8221; Thompson said. &#8220;The Thomson Reuters news passes through our black box and we kick out scores based on 80 different variables for all of the articles.&#8221;</p>
<p>Algorithmic trading is automated trading where trading software takes various inputs, or &#8220;trading signals,&#8221; and uses them to decide what trades to make. Trades are executed in a matter of milliseconds and there is no human intervention. In 2009, algorithmic trading <a href="http://advancedtrading.com/algorithms/showArticle.jhtml?articleID=218401501">accounted for more than 25 percent of all shares traded on the buy side</a>. No human being can read the latest financial news fast enough to contribute to those buy or sell decisions. That&#8217;s where sentiment analysis comes in.</p>
<p>&#8220;By scoring the news, within milliseconds we get a very accurate view of what&#8217;s being said about a particular stock or sector,&#8221; Thompson said. &#8220;Thomson Reuters sells that output to trading houses who then plug this data into their algorithmic trading models. We have found that we can predict stock market movements. We provide an extra layer of richness that trading staff haven&#8217;t been able to get their hands on. Otherwise, you are just doing very two-dimensional <a href="http://en.wikipedia.org/wiki/Quantitative_analyst">quant</a> processing.&#8221;</p>
<p><a href="http://www.linkedin.com/pub/rochester-cahan/15/2a1/422">Rochester Cahan</a>, VP of Global Equity Quantitative Strategy at <a href="http://www.db.com/index_e.htm">Deutsche Bank</a>, has been experimenting with the Thomson Reuters system. Cahan told me that he has seen significant improvements in trading performance when the text analysis and sentiment scores are used as trading inputs. In addition, the scores are uncorrelated with existing trading signals &mdash; in other words, they provide new information to the trading system.</p>
<p>The most positive sentiment levels (e.g. Apple releases the iPad to universal acclaim) are not necessarily the most useful for trading. The stock price reacts very quickly so it&#8217;s difficult to take advantage of the information. However, Cahan said stocks with moderate positivity tend to be overlooked by the market and can make for good buys.</p>
<p>I asked Thompson about the limitations of the sentiment analysis technology. He explained that even human beings don&#8217;t agree on the sentiment of an article more than about 85% of the time. &#8220;The problem with our kind of engine is trying to get above 85% accuracy,&#8221; Thompson said. &#8220;Beyond that level, you get a diminishing return and you need more human intervention. This leaves the human analyst to pass different types of judgements.&#8221;</p>
<p>The competitive edge may be lost if all trading systems use sentiment analysis, but Thompson thinks there is some distance to go before we get to that point. &#8220;Everyone has a slightly different way of composing the model and using the news, and there are always advances in the technology,&#8221; he said. &#8220;But there will come a point when sentiment becomes an ordinary part of the trading mix.&#8221;</p>
<p><em>Photo: <a href="http://www.flickr.com/photos/obknoxious/2982961997/" title="ABOVE by Lyfetime, on Flickr">ABOVE by Lyfetime, on Flickr</a></em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2011/03/sentiment-analysis-context.html">With sentiment analysis, context always matters</a></li>
<li> <a href="http://radar.oreilly.com/2011/03/social-data-tools-application.html">Social data is an oracle waiting for a question</a></li>
<li> <a href="http://www.youtube.com/watch?v=40o86plOwc8">Video: Matthew Russell on social data tools and the promise of the semantic web</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/05/sentiment-analysis-finance.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plugging water leaks with data</title>
		<link>http://radar.oreilly.com/2011/05/water-leaks-data-analysis-takadu.html</link>
		<comments>http://radar.oreilly.com/2011/05/water-leaks-data-analysis-takadu.html#comments</comments>
		<pubDate>Mon, 02 May 2011 14:00:00 +0000</pubDate>
		<dc:creator>Ciara Byrne</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[@home]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data product]]></category>
		<category><![CDATA[Software as a Service]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2011/05/water-leaks-data-analysis-takadu.html</guid>
		<description><![CDATA[Water companies need data analysis tools to find and repair costly leaks. Enter Israeli startup TaKaDu, which offers water infrastructure monitoring as a service.  ]]></description>
				<content:encoded><![CDATA[<p>Cities in the developed world <a href="https://portal.luxresearchinc.com/research/document_excerpt/7734">lose between 10-30%</a> of their drinking water through leaks. Water companies call this &#8220;non-revenue water&#8221; because they treat the water but cannot charge for it since it doesn&#8217;t reach the user. </p>
<p>What water companies need is an efficient system that can collect and parse monitoring data so leaks can be found and repaired.</p>
<p>Israeli startup <a href="http://www.takadu.com">TaKaDu</a> aims to fill that niche with its water infrastructure monitoring service. UK-based <a href="http://www.thameswater.co.uk/">Thames Water</a>, for example, uses TaKaDu to detect leaks up to 9 days earlier than with its previous system.</p>
<p>In the following interview, TaKaDu&#8217;s VP of Marketing Guy Horowitz talks about how monitoring systems use data to plug these costly leaks.</p>
<h2>How much revenue is lost to leaks?</h2>
<p><strong>Guy Horowitz:</strong> There are direct losses and indirect losses. Direct losses are the cost of water and energy. In many countries, the water that is lost (desalinated water, for example) is very expensive to produce. Even in countries with abundant water, water has to be pumped, moved and stored, which adds to its cost. A city such as London loses 30% of its water, or more than 600 million cubic meters. A cubic meter can cost anywhere from $0.20 to a few dollars, so we&#8217;re talking about hundreds of thousands of dollars, if not millions, per day. Add to that the energy required to replace the lost water, and you get a significant pain point.</p>
<p>Indirect losses are often higher than direct losses. Road and property damage, traffic interruptions, paying detection crews and repair crews, and in many cases regulatory fines, amount to hundreds of millions per year in large cities. Detection alone is a mega-expenditure.</p>
<p>Additional revenue is lost to sub-optimal maintenance, such as fixing the wrong infrastructure or replacing perfectly good pipes because of their age and material even though they are not faulty at all.</p>
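<p>As a rough check on the direct-loss figures above, the following Python sketch converts the quoted volume into a daily cost. It assumes the 600 million cubic meters is an annual total (the interview does not say so explicitly) and uses $3.00 as a hypothetical stand-in for &#8220;a few dollars&#8221;; the function and constant names are illustrative only.</p>

```python
# Back-of-the-envelope check of the direct-loss figures quoted in the interview.
# Assumption: the 600 million cubic meters is an annual volume.
ANNUAL_LOSS_M3 = 600_000_000
DAYS_PER_YEAR = 365


def daily_loss_cost(price_per_m3):
    """Return the dollar cost of one day's lost water at the given price."""
    return ANNUAL_LOSS_M3 / DAYS_PER_YEAR * price_per_m3


low = daily_loss_cost(0.20)   # low end of the quoted price range
high = daily_loss_cost(3.00)  # assumed value standing in for "a few dollars"
print(f"${low:,.0f} to ${high:,.0f} per day")
```

<p>Under those assumptions the estimate runs from roughly $330,000 to about $4.9 million per day, which is consistent with the range quoted above.</p>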
<h2>What types of data are available in a water network?</h2>
<p><strong>Guy Horowitz:</strong> TaKaDu does not add any new sensors; it uses existing sensors and data. Data from flow and pressure meters, geographic information system (GIS) data, maintenance records, access control records, quality sensors, and many other sources is available.</p>
<p>Water networks can be divided into smaller areas, dubbed district metered areas (DMAs). This approach can be helpful in monitoring water loss, but it is costly to implement and, until systems like TaKaDu came along, it required heavy human interpretation of data. TaKaDu can take a network that is already divided into DMAs and automate most of the detection without adding new meters or sensors. Since we detect deviations from normal behavior, we typically ask for a year of history to account for all benign seasonal patterns and holiday exceptions.</p>
<h2>What are the main techniques you use to analyze water data?</h2>
<p><strong>Guy Horowitz:</strong> The full explanation would be lengthy, but one example is cross-site correlation. Your neighborhood and some other neighborhood maintain a consumption relationship that holds true year-round. Why? They may have a similar demographic make-up or a similar mix of residential and industrial customers. It doesn&#8217;t really matter, as long as they behave similarly over time. Now assume your neighborhood&#8217;s consumption goes up by 10% while the &#8220;similar&#8221; neighborhood&#8217;s does not. That may indicate a problem, whereas if both increase, it may indicate a warm day or some other benign explanation. Other algorithms kick in to check all possible explanations, and if nothing explains the change, it is declared to be a problem and is classified according to its type.</p>
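<p>The cross-site idea described above can be illustrated with a minimal Python sketch. This is not TaKaDu&#8217;s algorithm, which the interview does not detail; it is a simplified stand-in that learns a fixed consumption ratio between two neighborhoods from history and flags a day when one neighborhood exceeds what the other predicts. The function names, the 5% tolerance and the sample readings are all hypothetical.</p>

```python
from statistics import mean


def baseline_ratio(history_a, history_b):
    """Average ratio of neighborhood A's consumption to neighborhood B's."""
    return mean(a / b for a, b in zip(history_a, history_b))


def flag_divergence(today_a, today_b, ratio, tolerance=0.05):
    """Return True when A consumes more than B's reading predicts, beyond tolerance."""
    expected_a = today_b * ratio
    return (today_a - expected_a) / expected_a > tolerance


# Hypothetical daily consumption readings in cubic meters.
history_a = [100.0, 102.0, 98.0, 101.0]
history_b = [200.0, 204.0, 196.0, 202.0]
ratio = baseline_ratio(history_a, history_b)

# Only A rises by 10%: the relationship breaks, so the day is flagged.
leak_suspected = flag_divergence(110.0, 200.0, ratio)
# Both rise by 10% (a warm day, say): the relationship holds, so no flag.
warm_day = flag_divergence(110.0, 220.0, ratio)
print(leak_suspected, warm_day)
```

<p>A production system would presumably model seasonality and many more signals, as the answer above suggests; the sketch only shows why comparing a site against a correlated peer separates local anomalies from network-wide changes.</p>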
<h2>Do you detect other kinds of problems apart from leaks?</h2>
<p><strong>Guy Horowitz:</strong> Leaks and bursts account for only about a third of the types of problems detected by TaKaDu. The majority of alerts concern inefficiencies and faults in the network setup, operation or transmission, including faulty equipment and incorrect metering. The system also gives alerts on water quality issues, energy inefficiencies and other types of faults.</p>
<h2>What&#8217;s next for data analytics in water infrastructure management?</h2>
<p><strong>Guy Horowitz:</strong> We see very high interest from across the industry in smarter, data-driven solutions for better management of water networks. I expect larger players to make significant moves in product development and collaboration with innovative smart-water players. The challenges posed by aging infrastructure and the growing investment gap in the water sector mean that we need the best brains to start thinking about water problems rather than developing new mobile apps. The coming decade will be the decade of smarter water networks.</p>
<p><em>This interview was edited and condensed.</em></p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2010/10/strata-week-building-data-star.html">Building data startups</a></li>
<li> <a href="http://radar.oreilly.com/2011/04/renewable-energy-data-services.html">Interest in renewable energy could benefit data services</a></li>
<li> <a href="http://radar.oreilly.com/2011/03/ecology-data-markets-byproducts.html">Industrial ecology and big data</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2011/05/water-leaks-data-analysis-takadu.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
