<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>O&#039;Reilly Radar &#187; Gavin Starks</title>
	<atom:link href="http://radar.oreilly.com/gavin/feed" rel="self" type="application/rss+xml" />
	<link>http://radar.oreilly.com</link>
	<description>Insight, analysis, and research about emerging technologies</description>
	<lastBuildDate>Wed, 19 Jun 2013 10:00:39 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Data is not binary</title>
		<link>http://radar.oreilly.com/2010/06/data-is-not-binary.html</link>
		<comments>http://radar.oreilly.com/2010/06/data-is-not-binary.html#comments</comments>
		<pubDate>Wed, 30 Jun 2010 13:00:00 +0000</pubDate>
		<dc:creator>Gavin Starks</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[climate change]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[energy]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2010/06/data-is-not-binary.html</guid>
		<description><![CDATA[Open data isn&apos;t just about re-broadcasting data, but combining it, re-using it and building upon it. It&apos;s about creating new uses, creating new markets and building credibility into the data as it flows. ]]></description>
				<content:encoded><![CDATA[<p><em>Guest blogger Gavin Starks is founder and CEO of <a href="http://www.amee.com/">AMEE</a>, a neutral aggregation platform designed to measure and track all the energy data in the world.</em></p>
<p>The World Bank has stated that &#8220;<a href="http://blogs.worldbank.org/dmblog/open-vs-public-data-the-big-difference">data in document format is effectively useless</a>&#8221;.
</p>
<p>However, &#8220;open data&#8221; is only the beginning of a journey. Simply applying the rules of open source as applied to software may help us take the first steps, but there are new categories of challenges to face.</p>
</p>
<h2>Data needs to be computable (i.e., acted upon in context)</h2>
</p>
<p>&#8220;Data&#8221; is a much broader term than &#8220;code.&#8221; The term embodies a range of dimensions: there are more than just the numbers at play, especially with scientific data.</p>
<ul>
<li> How was the data collected?</li>
<li> How should the data be used?</li>
<li> Are the models for processing the data valid?</li>
<li> What assumptions exist, in words and equations?</li>
<li> What is the significance of the assumptions?</li>
</ul>
<p>In an age when peer review is an anachronism, we are searching for new solutions for &#8220;scientific content management&#8221;. When <a href="http://en.wikipedia.org/wiki/Pascal%27s_Wager">Pascal&#8217;s Wager</a> is invoked, it is equally important to remember <a href="http://en.wikipedia.org/wiki/Godel#The_Incompleteness_Theorem">G&#246;del&#8217;s incompleteness theorems</a> (any consistent formal system rich enough to express arithmetic contains true statements that it cannot prove).</p>
<p>Only eight percent of members of the Scientific Research Society agreed that &#8220;peer review works well as it is&#8221; (Chubin and Hackett, 1990; p. 192). Peer review has also been claimed to be &#8220;a non-validated charade whose processes generate results little better than does chance.&#8221; But in the same context: &#8220;Peer review is central to the organization of modern science &#8230; why not apply scientific [and engineering] methods to the peer review process&#8221; (Horrobin, 2001). The absence of URLs for those two pieces of research is indicative of one of the problems we are trying to solve.</p>
<p>Peer review survives in its current form for historical reasons, but it now occupies a niche, because technology has opened up usage to a mass audience.</p>
<p><span id="more-40158"></span>
</p>
<h2>We must build tools that enable credible engagement</h2>
</p>
<p>To illustrate our story: we are engaged with the very pressing and complex issue of climate change. At <a href="http://www.amee.org/">AMEE</a> we codify international, government, and proprietary data, models and methodologies that represent, at the most fundamental level, the algorithms that enable the energy, carbon and environmental cost of consumption and activities to be calculated. AMEE doesn&#8217;t just store and re-broadcast data, it performs the calculations based on inputs to the models.</p>
<p>One of our challenges is getting at the raw data in a useful, repeatable, and traceable form. As a result, one of the core services we offer to data and standards managers is a set of tools that enables this.</p>
<p>Releasing raw data is vital. There can be no excuse not to. Releasing source code is optional. It&#8217;s truly great for open source review, but it&#8217;s also dangerous if everyone just re-runs the same code with the same baked-in implicit and explicit assumptions and errors. </p>
<p>This is where data and code deviate substantially. The logic cascade for the interpretation of data is not unary (there is no single interpretation); it is based on assumptions that may vary and are subject to many quantitative and qualitative inputs. The interpretation of the data is not even binary.</p>
<p>We believe it&#8217;s much better to publish the following five components to provide transparent and auditable disclosure:</p>
<ol>
<li> The raw data</li>
<li> The circumstances of its collection</li>
<li> The method and assumptions used to process the data (in words and equations)</li>
<li> The results of the processing</li>
<li> The known limitations on the method and significance of the assumptions</li>
</ol>
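<p>As a rough illustration, the five components above could travel together as a single record. The following is a minimal Python sketch, not AMEE code: the <code>Disclosure</code> class, its field names, the <code>missing_components</code> helper and all of the values (including the 0.5 kgCO2 per kWh factor) are assumptions made up for the example.</p>
<pre>
```python
from dataclasses import dataclass, fields


@dataclass
class Disclosure:
    """Hypothetical record holding the five components listed above."""
    raw_data: list          # 1. the raw data
    collection_notes: str   # 2. the circumstances of its collection
    method: str             # 3. method and assumptions, in words and equations
    results: list           # 4. the results of the processing
    limitations: str        # 5. known limitations, significance of assumptions


def missing_components(record):
    """Return the names of components that are empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative values only; none of these numbers come from a real dataset.
example = Disclosure(
    raw_data=[120.0, 135.5],
    collection_notes="Monthly meter readings in kWh, read by hand",
    method="kgCO2 = kWh * factor, with an assumed factor of 0.5 kgCO2 per kWh",
    results=[60.0, 67.75],
    limitations="",
)

print(missing_components(example))
```
</pre>
<p>Here the check prints <code>['limitations']</code>, flagging a disclosure that omits its known limitations.</p>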
<p>The processing code should be written from scratch as many times as possible, to reduce the chance that any one implementation affects the results.</p>
<p>Once &#8220;published,&#8221; the challenge is how to build out a credible and usable set of services that encourage correct usage.</p>
</p>
<h2>Building the solution stack</h2>
</p>
<p>At AMEE we have developed a six-tier solution to try to address some of these issues. Specifically, we address the gap between content creators/managers (e.g. standards bodies) and content users (e.g. software apps, consultants, auditors), with a solution that is both human- and machine-readable.</p>
<p><strong>1. Aggregation</strong> &#8212; We aggregate the raw data, and track and log the sources. We have a standards <a href="http://www.amee.com/2010/01/28/amee-carbon-standards-spider/">spider</a> that checks for changes, not unlike a search engine spider.</p>
<p><strong>2. Content Enhancement</strong> &#8212; In the process of aggregation, we document the data, and embed provenance, linking back to the source. We also add <a href="http://explorer.amee.com/Authority">authority</a>, a measure of the reliability and credibility of the source. We&#8217;re beginning to add other taxonomies and semantic links that enable the data to be joined, and are building tools for engagement with the platform to stimulate discussion.</p>
<p><strong>3. Discoverability</strong> &#8212; <a href="http://explorer.amee.com">AMEE Explorer</a> is the human-readable version of the data, and the only search engine on carbon calculation models (N.B.: we are focused on the industrial and human impacts at the moment, not modeling the climate itself). </p>
<p><strong>4. Repeatable Quality</strong> &#8212; We have a quality-control process around the underlying data that is similar to a <a href="http://en.wikipedia.org/wiki/Six_Sigma">Six Sigma</a> process. Our systems self-test the data every 30 minutes, and human checks are carried out at random intervals to ensure systemic errors have not been introduced. Our target accuracy metric is 100 percent, not five-nines.</p>
<p><strong>5. Computable Engine</strong> &#8212; We believe we are taking the notion of a master database service to an entirely new level by ensuring not only that the data is robust, but also that AMEE performs the actual calculations. AMEE retains an audit history behind both the inputs and the calculations themselves.</p>
<p><strong>6. Interoperability and auditability</strong> &#8212; The AMEE API is the machine-readable version of the data (in fact, of all the content, including metadata and documentation), and it enables the calculations to be done. AMEE also stores the audit history of both the inputs and the calculation mechanics. For example: PUT (a flight in an F-15 from London to New York at combat thrust) and GET the kgCO2 for that journey, or PUT (1000 kWh reported by my Whirlpool fridge for this month, in Washington, using my preferred energy supplier and my solar panels) and GET the kgCO2.</p>
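<p>The pattern of submitting an input and reading back a kgCO2 figure, with an audit history behind it, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AMEE API: the function name, the factor table and the 0.5 kgCO2 per kWh value are hypothetical placeholders.</p>
<pre>
```python
from datetime import datetime, timezone

# Assumed emission factor, for illustration only (kgCO2 per kWh).
ASSUMED_FACTORS = {"grid_electricity": 0.5}

# In-memory audit history; a real service would persist this.
audit_log = []


def calculate_kg_co2(category, kwh):
    """Compute kgCO2 for an energy input and record how it was computed."""
    factor = ASSUMED_FACTORS[category]
    result = kwh * factor
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "inputs": {"category": category, "kwh": kwh},
        "factor": factor,
        "result": result,
    })
    return result


print(calculate_kg_co2("grid_electricity", 1000))
print(len(audit_log))
```
</pre>
<p>Each call returns the result and appends the inputs, the factor and the result to the log, so that a calculation can be traced back afterwards.</p>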
</p>
<h2>Challenges</h2>
</p>
<p>AMEE is positioned right at the junction between cloud, code, API, content, data, and the usage of the data, and as carbon becomes priced, we believe the consequences of getting it wrong are extremely high.</p>
<p>From an &#8220;open&#8221; standpoint, one of the big challenges we face is defining where the boundaries of &#8220;open&#8221; lie. Our value, of course, is in the ongoing maintenance and reliability of the system, and connecting the data.</p>
<p>Commercially, we are treading very carefully through the platform and use-case stack (core platform, API, data, algorithms, code, structure, etc.), and increasing transparency at the most relevant points for the end user (who needs to feel confident about their own inputs and outputs). It&#8217;s a complex stack, and no open source or Creative Commons licenses wholly cover the kinds of issues we face.</p>
<p>Our field, carbon footprinting, is what we call a &#8220;non-trivial&#8221; example of where open data meets the markets: billions of dollars are flowing through or around these data on the carbon markets. For example, thousands of businesses in the UK have to start reporting their carbon footprint to the government this year, and paying for it next year. Very, very few people understand how to use this data, how it all joins together, where the trap doors are, and why it&#8217;s important to build an industry-stack to solve the problem.</p>
<p>If we don&#8217;t build a credible industry stack, from the ground up, the outcome could be no industry at all (or a tiny one). That has dire consequences for the vendors and businesses in the space (such as SAP, SAS, CA, Microsoft, Google, and others), and it also removes our ability to accelerate solving the underlying issue of carbon and climate change itself. The root cause of this credibility gap has been a lack of transparency, and no one has comprehensively joined the dots to see what is real and what is not.</p>
<p>We also believe this kind of approach has huge value in many areas beyond the ones AMEE is addressing.</p>
<p>Open data isn&#8217;t just about re-broadcasting data, but combining it, re-using it and building upon it. It&#8217;s about creating new uses, creating new markets and building credibility into the data as it flows.</p>
<p><strong>Related:</strong></p>
<ul>
<li> <a href="http://radar.oreilly.com/2010/06/what-is-data-science.html">What is data science?</a></li>
<li> <a href="http://answers.oreilly.com/topic/1571-a-data-science-cheat-sheet/">A data science cheat sheet</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2010/06/data-is-not-binary.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A Climate of Polarization</title>
		<link>http://radar.oreilly.com/2009/01/climate-polarization.html</link>
		<comments>http://radar.oreilly.com/2009/01/climate-polarization.html#comments</comments>
		<pubDate>Wed, 28 Jan 2009 20:56:48 +0000</pubDate>
		<dc:creator>Gavin Starks</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[climate change]]></category>
		<category><![CDATA[communication]]></category>
		<category><![CDATA[emerging tech]]></category>
		<category><![CDATA[media]]></category>

		<guid isPermaLink="false">http://blogs.oreilly.com/radar/2009/01/climate-polarization.html</guid>
		<description><![CDATA[We are entering a new era of seismic change in policy, business, society, technology, finance and our environment, on a scale and speed substantially greater than previous revolutions. More than ever, we need to create space for learning, communication and understanding.]]></description>
				<content:encoded><![CDATA[<p><em>Guest blogger Gavin Starks is founder and CEO of <a href="http://www.amee.com">AMEE</a>, a neutral aggregation platform designed to measure and track all the energy data in the world. Gavin has a background in astrophysics and over 15 years of Internet development experience.</em></p>
<p>We&#8217;re all aware of the emotive language used to polarize the climate change debate.
</p>
<p>
There are, however, deeper patterns which are repeated across science as it interfaces with politics and media. These patterns have always bothered me, but they&#8217;ve never been as &#8220;important&#8221; as now.
</p>
<p>
We are entering a new era of seismic change in policy, business, society, technology, finance and our environment, on a scale and speed substantially greater than previous revolutions. The sheer complexity of these interweaving systems is staggering.
</p>
<p>
Much of this change is being driven by &#8220;climate science&#8221;, and in the communications maelstrom there is a real risk that we further alienate &#8220;science&#8221; across the board.
</p>
<p>
We need more scientists with good media training (and presenting capability) to change the way that all sciences are represented and perceived. We need more journalists with deeper science training &#8211; and the time and space to actually communicate across all media. We need to present uncertainty clearly, confidently and in a way that doesn&#8217;t impede our decision-making.
</p>
<p>
On the climate issue, there are some impossible levers to contend with:
</p>
<ol>
<li> Introducing any doubt into the climate debate stops any action that might combat our human impact.</li>
<li>Introducing &#8220;certainty&#8221; undermines our scientific method and its philosophy.</li>
</ol>
<p>
When represented in political, public and media spaces, these two levers undermine every scientific debate and lead to bad decisions.
</p>
<p>
<a href="http://en.wikipedia.org/wiki/Pascal%27s_wager">Pascal&#8217;s Wager</a> is often invoked, and this is entirely reasonable in this case.
</p>
<p>
It is reasonable because of what&#8217;s at stake: the risk of mass extinction events. If there is a probability that anthropogenic climate change will cause the predicted massive interventions in our ecosystem, then we have to act.
</p>
<p>
The nature of our actions must be commensurate with both the cause and the effect. The causes are many: population, production, consumption &#8211; as are the effects: war, poverty, scarcity, etc.
</p>
<p>
Our interventions will use all our means to address both cause and effect, and those actions will run deep.
</p>
<p>
Equally, we must allow science to do what it&#8217;s designed to do: measure, model, analyse and predict.
</p>
<p>
From a scientific perspective we must allow more room for theories to evolve, otherwise we&#8217;ll only prove what we&#8217;re looking for.
</p>
<p>
However, if we ignore the potential need to act, the consequences are not something anyone will want to see.
</p>
<p>
It&#8217;s not something we can fix later (for me, &#8220;geo-engineering&#8221; is not a fix, it&#8217;s a pre-infected band-aid).
</p>
<p>
Given the massive complexity of the issues, and that &#8211; really &#8211; anthropogenic climate change is only one of many &#8220;peak consumption&#8221; issues that we face, there is no way we can accurately communicate all the arguments that would lead to mass understanding.
</p>
<p>
However, the complexity issues are no different from those we face in politics. They are not solvable, but they are addressable.
</p>
<p>
We can communicate the potential outcomes, and the decisions that individuals need to make in order to impact the causes.
</p>
<p>
Ultimately it&#8217;s your personal choice.
</p>
<p>
My choice is based on my personal exposure to the science, business, data, policy, media, and broader issues around sustainability. That choice is <a href="http://www.dgen.net/blog/index.php/2007/12/12/arctic-could-be-ice-free-in-5-years/">to do my best</a> to catalyse change <a href="http://www.amee.com/">as fast as I possibly can</a>.
</p>
<p>
We all need to actively engage in improving communication, so that everyone &#8211; potentially everyone on Earth &#8211; can make informed choices about the future of the planet we inhabit.
</p>
<p>
&#8211; <br />
Recommended reading:
</p>
<p>
<a href="http://www.realclimate.org/" target="_blank">http://www.realclimate.org/</a><br />
is a great resource.
</p>
<p>
Today, the UK Government launched <a href="http://www.direct.gov.uk/en/Nl1/Newsroom/DG_174371">a campaign</a> &#8220;to create a more science literate society, highlighting the science and technology based industries of the future&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://radar.oreilly.com/2009/01/climate-polarization.html/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
