Nat Torkington

Nat has chaired the O'Reilly Open Source Convention and other O'Reilly conferences for over a decade. He ran the first web server in New Zealand, co-wrote the best-selling Perl Cookbook, and was one of the founding Radar bloggers. He lives in New Zealand and consults in the Asia-Pacific region.

 

Tue, Feb 9, 2010

Four short links: 9 February 2010

Government Dashboard, Science Code Errors, Scaling Online Games, Information Theory

by Nat Torkington (@gnat), comments: 0

  1. Track DC -- informative drill-down report from Washington DC government about the different departments. (via Sunlight Labs blog)
  2. Errors in Scientific Software -- a 1994 study of scientific software that found inconsistent interfaces (on average 1 in every 7 interfaces for Fortran, 1 in 37 for C) and poor use of arithmetic such that significant figures declined from 6sf in the data to 1sf in the result. (via "If you're going to do good science, release the computer code too" in the Guardian)
  3. How Farmville Scales -- 75M players/month (28M/day), 1/4 of disk activity is writes, 50% higher load spikes, 3G/s of traffic goes between Farmville and Facebook at peak, LAMP stack, nagios+munin+puppet. (via Hacker News)
  4. Mathematical Philology -- when two manuscripts of the same text differ, which is correct? This PLoSONE paper looked at all such discrepancies in Lucretius's De Rerum Natura and found that the traditional principle of choosing the more difficult reading (on the grounds that errors are from humans unconsciously simplifying) has a strong information theory justification for it. Interesting to see this less than a week after an MIT Technology Review article on quantum teleportation remarked, "There is a growing sense that the properties of the universe are best described not by the laws that govern matter but by the laws that govern information."
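
The loss of significant figures mentioned in item 2 can be illustrated with a small sketch. This is not taken from the study; it is a standard textbook case (subtracting two nearly equal floating-point numbers), and the function names `naive` and `stable` are made up for illustration.

```python
import math


def naive(x: float) -> float:
    # Subtracts two nearly equal numbers: for tiny x, cos(x) rounds to 1.0
    # in double precision, so every significant figure of the numerator is lost.
    return (1.0 - math.cos(x)) / (x * x)


def stable(x: float) -> float:
    # Uses the identity 1 - cos(x) = 2 * sin(x/2)**2, which avoids the subtraction.
    s = math.sin(x / 2.0)
    return 2.0 * s * s / (x * x)


x = 1e-9
# The mathematically correct value is very close to 0.5 for small x.
print("naive: ", naive(x))
print("stable:", stable(x))
```

Both functions compute the same mathematical quantity; only the order of operations differs, which is the kind of choice that decides how many figures survive.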

tags: cs, facebook, games, gov2.0, programming, scale, science

 

Mon, Feb 8, 2010

Four short links: 8 February 2010

Kindle SDK, Javascript eBook Reader, Peer Review Review, eBook Moments

by Nat Torkington (@gnat), comments: 0

  1. Kindle Development Kit APIs -- Amazon will release a Kindle SDK. These are the API docs. (via obra on Twitter)
  2. rePublish -- all-Javascript ebook reader. (via kellan on Twitter)
  3. Peer Review: What's it Good For? (Cameron Neylon) -- harsh and honest review of peer review with some important questions for the future of science. "But there is perhaps an even more important procedural issue around peer review. Whatever value it might have we largely throw away. Few journals make referee’s reports available, virtually none track the changes made in response to referee’s comments enabling a reader to make their own judgement as to whether a paper was improved or made worse. Referees get no public credit for good work, and no public opprobrium for poor or even malicious work. And in most cases a paper rejected from one journal starts completely afresh when submitted to a new journal, the work of the previous referees simply thrown out of the window." Some lessons in here for social software, too.
  4. Analog IMDB -- "The transition is moving slowly, but it’s moving. It’s a fascinating thing to watch. The technology is the dull part: what’s interesting is the shift in perception. You know how sometimes you turn off a certain section of your brain and force yourself to see a word not as a piece of language with meaning, but as a sequence of black shapes and white spaces? It’s like you’re seeing that image for the very first time and suddenly “bird” seems like a very odd thing. I’ve been buying all of my in-print books electronically for a couple of years. Physical books aren’t weird to me yet. But damn, that old copy of the Maltin guide was a freaky and bizarre object. It’s the first time I looked at a book and didn’t see a container for information. I saw dead wood."

tags: amazon kindle, ebooks, javascript, opensource, programming, science, social software

 

Fri, Feb 5, 2010

Four short links: 5 February 2010

Public Domain, Science Code, Bad Crypto, Javascript Grids

by Nat Torkington (@gnat), comments: 0

  1. The Public Domain Manifesto -- eloquent argument in favour of the public domain. (via BoingBoing)
  2. Clear Climate Code -- project to write and maintain software for climate science, with an emphasis on clarity and correctness. What a wonderful way for coders who aren't scientists to contribute to open and better science. (via the interesting OKFN blog)
  3. Don't Hash Secrets -- "One area of secure protocol development that seems to consistently yield poor design choices is the use of hash functions. What I’m going to say is not 100% correct, but it is on the conservative side of correct, so if you follow the rule, you (probably) can’t go wrong. You might be considered overly paranoid, but as they say, just because you’re paranoid doesn’t mean they’re not after you. So here it is: Don’t hash secrets. Never. No, sorry, I know you think your case is special but it’s not. No. Stop it. Just don’t do it. You’re making the cryptographers cry."
  4. Javascript Grid Editors -- nice wrapup of available Javascript editable grid components, divided into "data driven", "light edit", and "spreadsheet". (via joshua on Delicious)
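
The rule quoted in item 3 doesn't say what to do instead. One widely used alternative, when the goal is to authenticate a message with a shared secret, is a keyed MAC such as HMAC rather than a plain hash of secret plus message. The sketch below assumes that use case; it uses Python's standard `hmac` and `hashlib` modules, and the key, messages, and the helper names `sign` and `verify` are hypothetical, not drawn from the linked post.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    # Keyed tag via HMAC-SHA256 instead of sha256(key + message).
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    # compare_digest avoids leaking where the two tags first differ.
    return hmac.compare_digest(sign(key, message), tag)


key = b"hypothetical-shared-secret"  # placeholder; a real key should be random
tag = sign(key, b"amount=100")

print(verify(key, b"amount=100", tag))  # same message, same key
print(verify(key, b"amount=900", tag))  # tampered message
```

This covers message authentication only; storing passwords is a different problem with its own dedicated tools.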

tags: copyright, cryptography, javascript, open source, programming, science, security

 

Thu, Feb 4, 2010

Four short links: 4 February 2010

Personal Ad Preferences, Android Kernel, EC2 Deconstructed, Symbian Opened

by Nat Torkington (@gnat), comments: 0

  1. Google Ad Preferences -- my defaults look reasonable and tailored to my interests. Creepy but kinda cool: I guess that if I have to have ads, they should be ones I'm not going to hate. (via rabble on Twitter)
  2. Android and the Linux Kernel -- the Android kernel is forked from the standard Linux kernel, and a Linux kernel maintainer says that Google has made no efforts to integrate. (via Slashdot)
  3. On Amazon EC2's Underlying Architecture -- fascinating deconstruction of the EC2 physical and virtual servers, without resorting to breaking NDAs. (via Hacker News)
  4. First Full Open Source Symbian Release (BBC) -- source code will be available for download from the Symbian Foundation web site as of 1400GMT. Nokia bought Symbian for US$410M in 2008 (for comparison, AOL bought Netscape for $4.2B in 1999 but the source code tarball had been escape-podded from the company a year before the deal closed). This makes Symbian more open than Android, says the head of the foundation: "About a third of the Android code base is open and nothing more," says Williams. "And what is open is a collection of middleware. Everything else is closed or proprietary." (quote from Wired's story).

tags: advertising, amazon ec2, android, google, linux, nokia, open source, search, symbian

 

Wed, Feb 3, 2010

Four short links: 3 February 2010

Bad Census Data, Telephone Fraud, Math Art, and EBook Bugs

by Nat Torkington (@gnat), comments: 0

  1. Bad Census Data for The Last Decade (Freakonomics blog) -- the "representative sample" of statistics data that the Census Bureau releases has apparently been flawed. It's been used in thousands of studies, and the Census Bureau has refused to correct it.
  2. Modern Telephone Fraud -- it's actually an old fraud updated: an insecure digital PBX used to route expensive calls. Innocent company is whacked with bill at end of month. Interesting questions raised about what we expect company to do (pay?) and telco to do (forgive?). It's a good reminder that every electronic product is now an avenue for fraud or intrusion, but we don't plan or contract for these situations.
  3. Found Functions -- Nikki Graziano adds mathematics to photographs. Her photos let me see the world through a mathematician's eyes. (via sciblogs)
  4. Getting Past Good-Enough E-Books -- fantastic list of TODOs for ebook publishers.

tags: art, data, ebooks, hardware, math, security, voice

 

Tue, Feb 2, 2010

Rethinking Open Data

Lessons learned from the Open Data front lines

by Nat Torkington (@gnat), comments: 24

In the last year I've been involved in two open data projects, Open New Zealand and data.govt.nz. I believe in learning from experience and I've seen some signs recently that other projects might benefit from my experience, so this post is a recap of what I've learned. It's the byproduct of a summer reflection on my last nine months working in open data.

Technologists like to focus on technology, and I'm as guilty of that as the next person. When Open New Zealand started, we rushed straight to the "catalogue". I was part of a smart group of top-notch web hackers--we know what a catalogue is, it's a web-based database and let's figure out the UI flow and which fields do we want and hey I can hack one up in Wordpress and I'll work on the hosting and so on. We spent more time worrying about CSS than we did worrying about the users.

This is the exact analogue of an open source software failure mode: often companies think they can get all the benefits of open source simply by releasing their source code. The best dinner parties are about the other people. Similarly, the best open source projects have great people, attract great people, and the source is simply what they're working on: necessary but not sufficient. You can build it but they won't come. All successful open source projects build communities of supportive engaged developers who identify with the project and keep it productive and useful.

Data catalogues around the world have launched and then realised that they now have to build a community of data users. There's value locked up in government data, but you only realise that value when the datasets are used. Once you finish the catalogue, you have to market it so that people know it exists. Not just random Internet developers, but everyone who can unlock that value. This category, "people who can use open data in their jobs" includes researchers, startups, established businesses, other government departments, and (yes) random Internet hackers, but the category doesn't have a name and it doesn't have a Facebook group, newsletter, AGM, or any other way for you to reach them easily.

This matters because it costs money to make existing data open. That sounds like an excuse, and it's often used as one, but underneath is a very real problem: existing procedures and datasets aren't created, managed, or distributed in an open fashion. This means that the data's probably incomplete, the documentation's not great, the systems it lives on are built for internal use only, and there's no formal process around managing and distributing updates. It costs money and time to figure out the new processes, build or buy the new systems, and train the staff.

In particular, government and science are often funded as projects. When the project ends, the funding stops. Ongoing maintenance and distribution hasn't been budgeted for almost any of the datasets we have today. This attitude has to change, and new projects give us the chance to get it right, but most existing datasets are unfunded for maintenance and release.

So while opening all data might be The Right Thing To Do from a philosophical perspective, it's going to cost money. Governments would rather identify the high-value datasets, where great public policy comment, intra-government optimisation, citizen information, or commercial value can be unlocked. Even if you don't buy into the cost argument, there's definitely an order problem: which datasets should we open first? It should be the ones that will give society the greatest benefit soonest. But without a community of users to poll, a well-known place for would-be data consumers to come to and demand access to the data they need, the policy-making parts of governments are largely blind to what data they have and what people want.

That's not to say that data catalogues aren't useful. We were scratching an itch--we wanted easier access to government data, so we built the tool that would provide it. The community of data users can be built around the tool. As Arjuna was told by Krishna, "a man must go forth from where he stands. He cannot jump to the Absolute, he must evolve toward it". I'm just noting that, as with all creative endeavours, we learned about the problem by starting to fix it.

Which brings me to the second big lesson: which problem are we trying to solve? There's an Open Data movement emerging around governments releasing data. However, there are at least five different types of Open Data groupie: low-polling governments who want to see a PR win from opening their data, transparency advocates who want a more efficient and honest government, citizen advocates who want services and information to make their lives better, open advocates who believe that governments act for the people therefore government data should be available for free to the people, and wonks who are hoping that releasing datasets of public toilets will deliver the same economic benefits to the country as did opening the TIGER geo/census dataset.

The one thing these groups don't share is an outcome. I can imagine an honest government where the costs of transparency outweigh the costs of corruption (think of the cost of removing every dirt particle from your house). I can imagine PR wins that don't come from delivering real benefits to citizens; in fact, I see this in a recent tweet by Sunlight Labs's Ellen Miller:

"Most of the raw data released by the OGD most likely isn't for you to use."

She's grumbling, as does this Washington Post piece, about the results so far from the Open Government Directive, which has prompted datasets of questionable value to be added to data.gov. If this is the future, where's my flying car? If this is open data, where's my damn transparency?

There are some promising signs. The UK government data catalogue had a long beta period where developers were working with the data. The UK team built a community as well as a catalogue. That's not to say that the UK effort is all gold--I saw plenty of frustration with RDF while I was observing the developers--but it stands out simply for the acknowledgement of users. Similarly, the UK's MySociety defined what success is to them: they're all about building useful apps for citizens, and open data is a means not an end to them.

So, after nearly a year in the Open Data trenches, I have some advice for those starting or involved in open data projects. First, figure out what you want the world to look like and why. It might be a lack of corruption, it might be a better society for citizens, it might be economic gain. Whatever your goal, you'll be better able to decide what to work on and learn from your experiences if you know what you're trying to accomplish. Second, build your project around users. In my time working with the politicians and civil servants, I've realised that success breeds success: the best way to convince them to open data is to show an open data project that's useful to real people. Not a catalogue or similar tool aimed at insiders, but something that's making citizens, voters, constituents happy. Then they'll get it.

My next project with Open New Zealand is to build a community of data users. I want to build a tight feedback loop between those who want data and those who can provide it, to create an environment where data users can support each other, and to make it easier to assess the value created by government-released open data. Henry Kissinger said, "each success only buys admission to a more difficult problem". I look forward to learning what the next problem is.

tags: gov2.0, mysociety, open data, open government, sunlight labs, transparency

 

Tue, Feb 2, 2010

Four short links: 2 February 2010

Physical UIs, Code Visualization, Money Money Money, and Educational Screencasts

by Nat Torkington (@gnat), comments: 1

  1. Phones That Touch Us (TEDxBerlin) -- excellent short (<5m) talk about ways that mobile phones can be designed to convey information in new ways. (via RussB on Twitter)
  2. Code City -- an integrated environment for software analysis, in which software systems are visualized as interactive, navigable 3D cities. The classes are represented as buildings in the city, while the packages are depicted as the districts in which the buildings reside. The visible properties of the city artifacts depict a set of chosen software metrics. (via mikeloukides on Twitter)
  3. Subscriptions Are the New Black (Dave McClure) -- high-octane rant that boils down to "deliver a good product, charge a fair price". Nobody tell 37Signals, they'll be pissed to discover they've been on the wrong track for all this time. Oh wait ...
  4. Khan Academy -- not a Star Trek spinoff but a collection of easy-to-understand science, maths, and economics instructional YouTube screencasts. (via Jon Udell)

tags: business, education, mobile, programming, screencasts, ui, visualization

 

Mon, Feb 1, 2010

Four short links: 1 February 2010

Android Charting, Trojan Cameras, Web-based IDE, Projected UIs

by Nat Torkington (@gnat), comments: 3

  1. Chartdroid -- an open source charting library for Android.
  2. China Bugs and Burgles Britain -- The gifts — cameras and memory sticks — have been found to contain electronic Trojan bugs which provide the Chinese with remote access to users’ computers. Beware geeks bearing gifts.
  3. Bespin -- sexy HTML5 "code-in-the-cloud" IDE from Mozilla Labs. If the future is truly in locked-down hack-free devices whose only interface to the world is through the web browser, these sorts of IDEs are going to become critical for finding and raising the next generation of hackers.
  4. Light Blue Optics' Light Touch turns any surface into a color touchscreen display (Engadget) -- projects a UI and a built-in camera picks up your interactions with it.

tags: android, html, html5, opensource, programming, security, ui, web, xhtml

 

Fri, Jan 29, 2010

Four short links: 29 January 2010

Chat Roulette, Flickr Photo Found, Life Quantification, Infographic Skills

by Nat Torkington (@gnat), comments: 0

  1. Chat Roulette -- not sure it's new, as I think I recall Eric Ries talking about implementing it in the early days of IMVU, but it's still interesting: chat to a random person who also wants to chat. I wonder whether it's being used for drive-by phone sex, or whether there's a genuine curiosity about other human beings that extends beyond their genitals. (via Roger Dennis)
  2. Only Surviving Photo of Phineas Gage Found on Flickr (NPR) -- are we still surprised at this? It's a little like "last copy of book found in library". Great photo, though. (via wiselark on Twitter)
  3. The 2009 Feltron Report -- life quantified beautifully. (via Flowing Data)
  4. Chart Wars: The Political Power of Visualization (Ignite) -- how to be a smart consumer of datagraphics and visualizations. (via KathySierra on Twitter)

tags: flickr, lifehacks, social software, visualization

 

Thu, Jan 28, 2010

Four short links: 28 January 2010

ISP Lockin, Warped Priorities, Government Data, and Book Piracy

by Nat Torkington (@gnat), comments: 4

  1. TrueSwitch -- "the de facto proprietary API that all the big ISPs use to help users switch, a market opportunity that wouldn't exist if they just opened up access to each other" in the words of Pete Warden.
  2. Free Publicity: Who Do We Help? (Anil Dash) -- "I love cool stuff as much as the next guy. What leaves me at a loss, though, is how many otherwise sane and sensible people give their time and energy freely to help support a company like Apple that, despite its elegant designs and generally excellent products (I use many of them), certainly doesn't need free PR from some of the most talented people on the web."
  3. World Government Data -- the Guardian has built a meta-index to open government data from four countries and will add more as other countries build data.gov-like sites.
  4. Confessions of a Book Pirate -- lots of insights into how guerilla book piracy happens. "The scanning process takes about 1 hour per 100 scans. Mass market paperbacks can be scanned two pages at a time flat on the scanner bed, while large trades and hardcovers usually need to be scanned one page at a time. I’m sure that some of the more hardcore scanners disassemble the book and run it through an automatic feeder or something, but I prefer the manual approach because I’d like to save the book, and don’t want to invest in the tools. Usually I can scan a book while watching a movie or two." (via waxy)

tags: apple, book, business, government, open data, piracy

 
