That’s it — I’m taking my data and going home

We are simply not good at playing with others when it comes to data

Russia’s railway gauge is different from Western Europe’s. At the border of the former Soviet states, the Russian gauge of 1.524m meets the European & American ‘Standard’ gauge of 1.435m. The reasons for this literal disconnect arise from discussions between the Tsar and his War Minister. When asked the most effective way to prevent Russia’s own rail lines being used against them in times of invasion, the Minister suggested a different gauge to prevent supply trains rolling through the border. The artifact of this decision remains visible today at all rail crossings between Poland and Belarus or Slovakia and Ukraine. The rail cars are jacked up at the border, new wheels inserted underneath, and the car lowered again. It is about a 2-4 hour time burn for each crossing.

Per head, per crossing, over 170 years, that is a heck of a lot of wasted resources. But changing it would mean replacing the rolling stock of the entire country and realigning about 225,000 km (140,000 mi) of track.

Talk about technical debt.

Data suffers from a similar disconnect. It really wasn’t until the advent of XML 15 years ago that we had an agreed (but not entirely satisfactory) mechanism for storing arbitrary data structures outside the application layer. This is as much a commentary on our technical priorities as it is a social indictment. We are simply not good at playing with others when it comes to data.

Think otherwise? Cast back to last year’s tit-for-tat shortly after Facebook bought Instagram: Twitter responded by disabling the ability of Instagram users to import their friends from Twitter, and Instagram followed by disabling its Twitter cards integration, not as retaliation, of course, but rather because viewing images on Instagram itself is “a better experience” for consumers. A poor excuse, really; this was a battle over users-as-data. In these battles, it is most often the consumer who pays the price.

Corporate attitudes towards data are still very focused on this idea of data as proprietary value. My data is mine and yours is yours (even if it was created by our users). The Promethean gift of big data tools has not lessened this attitude. Companies now have the means to handle all the data they gather, keep it even closer to their bosom, and keep the rail gauges disconnected.

This attitude remains prominent but is fundamentally atavistic. The fact is, data is not natively a zero-sum game. At Factual we manage data for 65 million small businesses and landmarks in 50 countries. The size and fluidity of this dataset demand the combined attention of multiple organizations, so we’ve designed technical connectors to help with contributions, diff streams to broadcast edits, and a legal framework that ensures that if you improve our data, we share ownership. At Strata next week, I will be going into detail about why we think a shared approach is the only approach here, and why it is a bad principle to keep the passengers waiting.

