The black market for data

Data providers, developers and end users don't always share the same goals.

General economic theory suggests that supply will rise to meet demand when demand is sufficient, and that demand will consume what is supplied when appropriate value exists. Black markets emerge when there is a disturbance in the force between supply and demand.

Black markets summon thoughts of illicit goods: stolen stereo equipment, weapons, drugs, etc. Rarely do we think about digitized data in a black market. But in today’s world of open social media APIs, there’s a rift between what publishers consider open versus what data consumers are demanding. Most APIs are wrapped in terms of use that users sign off on (“I Agree”) before using the API, and those agreements define how data can be used. Often, those agreements are violated.

The handy gray area

I can see Coke’s logo whenever I like, but I can’t use it however I want. Similarly, companies own user-generated content (UGC) — or at least the means to access it, and/or the conditions under which it can ultimately be displayed. Users, developers and others can see all that information, but it’s not free for the taking. Herein lies the conflict and the impetus for black markets dealing in data.

When your business needs a certain amount or type of data that it can’t legitimately have — as dictated by the terms of service (TOS) of the given set of APIs you need it from — a gray area comes in handy. You might not participate in a TOS violation yourself, but you could pay someone else to do it and hand over the results. Some forms of page scraping, multi-account/access-key (token) use, API access aggregation (IP farms), and re-syndication are violations of a service’s TOS. Ask your data providers how they get their data. You may be surprised at what you hear.

Black market data consumption generally occurs behind closed doors, so the economic impact is hard to define and understand. In addition, most data publishers/services simply don’t understand the commercial data market — or its opportunity size — and therefore don’t bother with enforcement of their TOS.

But perspectives change when end-user privacy concerns come to the fore. Publishers engage with full force, taking technical and legal action to shut off various data sources. A recent example: Facebook realized its robots.txt file was effectively promoting page scraping, so it changed the file accordingly and sent cease-and-desist letters to the offending parties. This interrupted the black market supply of large sets of Facebook data, and various data providers weren’t able to deliver their product to their customers. It had the ripple effect of a large-scale drug bust.
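To make the mechanism concrete: robots.txt is the file where a publisher declares which paths crawlers may fetch, and a polite scraper checks it before requesting pages. Here’s a minimal sketch using Python’s standard-library parser; the rules and URLs below are illustrative, not Facebook’s actual file.

```python
# Sketch: how a crawler honors a publisher's robots.txt rules.
# The rules and example.com URLs are hypothetical, for illustration only.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching each URL.
print(parser.can_fetch("*", "https://example.com/public/page"))   # True
print(parser.can_fetch("*", "https://example.com/private/page"))  # False
```

Changing a line in this file, as Facebook did, instantly flips what a compliant crawler is allowed to fetch — which is why a single edit plus cease-and-desist letters could cut off a whole supply chain.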

Defined value will challenge the black market

Scenarios such as these suggest a wild-west attitude pervades the business of data. Publishers feel the need to make data available, and clearly folks want to consume it, yet the use cases around how and when that data will be used are still highly variable. Data API publishers and consumers are still trying to understand the value of their data in a highly tumultuous industry, which makes everything feel like a one-off at the moment.

Within the next year I predict more clarity in the API terms that major data sources convey, and a more common understanding of the “rules of the road.” I also believe the value of a service’s underlying data will emerge. We’ll know the value of a tweet or a
given Facebook status message. Of course, no two messages are alike. A spam message is worthless, whereas a message from a celebrity endorsing a product is highly valuable. Once mechanisms exist for pricing these actions, a formal and granular marketplace can emerge to meet demand, and black markets can dissipate.

Keep data available

Despite black markets and TOS violations, it’s important for
publishers to continue to make their data widely available. Publishers get the public benefit of being labeled as open, as opposed to proprietary. They also effectively outsource many of the hard technical challenges and business models to developers who want to build products based on their data.

Companies and developers that consume a publisher’s data APIs benefit from someone else having solved the hard problem of amassing an interesting, and often large, user base. Consumers are able to repackage and shift the underlying data in a manner that leads to financial gain (and/or application cool-factor fame). Generally the publishers only get a subset of use cases right, and plenty of value is left on the table. Data API consumers exist to identify and take advantage of that value.

End-users are unfortunately stuck in the middle. We benefit from cool applications being built on the raw data we add to a publisher’s product/platform, but we run the risk of being burned by the “I Agree” checkbox. The publisher is in the game to make money, and the content (data) we provided them has inherent behavioral value. The result is experimentation with how that data can be used in a relatively open marketplace. Let’s hope for the best!
