How did we end up with a centralized Internet for the NSA to mine?

The Internet is naturally decentralized, but it's distorted by business considerations.

I’m sure it was a Wired editor, and not the author Steven Levy, who assigned the title “How the NSA Almost Killed the Internet” to yesterday’s fine article about the pressures on large social networking sites. Whoever chose the title, it’s justifiably grandiose because to many people, yes, companies such as Facebook and Google constitute what they know as the Internet. (The article also discusses threats to divide the Internet infrastructure into national segments, which I’ll touch on later.)

So my question today is: How did we get such industry concentration? Why is a network famously based on distributed processing, routing, and peer connections characterized now by a few choke points that the NSA can skim at its leisure?

I commented as far back as 2006 that industry concentration makes surveillance easier. I pointed out then that the NSA could elicit a level of cooperation (and secrecy) from the likes of Verizon and AT&T that it would never have gotten in the US of the 1990s, when Internet service was provided by thousands of mom-and-pop operations like Brett Glass’s wireless service in Laramie, Wyoming. Things are even more concentrated now, in services if not infrastructure.

Having lived through the Boston Marathon bombing, I understand what the NSA claims to be fighting, and I am willing to seek some compromise between their needs for spooking and the protections of the Fourth Amendment to the US Constitution. But as many people have pointed out, the dangers of centralized data storage go beyond the NSA. Bruce Schneier just published a pretty comprehensive look at how weak privacy leads to a weakened society. Others jeer that if social networking companies weren’t forced to give governments data, they’d be doing just as much snooping on their own to raise the click rates on advertising. And perhaps our most precious, closely held data — personal health information — is constantly subject to a marketplace for data mining.

Let’s look at the elements that make up the various layers of hardware and software we refer to casually as the Internet. How do centralization and decentralization work for each?

Public routers

One of Snowden’s major leaks reveals that the NSA pulled a trick comparable to the Great Firewall of China, tracking traffic as it passes through major routers across national borders. Like many countries that censor traffic, in other words, the NSA capitalized on the centralization of international traffic.

Internet routing within the US has gotten more concentrated over the years. There were always different “tiers” of providers, who all did basically the same thing but at inequitable prices. Small providers always complained about the fees extracted by Tier 1 networks. A Tier 1 network can transmit its own traffic nearly anywhere it needs to go for just the cost of equipment, electricity, etc., while extracting profit from smaller networks that need its transport. So concentration in the routing industry is a classic economy of scale.

International routers, of the type targeted by the NSA and many other governments, are even more concentrated. African and Latin American ISPs historically complained about having to go through US or European routers even if the traffic just came back to their same continent. (See, for instance, section IV of this research paper.) This raised the costs of Internet use in developing countries.

The reliance of developing countries on outside routers stems from another simple economic truth: there are more routers in affluent countries for the same reason there are more shopping malls or hospitals in affluent countries. Foreigners who have violated US laws can be caught if they dare to visit a shopping mall or hospital in the US. By the same token, their traffic can be grabbed by the NSA as it travels to a router in the US, or in one of the other countries where the NSA has established a foothold. It doesn’t help that the most common method of choosing routes, the Border Gateway Protocol (BGP), is a very old Internet standard with no built-in concept of security.
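
To make the point about BGP concrete, here is a minimal sketch of what happens when route announcements are taken on trust. It models only longest-prefix matching, which is a simplification of the real BGP decision process, and everything in it is an assumption made for illustration: the function name best_route, the documentation-range addresses, and the reserved AS numbers.

```python
import ipaddress


def best_route(routes, destination):
    """Pick the origin whose announced prefix most specifically covers destination.

    routes is a list of (prefix, origin) pairs. Nothing here checks whether
    the origin is entitled to announce the prefix, which is the weakness
    being illustrated.
    """
    addr = ipaddress.ip_address(destination)
    matching = []
    for prefix, origin in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matching.append((network.prefixlen, origin))
    if not matching:
        return None
    # The longest (most specific) prefix wins.
    return max(matching)[1]


# Hypothetical announcements using documentation address space.
routes = [("203.0.113.0/24", "AS64500")]
before = best_route(routes, "203.0.113.10")

# A second network announces a more specific slice of the same space.
routes.append(("203.0.113.0/25", "AS64501"))
after = best_route(routes, "203.0.113.10")

print(before, after)
```

In this toy model the traffic for 203.0.113.10 shifts to the second announcer as soon as the more specific prefix appears, with no proof of ownership required. Real routers apply many more rules, but the absence of origin verification in the base protocol is the gap the paragraph above refers to.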

The solution is economic: more international routers to offload traffic from the MAE-Wests and MAE-Easts of the world. While opposing suggestions to “balkanize” the Internet, we can applaud efforts to increase connectivity through more routers and peering.

IaaS cloud computing

Centralization has taken place at another level of the Internet: storage and computing. Data is theoretically safe from intruders in the cloud so long as encryption is used both in storage and during transmission — but of course, the NSA thought of that problem long ago, just as they thought of everything. So use encryption, but don’t depend on it.

Movement to the cloud is irreversible, so the question to ask is how free and decentralized the cloud can be. Private networks can be built on virtualization solutions such as the proprietary VMware and Azure or the open source OpenStack and Eucalyptus. The more providers there are, the harder it will be to do massive data collection.

SaaS cloud computing

The biggest change — what I might even term the biggest distortion — in the Internet over the past couple decades has been the centralization of content. Ironically, more and more content is being produced by individuals and small Internet users, but it is stored on commercial services, where it forms a tempting target for corporate advertisers and malicious intruders alike. Some people have seriously suggested that we treat the major Internet providers as public utilities (which would make them pretty big white elephants to unload when the next big thing comes along).

This was not technologically inevitable. Attempts at peer-to-peer social networking go back to the late 1990s with Jabber (now the widely used XMPP standard), which promised a distributed version of the leading Internet communications medium of the time: instant messaging. Diaspora more recently revived the idea in the context of Facebook-style social networking.

These services allow many independent people to maintain servers, offering the service in question to clients while connecting where necessary. Such an architecture could improve overall reliability because the failure of an individual server would be noticed only by people trying to communicate with it. The architecture would also be pretty snoop-proof.
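
The reliability claim can be sketched with a toy model. This is not how XMPP or Diaspora is implemented; the Federation class, its methods, and the example addresses are all hypothetical, and the only assumption carried over from the text is that a message needs just the sender's and the recipient's servers to be up.

```python
class Federation:
    """Toy model of a federated service: each domain runs its own server."""

    def __init__(self):
        self.users = {}    # domain -> set of local user names
        self.down = set()  # domains whose server is currently unreachable

    def register(self, address):
        user, domain = address.split("@")
        self.users.setdefault(domain, set()).add(user)

    def can_deliver(self, sender, recipient):
        """A message needs only the sender's and the recipient's servers."""
        for address in (sender, recipient):
            user, domain = address.split("@")
            if domain in self.down:
                return False
            if user not in self.users.get(domain, set()):
                return False
        return True


fed = Federation()
for address in ("alice@a.example", "bob@b.example", "carol@c.example"):
    fed.register(address)

fed.down.add("b.example")  # one independent operator's server fails

alice_to_carol = fed.can_deliver("alice@a.example", "carol@c.example")
alice_to_bob = fed.can_deliver("alice@a.example", "bob@b.example")
print(alice_to_carol, alice_to_bob)
```

With one server down, Alice can still reach Carol, and only conversations involving the failed server are affected. In a centralized design the equivalent of the down set would hold the single shared service, and every pair would fail at once.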

Why hasn’t the decentralized model taken off? I blame SaaS. The epoch of concentration in social media coincides with the shift of attention from free software to SaaS as a way of delivering software. SaaS makes it easier to form a business around software (while the companies can still contribute to free software). So developers have moved to SaaS-based businesses and built new DevOps development and deployment practices around that model.

To be sure, in the age of the web browser, accessing a SaaS service is easier than fussing with free software. To champion distributed architectures such as Jabber and Diaspora, free software developers will have to invest as much effort into the deployment of individual servers as SaaS developers have invested in their models. Business models don’t seem to support that investment. Perhaps a concern for privacy will.

  • To answer the question somewhat over-literally, we haven’t policed the companies that have monopolies on the network hardware. We don’t have a free market!

    Imagine a farmer’s market, the kind with temporary stalls in a parking lot behind the town hall every Saturday. As it’s owned by the town, we get a town policeman hanging out and noshing while he keeps an eye on everything. Farmers pay by the square foot, and compete for the front row by trying to arrive earlier than one another, and the policeman mediates any disagreements, as he’s unbiased. This is the model for what we now call a “free market”.

    Now consider what happens if the only good lot is owned by Dr Evil, a local landowner. The farmers get the locations and sizes that Dr Evil decides on, and they can get better ones if they pay more. If they pay a lot, they can get the Doctor to ban certain competitors. The biggest stall in the front row is owned by Dr Evil’s son, Scott. He runs a chicken farm, and no-one else is allowed to sell eggs in the lot. The town policeman isn’t welcome on Dr Evil’s property, and can’t mediate small things. He can stop armed robberies, but he can’t stop Dr Evil’s quiet, mostly-legal form of crime. Dr Evil has cornered this market.

    For “landowner”, substitute in “monopoly owning phone or cable poles”. For Dr Evil, substitute the name of a local cable or phone company. For Scott Evil, substitute in the name of their wholly-owned ISP.

    We should have a free market in the ISP business, but because certain companies have a monopoly on the physical networks, they can corner the market, and offer better service to customers who are willing to bribe them. All publicly, and all almost legally, because we didn’t think we needed laws to prevent the problem. Or didn’t enforce the laws we do have!

  • Fitzgerald

    I thought the short answer was “they *created* the internet”

  • Andy Oram

    Thanks for the comments.

    I should have added to my article a nod toward a real technical problem that makes peer-to-peer communications hard: addressing.

    For technical as well as business reasons, most individuals reside behind corporate routers at the ISP or another institution and get temporary IP addresses. Current work-arounds involve a central server for connection, followed by peer-to-peer data exchange. This is not robust.

    IPv6 does not automatically solve this. First, it doesn’t solve mobility (where I log in from Boston one day and Tel Aviv the next). Second, exposing individual systems to the big bad Internet increases the risk of security breaches, although one can’t say that NATs adequately protect us now.

    Still, it’s a big technical job to get everybody on the Internet (really on the Internet, not just hanging off a hub on a LAN somewhere), but it will be necessary before we distribute services.

  • I agree with a lot of the content in this article, but I think it misses some important fundamental reasons, and focuses on the higher level symptoms. Here’s the 3 reasons I see for our centralized internet, which is, IMO, a local-maximum optimization that we need to rethink and reverse:

    1. Billing: the reason things are centralized and not distributed is because the more peer-to-peer it is, the less any company can regulate, monitor, and ultimately, bill for that service. In a sense, if the web were completely peer-to-peer decentralized, almost no one (except maybe the ISPs) could make money on the web, which would of course collapse the web.

    2. Discoverability: we use centralization of services to solve the fact that my phone and your phone don’t know about each other (yet), and so the best we’ve come up with (yet) is for these two to be (at best) introduced via a centralized service, (at worst) proxied via a centralized service.

    3. Asynchronicity: If my phone and your phone are “connected” at the same time, there’s no technical reason they shouldn’t be able to find a route between each other and speak directly, no middle-man. But what if you’re not “online” when I want to send you a message? There’s only two options here, and we picked one. Either we have a centralized service to “proxy” my message and hold it until you come “online”, or we use the peer-net as a distributed “proxy” such that messages just keep bouncing around all the nodes until you come “online”.

    These are really difficult challenges. There’s a reason we ended up at the local-maximum optimization of centralized infrastructure. But I’ve founded a company that’s exploring how we can push back and solve these 3 fundamental problems in a non-centralized way. I think it’s possible, but it’s going to take a LOT of reworking. :)

  • Ric Woods

    “Movement to the cloud is irreversible”
    Gotta disagree here. As a general statement about a trend, possibly, but for individual users and companies, no way. Right now, no one NEEDS to use the cloud for their IT or personal computing needs. The cost and convenience incentives are there but the security and data ownership are definitely not.

  • American Patriot

    The ‘cloud’ is far too easy to break into, and steal private data. The NSA has no morals when it comes to spying and violating the rights of the people, they assume they are ‘the’ law, and have no constitutional prohibitions, but the people do.
    Government has become the largest terror organization, and has taken it upon themselves to assume absolute power where they have none at all!
    The arrogance is mind numbing, they continue to lie to the people, fabricate stories to back up those lies, and when we complain, they add more lies and obfuscation to confuse us. Well, we see THROUGH those lies better than ever, and the day of absolute accountability is fast approaching, WE THE PEOPLE ARE WATCHING you VERY CLOSELY! Storing ANY private information on a remote server is foolish, and you should expect to have your information spied upon. Trust nobody but yourself here, store your private data in external drives YOU control, use USB flash drives YOU keep in YOUR possession, not on some server in a location you don’t know about. If you can’t take physical control of your data, YOU do NOT own it, someone else does!

  • Andy Oram

    It’s an interesting question whether the use of cloud storage and computing resources is becoming inevitable. I think it is, for cost and convenience reasons. Most analysts I’ve read agree. And to American Patriot and Ric Woods: the experts maintaining data centers are much better at maintaining security than nearly all individual computer users. But I have heard an interesting counter-argument: that as memory and processors shrink, we’ll all have so much computing power in our pockets that we have no need for the cloud. By the way, individuals can run servers in the cloud too, so IaaS can co-exist with decentralization.

    getify: Thanks for the thoughtful comments on deeper causes. I investigated the problems of decentralization back when peer-to-peer became a popular term in the early 2000s, and wrote two articles on the problems: