- Why YouTube Buffers (ArsTechnica) — When asked if ISPs are degrading Netflix and YouTube traffic to steer users toward their own video services, Crawford told Ars that “the very powerful eyeball networks in the US (and particularly Comcast and Time Warner Cable) have ample incentive and ability to protect the IP services in which they have economic interests. Their real goal, however, is simpler and richer. They have enormous incentives to build a moat around their high-speed data networks and charge for entry because data is a very high-margin (north of 95 percent for the cable companies), addictive, utility product over which they have local monopoly control. They have told Wall Street they will do this. Yes, charging for entry serves the same purposes as discrimination in favor of their own VOD [video-on-demand], but it is a richer and blunter proposition for them.”
- Ink — MIT-licensed interface kit for quick development of web interfaces, simple to use and expand on.
- Licensing in a Post-Copyright World — This article is opening up a bit of the history of Open Source software licensing, how it seems to change and what we could do to improve it. Caught my eye: Oracle that relicensed Berkeley DB from BSD to APGLv3 [… effectively changing] the effective license for 106 other packages to AGPLv3 as well.
- RC Cockroaches (Vine) — video from Dale Dougherty of Backyard Brains Bluetooth RoboRoach. (via Dale Dougherty)
Transit and Peering, Quick Web Interfaces, Open Source Licensing, and RC Roach
Retreading old topics can be a powerful source of epiphany, sometimes more so than simple extra-box thinking. I was a computer science student, of course I knew statistics. But my recent years as a NoSQL (or better stated: distributed systems) junkie have irreparably colored my worldview, filtering every metaphor with a tinge of information management.
Lounging on a half-world plane ride has its benefits, namely, the opportunity to read. Most of my Delta flight from Tel Aviv back home to Portland lacked both wifi and (in my case) a workable laptop power source. So instead, I devoured Nate Silver’s book, The Signal and the Noise. When Nate reintroduced me to the concept of statistical overfit, and relatedly underfit, I could not help but consider these cases in light of the modern problem of distributed data management, namely, operators (you may call these operators DBAs, but please, not to their faces).
When collecting information, be it for a psychological profile of chimp mating rituals, or plotting datapoints in search of the Higgs Boson, the ultimate goal is to find some sort of usable signal, some trend in the data. Not every point is useful, and in fact, any individual could be downright abnormal. This is why we need several points to spot a trend. The world rarely gives us anything clearer than a jumble of anecdotes. But plotted together, occasionally a pattern emerges. This pattern, if repeatable and useful for prediction, becomes a working theory. This is science, and is generally considered a good method for making decisions.
On the other hand, when lacking experience, we tend to over value the experience of others when we assume they have more. This works in straightforward cases, like learning to cook a burger (watch someone make one, copy their process). This isn’t so useful as similarities diverge. Watching someone make a cake won’t tell you much about the process of crafting a burger. Folks like to call this cargo cult behavior.
How Fit are You, Bro?
You need to extract useful information from experience (which I’ll use the math-y sounding word datapoints). Having a collection of datapoints to choose from is useful, but that’s only one part of the process of decision-making. I’m not speaking of a necessarily formal process here, but in the case of database operators, merely a collection of experience. Reality tends to be fairly biased toward facts (despite the desire of many people for this to not be the case). Given enough experience, especially if that experience is factual, we tend to make better and better decisions more inline with reality. That’s pretty much the essence of prediction. Our mushy human brains are more-or-less good at that, at least, better than other animals. It’s why we have computers and Everybody Loves Raymond, and my cat pees in a box.
Imagine you have a sufficient amount of relevant datapoints that you can plot on a chart. Assuming the axes have any relation to each other, and the data is sound, a trend may emerge, such as a line, or some other bounding shape. A signal is relevant data that corresponds to the rules we discover by best fit. Noise is everything else. It’s somewhat circular sounding logic, and it’s really hard to know what is really a signal. This is why science is hard, and so is choosing a proper database. We’re always checking our assumptions, and one solid counter signal can really be disastrous for a model. We may have been wrong all along, missing only enough data. As Einstein famously said in response to the book 100 Authors Against Einstein: “If I were wrong, then one would have been enough!”
Database operators (and programmers forced to play this role) must make predictions all the time, against a seemingly endless series of questions. How much data can I handle? What kind of latency can I expect? How many servers will I need, and how much work to manage them?
So, like all decision making processes, we refer to experience. The problem is, as our industry demands increasing scale, very few people actually have much experience managing giant scale systems. We tend to draw our assumptions from our limited, or biased smaller scale experience, and extrapolate outward. The theories we then tend to concoct are not the optimal fit that we desire, but instead tend to be overfit.
Overfit is when we have a limited amount of data, and overstate its general implications. If we imagine a plot of likely failure scenarios against a limited number of servers, we may be tempted to believe our biggest odds of failure are insufficient RAM, or disk failure. After all, my network has never given me problems, but I sure have lost a hard drive or two. We take these assumptions, which are only somewhat relevant to the realities of scalable systems and divine some rules for ourselves that entirely miss the point.
In a real distributed system, network issues tend to consume most of our interest. Single-server consistency is a solved problem, and most (worthwhile) distributed databases have some sense of built in redundancy (usually replication, the root of all distributed evil).
Driverless Intersections, Quantum Information, Low-Energy Wireless Networking, and Scammy Game Tactics
- Autonomous Intersection Management Project — a scalable, safe, and efficient multiagent framework for managing autonomous vehicles at intersections. (via How Driverless Cars Could Reshape Cities)
- Quantum Information (New Scientist) — a gentle romp through the possible and the actual for those who are new to the subject.
- Ambient Backscatter (PDF) — a new communication primitive where devices communicate by backscattering ambient RF signals. Our design avoids the expensive process of generating radio waves; backscatter communication is orders of magnitude more power-efﬁcient than traditional radio communication. (via Hacker News)
- Top Free-to-Play Monetization Tricks (Gamasutra) — amazingly evil ways that free games lure you into paying. At this point the user must choose to either spend about $1 or lose their rewards, lose their stamina (which they could get back for another $1), and lose their progress. To the brain this is not just a loss of time. If I spend an hour writing a paper and then something happens and my writing gets erased, this is much more painful to me than the loss of an hour. The same type of achievement loss is in effect here. Note that in this model the player could be defeated multiple times in the boss battle and in getting to the boss battle, thus spending several dollars per dungeon.
Spin up Python-friendly services with 0 lines of code
Twisted is a framework for writing, testing, and deploying event-driven clients and servers in Python. In my previous Twisted blog post, we explored an architectural overview of Twisted and examples of simple TCP, UDP, SSL, and HTTP echo servers.
While Twisted makes it easy to build servers in just a few lines of Python, you can actually use Twisted to spin up servers with 0 lines of code!
We can accomplish this with
twistd (pronounced twist-dee), a command line utility that ships with Twisted for deploying your Twisted applications. In addition to providing a standardized deployment interface for common production features like daemonization, logging, and authentication,
twistd can use Twisted’s plugin architecture to run flexible servers for a variety of protocols. Here are some examples:
twistd web --port 8000 --path .
Run an HTTP server on port 8000, serving both static and dynamic content out of
the current working directory. Visit
http://localhost:8000 to see the directory listing:
Doug Hanks on how the MX series is changing the game
Doug Hanks (@douglashanksjr) is an O’Reilly author (Juniper MX Series) and a data center architect at Juniper Networks. He is currently working on one of Juniper’s most popular devices – the MX Series. The MX is a routing device that’s optimized for delivering high-density and high-speed Layer 2 and Layer 3 Ethernet services. As you watch the video interview embedded in this post, the data is more than likely being transmitted across the Juniper MX.
We recently sat down to discuss the MX Series and the opportunities it presents. Highlights from our conversation include:
- MX is one of Juniper’s best-selling platforms [Discussed at the 0:32 mark].
- Learn if the MX can help you [Discussed at the 1:00 mark].
- What you need to know before using the MX [Discussed at the 6:40 mark].
- What’s next for Juniper [Discussed at the 9:39 mark].
You can view the entire interview in the following video.
Learn to build event-driven client and server applications
I want to build a web server, a mail server, a BitTorrent client, a DNS server, or an IRC bot—clients and servers for a custom protocol in Python. And I want them to be cross-platform, RFC-compliant, testable, and deployable in a standardized fashion. What library should I use?
Twisted is a “batteries included” networking engine for writing, testing, and deploying event-driven clients and servers in Python. It comes with off-the-shelf support for popular networking protocols like HTTP, IMAP, IRC, SMTP, POP3, IMAP, DNS, FTP, and more.
To see just how easy it is to write networking services using Twisted, let’s run and discuss a simple Twisted TCP echo server:
from twisted.internet import protocol, reactor
def dataReceived(self, data):
def buildProtocol(self, addr):
With Twisted installed, if we save this code to echoserver.py and run it with python echoserver.py, clients can now connect to the service on port 8000, send it data, and get back their echoed results. Read more…
DevOps is as much about culture as it is about tools.
Operations professionals live in a wind tunnel. If you can imagine one of those game show glass boxes, where a contestant stands inside, the door shuts, and money blows around in a whirlwind, you’ve got a good idea of what Operations feels like much of the time. While you’re trying to grab one technology, another has forced itself across your eyes demanding attention.
The incredible growth of an industry that didn’t really even exist fifteen years ago has provided us with endless opportunity and innovations. It’s also required us to be on the forefront of many new technologies in a way other professions aren’t. The constant drive towards the next technology, the next platform, and the next idea has stratified our organizations, creating specializations in areas like networking, storage, security, data sciences, and a myriad of other functions that challenge our ability to work with our colleagues as a cohesive team.
Establishing an effective organization for large-scale growth
In the open source and free software movement, we always exalt community, and say the people coding and supporting the software are more valuable than the software itself. Few communities have planned and philosophized as much about community-building as ZeroMQ. In the following posting, Pieter Hintjens quotes from his book ZeroMQ, talking about how he designed the community that works on this messaging library.
There are, it has been said (at least by people reading this sentence out loud), two ways to make really large-scale software. Option One is to throw massive amounts of money and problems at empires of smart people, and hope that what emerges is not yet another career killer. If you’re very lucky and are building on lots of experience, have kept your teams solid, and are not aiming for technical brilliance, and are furthermore incredibly lucky, it works.
But gambling with hundreds of millions of others’ money isn’t for everyone. For the rest of us who want to build large-scale software, there’s Option Two, which is open source, and more specifically, free software. If you’re asking how the choice of software license is relevant to the scale of the software you build, that’s the right question.
The brilliant and visionary Eben Moglen once said, roughly, that a free software license is the contract on which a community builds. When I heard this, about ten years ago, the idea came to me—Can we deliberately grow free software communities?
Informed Citizenry, TCP Chaos Monkey, Photographic Forensics, Medical Trial Data
- Aaron’s Army — powerful words from Carl Malamud. Aaron was part of an army of citizens that believes democracy only works when the citizenry are informed, when we know about our rights—and our obligations. An army that believes we must make justice and knowledge available to all—not just the well born or those that have grabbed the reigns of power—so that we may govern ourselves more wisely.
- Vaurien the Chaos TCP Monkey — a project at Netflix to enhance the infrastructure tolerance. The Chaos Monkey will randomly shut down some servers or block some network connections, and the system is supposed to survive to these events. It’s a way to verify the high availability and tolerance of the system. (via Pete Warden)
- Foto Forensics — tool which uses image processing algorithms to help you identify doctoring in images. The creator’s deconstruction of Victoria’s Secret catalogue model photos is impressive. (via Nelson Minar)
- All Trials Registered — Ben Goldacre steps up his campaign to ensure trial data is reported and used accurately. I’m astonished that there are people who would withhold data, obfuscate results, or opt out of the system entirely, let alone that those people would vigorously assert that they are, in fact, professional scientists.
A letter asking for an introduction meets a meditation on self-reliance.
In 1905 Mark Twain wrestled with the sort of request that many readers here have undoubtedly encountered: a new writer with the most tenuous of connections (her uncle was briefly a neighbor in a Nevada mining town) asks Twain to use his influence to get her manuscript published.
It never hurts to carry an introduction from a well-regarded intermediary, as long as your introducer can actually speak to the quality of your work. I think of Twain’s anguished reply every time I’m asked to recommend someone or something I don’t know — or am tempted to ask the same favor of someone else.
Twain’s message is ultimately optimistic: don’t simply try to accumulate influence. Instead, come up with a good idea and sell it on its merits. The world will listen.