- Android Malware Numbers — (Quartz) less than an estimated 0.001% of app installations on Android are able to evade the system’s multi-layered defenses and cause harm to users, based on Google’s analysis of 1.5B downloads and installs.
- Facebook Operations Chief Reveals Open Networking Plan — long interview about OCP’s network project. The specification that we are working on is essentially a switch that behaves like compute. It starts up, it has a BIOS environment to do its diagnostics and testing, and then it will look for an executable and go find an operating system. You point it to an operating system and that tells it how it will behave and what it is going to run. In that model, you can run traditional network operating systems, or you can run Linux-style implementations, you can run OpenFlow if you want. And on top of that, you can build your protocol sets and applications.
- How Red Bull Dominates F1 (Quartz) — answer: data, and lots of it.
- Ground-Level Air Pollution Sensor (Make) — neat sensor project from Make.
Toward unifying customer behavior and operations metrics.
For the last ten years I’ve had a foot in both the development and operations worlds. I stumbled into the world of IT operations as a result of having the most UNIX skills in the team shortly after starting at ThoughtWorks. I was fortunate enough to do so at a time when many of my ThoughtWorks colleagues and I where working on the ideas which were captured so well in Jez Humble and Dave Farley’s Continuous Delivery (Addison-Wesley).
During this time, our focus was on getting our application into production as quickly as possible. We were butting up against the limits of infrastructure automation and IaaS providers like Amazon were only in their earliest form.
Recently, I have spent time with operations teams who are most concerned with the longer-term challenges of looking after increasingly complex ecosystems of systems. Here the focus is on immediate feedback and knowing if they need to take action. At a certain scale, complex IT ecosystems can seem to exhibit emergent behavior, like an organism. The operations world has evolved a series of tools which allow these teams to see what’s happening *right now* so we can react, keep things running, and keep people happy.
At the same time, those of us who spend time thinking about how to quickly and effectively release our applications have become preoccupied with wanting to know if that software does what our customers want once it gets released. The Lean Startup movement has shown us the importance of putting our software in front of our customers, then working out how they actually use it so we can determine what to do next. In this world, I was struck by the shortcomings of the tools in this space. Commonly used web analytics tools, for example, might only help me understand tomorrow how my customers used my site today.
Bluetooth networking within the Internet of Things
This article is part of a series exploring the role of networking in the Internet of Things.
Previously, we set out to choose the wireless technology standard that best fits the needs of our hypothetical building monitoring and energy application. Going forward, we will look at candidate technologies within all three networking topologies discussed earlier: point-to-point, star, and mesh. We’ll start with Bluetooth, the focus of this post.
Bluetooth is the most common wireless point-to-point networking standard, designed for exchanging data over short distances. It was developed to replace the cables connecting portable and/or fixed devices.
Today, Bluetooth is well suited for relatively simple applications where two devices need to connect with minimal configuration setup, like a button press, as in a cell phone headset. The technology is used to transfer information between two devices that are near each other in low-bandwidth situations such as with tablets, media players, robotics systems, handheld and console gaming equipment, and some high-definition headsets, modems, and watches.
When considering Bluetooth for use in our building application, we must consider the capabilities of the technology and compare these capabilities to the nine application attributes outlined in my previous post. Let’s take a closer look at Bluetooth across these eight key attributes.
Brace yourself: Address exhaustion is coming
IPv6 is the global warming of the computer industry, an impending disaster that most folks don’t seem to be taking as seriously as they should be. We’re well into the exhaustion phase of the IPv4 address space, but most ISPs are still dragging their heels on supplying the wider protocol to the end user.
Android Malware Numbers, Open Networking Hardware, Winning with Data, and DIY Pollution Sensor
DEFCON Doco, Global-Scale Networks, Media Goblin, and TCP/IP Legos
- DEFCON Documentary — free download, I’m looking forward to watching it on the flight back to NZ.
- Global-Scale Systems — botnets as example of the scale of networks and systems we’ll have to build but don’t have experience in.
- MediaGoblin — GNU project to build a decentralized alternative to Flickr, YouTube, SoundCloud, etc.
- Teaching TCP/IP Headers with Legos — genius. (via BoingBoing)
Transit and Peering, Quick Web Interfaces, Open Source Licensing, and RC Roach
- Why YouTube Buffers (ArsTechnica) — When asked if ISPs are degrading Netflix and YouTube traffic to steer users toward their own video services, Crawford told Ars that “the very powerful eyeball networks in the US (and particularly Comcast and Time Warner Cable) have ample incentive and ability to protect the IP services in which they have economic interests. Their real goal, however, is simpler and richer. They have enormous incentives to build a moat around their high-speed data networks and charge for entry because data is a very high-margin (north of 95 percent for the cable companies), addictive, utility product over which they have local monopoly control. They have told Wall Street they will do this. Yes, charging for entry serves the same purposes as discrimination in favor of their own VOD [video-on-demand], but it is a richer and blunter proposition for them.”
- Ink — MIT-licensed interface kit for quick development of web interfaces, simple to use and expand on.
- Licensing in a Post-Copyright World — This article is opening up a bit of the history of Open Source software licensing, how it seems to change and what we could do to improve it. Caught my eye: Oracle that relicensed Berkeley DB from BSD to APGLv3 [… effectively changing] the effective license for 106 other packages to AGPLv3 as well.
- RC Cockroaches (Vine) — video from Dale Dougherty of Backyard Brains Bluetooth RoboRoach. (via Dale Dougherty)
Retreading old topics can be a powerful source of epiphany, sometimes more so than simple extra-box thinking. I was a computer science student, of course I knew statistics. But my recent years as a NoSQL (or better stated: distributed systems) junkie have irreparably colored my worldview, filtering every metaphor with a tinge of information management.
Lounging on a half-world plane ride has its benefits, namely, the opportunity to read. Most of my Delta flight from Tel Aviv back home to Portland lacked both wifi and (in my case) a workable laptop power source. So instead, I devoured Nate Silver’s book, The Signal and the Noise. When Nate reintroduced me to the concept of statistical overfit, and relatedly underfit, I could not help but consider these cases in light of the modern problem of distributed data management, namely, operators (you may call these operators DBAs, but please, not to their faces).
When collecting information, be it for a psychological profile of chimp mating rituals, or plotting datapoints in search of the Higgs Boson, the ultimate goal is to find some sort of usable signal, some trend in the data. Not every point is useful, and in fact, any individual could be downright abnormal. This is why we need several points to spot a trend. The world rarely gives us anything clearer than a jumble of anecdotes. But plotted together, occasionally a pattern emerges. This pattern, if repeatable and useful for prediction, becomes a working theory. This is science, and is generally considered a good method for making decisions.
On the other hand, when lacking experience, we tend to over value the experience of others when we assume they have more. This works in straightforward cases, like learning to cook a burger (watch someone make one, copy their process). This isn’t so useful as similarities diverge. Watching someone make a cake won’t tell you much about the process of crafting a burger. Folks like to call this cargo cult behavior.
How Fit are You, Bro?
You need to extract useful information from experience (which I’ll use the math-y sounding word datapoints). Having a collection of datapoints to choose from is useful, but that’s only one part of the process of decision-making. I’m not speaking of a necessarily formal process here, but in the case of database operators, merely a collection of experience. Reality tends to be fairly biased toward facts (despite the desire of many people for this to not be the case). Given enough experience, especially if that experience is factual, we tend to make better and better decisions more inline with reality. That’s pretty much the essence of prediction. Our mushy human brains are more-or-less good at that, at least, better than other animals. It’s why we have computers and Everybody Loves Raymond, and my cat pees in a box.
Imagine you have a sufficient amount of relevant datapoints that you can plot on a chart. Assuming the axes have any relation to each other, and the data is sound, a trend may emerge, such as a line, or some other bounding shape. A signal is relevant data that corresponds to the rules we discover by best fit. Noise is everything else. It’s somewhat circular sounding logic, and it’s really hard to know what is really a signal. This is why science is hard, and so is choosing a proper database. We’re always checking our assumptions, and one solid counter signal can really be disastrous for a model. We may have been wrong all along, missing only enough data. As Einstein famously said in response to the book 100 Authors Against Einstein: “If I were wrong, then one would have been enough!”
Database operators (and programmers forced to play this role) must make predictions all the time, against a seemingly endless series of questions. How much data can I handle? What kind of latency can I expect? How many servers will I need, and how much work to manage them?
So, like all decision making processes, we refer to experience. The problem is, as our industry demands increasing scale, very few people actually have much experience managing giant scale systems. We tend to draw our assumptions from our limited, or biased smaller scale experience, and extrapolate outward. The theories we then tend to concoct are not the optimal fit that we desire, but instead tend to be overfit.
Overfit is when we have a limited amount of data, and overstate its general implications. If we imagine a plot of likely failure scenarios against a limited number of servers, we may be tempted to believe our biggest odds of failure are insufficient RAM, or disk failure. After all, my network has never given me problems, but I sure have lost a hard drive or two. We take these assumptions, which are only somewhat relevant to the realities of scalable systems and divine some rules for ourselves that entirely miss the point.
In a real distributed system, network issues tend to consume most of our interest. Single-server consistency is a solved problem, and most (worthwhile) distributed databases have some sense of built in redundancy (usually replication, the root of all distributed evil).
Driverless Intersections, Quantum Information, Low-Energy Wireless Networking, and Scammy Game Tactics
- Autonomous Intersection Management Project — a scalable, safe, and efficient multiagent framework for managing autonomous vehicles at intersections. (via How Driverless Cars Could Reshape Cities)
- Quantum Information (New Scientist) — a gentle romp through the possible and the actual for those who are new to the subject.
- Ambient Backscatter (PDF) — a new communication primitive where devices communicate by backscattering ambient RF signals. Our design avoids the expensive process of generating radio waves; backscatter communication is orders of magnitude more power-efﬁcient than traditional radio communication. (via Hacker News)
- Top Free-to-Play Monetization Tricks (Gamasutra) — amazingly evil ways that free games lure you into paying. At this point the user must choose to either spend about $1 or lose their rewards, lose their stamina (which they could get back for another $1), and lose their progress. To the brain this is not just a loss of time. If I spend an hour writing a paper and then something happens and my writing gets erased, this is much more painful to me than the loss of an hour. The same type of achievement loss is in effect here. Note that in this model the player could be defeated multiple times in the boss battle and in getting to the boss battle, thus spending several dollars per dungeon.
Spin up Python-friendly services with 0 lines of code
Twisted is a framework for writing, testing, and deploying event-driven clients and servers in Python. In my previous Twisted blog post, we explored an architectural overview of Twisted and examples of simple TCP, UDP, SSL, and HTTP echo servers.
While Twisted makes it easy to build servers in just a few lines of Python, you can actually use Twisted to spin up servers with 0 lines of code!
We can accomplish this with
twistd (pronounced twist-dee), a command line utility that ships with Twisted for deploying your Twisted applications. In addition to providing a standardized deployment interface for common production features like daemonization, logging, and authentication,
twistd can use Twisted’s plugin architecture to run flexible servers for a variety of protocols. Here are some examples:
twistd web --port 8000 --path .
Run an HTTP server on port 8000, serving both static and dynamic content out of
the current working directory. Visit
http://localhost:8000 to see the directory listing:
Doug Hanks on how the MX series is changing the game
Doug Hanks (@douglashanksjr) is an O’Reilly author (Juniper MX Series) and a data center architect at Juniper Networks. He is currently working on one of Juniper’s most popular devices – the MX Series. The MX is a routing device that’s optimized for delivering high-density and high-speed Layer 2 and Layer 3 Ethernet services. As you watch the video interview embedded in this post, the data is more than likely being transmitted across the Juniper MX.
We recently sat down to discuss the MX Series and the opportunities it presents. Highlights from our conversation include:
- MX is one of Juniper’s best-selling platforms [Discussed at the 0:32 mark].
- Learn if the MX can help you [Discussed at the 1:00 mark].
- What you need to know before using the MX [Discussed at the 6:40 mark].
- What’s next for Juniper [Discussed at the 9:39 mark].
You can view the entire interview in the following video.