- The Mystery Machine (A Paper a Day) — rundown of Facebook’s Mystery Machine, which can measure end-to-end performance from the initiation of a page load in a Web browser, all the way through the server-side infrastructure, and back out to the point where the page has finished rendering. Doing this requires a causal model of the relationships between components (happens-before). How do you get that? And especially, how do you get that if you can’t assume a uniform environment for instrumentation?
- Network Effect — hypnotic and emotional. (via Flowing Data)
- Cultivating Great Distributed Teams (Liza Daly) — updates and refinements on her awesome meeting hack/system.
- Smartphone Energy Consumption (Pete Warden) — I love new ways of looking at familiar things, and looking at code and features through the lens of power consumption is one such way. (I remember Craig from Craigslist talking at OSCON about using power as the denominator in your data center; it changed how I saw the Web.) The article is full of surprising numbers and fascinating factoids. Active cell radio might use 800 mW. Bluetooth might use 100 mW. Accelerometer is 21 mW. Gyroscope is 130 mW. Microphone is 101 mW. GPS is 176 mW. Using the camera in ‘viewfinder’ mode, focusing and looking at a picture preview, might use 1,000 mW. Actually recording video might take another 200 to 1,000 mW on top of that.
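The Mystery Machine link above asks how to get a happens-before model without uniform instrumentation. One idea the paper works with, at far greater scale, is to infer relationships from a large corpus of traces. You start by hypothesizing that every pair of segments is ordered, then discard any hypothesis that some observed trace contradicts.

The Python below is a much-simplified sketch of that counterexample-elimination idea, not Facebook's implementation. Several things in it are hypothetical:

- the trace format (segment name mapped to start and end timestamps);
- the function name `infer_happens_before`;
- the simplifying choices: it handles only happens-before and ignores clock skew.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

# Hypothetical trace format: segment name -> (start, end) timestamps.
Trace = Dict[str, Tuple[float, float]]

def infer_happens_before(traces: List[Trace]) -> Set[Tuple[str, str]]:
    """Return the (a, b) pairs for which 'a happens before b' survived every trace."""
    segments = sorted({name for trace in traces for name in trace})
    # Start by hypothesizing that every ordered pair of distinct segments is causal.
    hypotheses = set(permutations(segments, 2))
    for trace in traces:
        # Iterate over a copy so hypotheses can be discarded during the loop.
        for a, b in list(hypotheses):
            if a in trace and b in trace:
                _, a_end = trace[a]
                b_start, _ = trace[b]
                # A trace in which b starts before a finishes is a counterexample.
                if b_start < a_end:
                    hypotheses.discard((a, b))
    # Pairs that never appear together survive vacuously; more traces prune more.
    return hypotheses
```

Consider two traces: one where `render` starts only after `db_query` ends, and one where they overlap. Given both, the sketch keeps `request` before `db_query` and `request` before `render`. It drops `db_query` before `render`, because the second trace is a counterexample. With only a handful of traces many coincidental orderings survive, which is why the approach depends on volume.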
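To put the smartphone power figures above in perspective, battery arithmetic is simply power multiplied by time. The sketch below makes two assumptions. First, it uses a hypothetical 3,000 mAh battery at a nominal 3.7 V, which is about 11,100 mWh. Second, it treats each quoted figure as a constant draw. That ignores the display, processor, and idle baseline, so the results are optimistic upper bounds.

```python
from typing import Iterable

# Power draws quoted in the article, in milliwatts (mW).
POWER_MW = {
    "cell_radio": 800,
    "bluetooth": 100,
    "accelerometer": 21,
    "gyroscope": 130,
    "microphone": 101,
    "gps": 176,
    "camera_viewfinder": 1000,
}

def battery_mwh(capacity_mah: float, voltage_v: float) -> float:
    """Convert a battery rating in mAh at a nominal voltage into mWh."""
    return capacity_mah * voltage_v

def hours_of_use(components: Iterable[str], capacity_mah: float = 3000,
                 voltage_v: float = 3.7) -> float:
    """Hours until empty if only the named components draw power, continuously.

    The 3,000 mAh / 3.7 V defaults describe a hypothetical battery, not a real device.
    """
    total_mw = sum(POWER_MW[name] for name in components)
    if total_mw <= 0:
        raise ValueError("need at least one component that draws power")
    return battery_mwh(capacity_mah, voltage_v) / total_mw
```

Under those assumptions, the viewfinder alone (1,000 mW) empties the battery in about 11 hours. The accelerometer, gyroscope, and GPS together (327 mW) would last roughly 34 hours.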
"distributed systems" entries
Comparing different orchestration tools.
Most software systems evolve over time. New features are added and old ones pruned. Fluctuating user demand means an efficient system must be able to quickly scale resources up and down. Demands for near-zero downtime require automatic failover to pre-provisioned backup systems, normally in a separate data centre or region.
On top of this, organizations often have multiple such systems to run, or need to run occasional tasks, such as data mining, that are separate from the main system but require significant resources or talk to the existing system.
When using multiple resources, it is important to make sure they are used efficiently, not sitting idle, while still being able to cope with spikes in demand. Balancing cost-effectiveness against the ability to quickly scale is a difficult task that can be approached in a variety of ways.
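One common way to balance idle capacity against demand spikes is a proportional target-utilization rule. You scale the replica count by the ratio of observed to target utilization and round up. Kubernetes' Horizontal Pod Autoscaler documents essentially this formula: desired = ceil(current × observed / target).

The function below is a minimal, hypothetical sketch of that rule with min/max bounds. It omits the tolerance band and stabilization windows a production autoscaler adds to avoid flapping.

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Proportional scaling: grow or shrink so average utilization nears the target.

    Utilizations are percentages (e.g. 90 for 90% CPU). The bounds keep the
    fleet from scaling to zero, or without limit, during a spike.
    """
    if current_replicas < 1 or target_utilization <= 0:
        raise ValueError("need at least one replica and a positive target")
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas running at 90% CPU against a 60% target scale out to 6. Four replicas at 20% scale in to 2.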
All of this means that running a non-trivial system is full of administrative tasks and challenges, the complexity of which should not be underestimated. It quickly becomes impossible to look after machines on an individual level; rather than being patched and updated one by one, they must be treated identically. When a machine develops a problem, it should be destroyed and replaced rather than nursed back to health.
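The "destroy and replace" approach can be expressed as a reconciliation loop, which runs in three steps:

1. Compare the fleet with its desired state.
2. Delete anything unhealthy.
3. Provision fresh, identical machines to make up the difference.

This is a hypothetical in-memory sketch. `Fleet`, `Machine`, and the `healthy` flag stand in for whatever provisioning API and health checks a real platform exposes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Machine:
    name: str
    healthy: bool = True  # stand-in for a real health check

@dataclass
class Fleet:
    desired_size: int
    machines: List[Machine] = field(default_factory=list)
    next_id: int = 0

    def provision(self) -> Machine:
        # Every machine is built the same way, so any one can replace any other.
        machine = Machine(name=f"node-{self.next_id}")
        self.next_id += 1
        return machine

    def reconcile(self) -> List[str]:
        """Destroy unhealthy machines, then top the fleet back up to desired_size."""
        destroyed = [m.name for m in self.machines if not m.healthy]
        # Nothing is repaired: unhealthy machines are simply dropped.
        self.machines = [m for m in self.machines if m.healthy]
        while len(self.machines) < self.desired_size:
            self.machines.append(self.provision())
        return destroyed
```

Calling `reconcile()` repeatedly, on a timer or after each health-check pass, keeps the fleet converging on its desired size without anyone touching an individual machine.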
Various software tools and solutions exist to help with these challenges. Let’s focus on orchestration tools, which help make all the pieces work together by working with the cluster to start containers on appropriate hosts and connect them. Along the way, we’ll consider scaling and automatic failover, which are important features.
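At its core, starting containers on "appropriate hosts" is a constrained bin-packing problem. The hypothetical `schedule` function below works in two steps:

1. It sorts the containers so the largest are placed first.
2. It puts each one on the host with the most free CPU that still has enough CPU and memory, which spreads load.

Real orchestrators weigh many more constraints, such as affinity, ports, and failure domains. Treat this as an illustration of the idea rather than how any particular tool works.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Host:
    name: str
    free_cpu: float   # cores still available
    free_mem_mb: int  # memory still available, in MB

@dataclass
class Container:
    name: str
    cpu: float
    mem_mb: int

def schedule(containers: List[Container], hosts: List[Host]) -> Dict[str, str]:
    """Map container name -> host name, reserving resources on the hosts as it goes."""
    placement: Dict[str, str] = {}
    # Place the biggest containers first so they are less likely to be left without room.
    for ct in sorted(containers, key=lambda c: (c.cpu, c.mem_mb), reverse=True):
        candidates = [h for h in hosts
                      if h.free_cpu >= ct.cpu and h.free_mem_mb >= ct.mem_mb]
        if not candidates:
            raise RuntimeError(f"no host can fit container {ct.name}")
        # Spread load: choose the fitting host with the most free CPU.
        host = max(candidates, key=lambda h: h.free_cpu)
        host.free_cpu -= ct.cpu
        host.free_mem_mb -= ct.mem_mb
        placement[ct.name] = host.name
    return placement
```

The function mutates the `Host` objects to record reservations. A caller that wants to try a placement without committing it would pass copies.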
Find your way through OSCON with these four learning paths.
The open source movement has been with us for almost two decades, and it’s clear that open source is now a de facto choice for software engineers across the globe. The content that you’ll find at OSCON is a reflection of that fact.
The open source world and OSCON itself are vast. With 48 sessions over two days and a bonus day with 11 workshops to choose from, you’ll no doubt have some tough choices to make when you attend the event. Keeping that in mind, I put together four learning paths that encompass the hot topics and important transitions we’re covering at OSCON.
The O'Reilly Radar Podcast: Astrid Atkinson on optimization, and Kelsey Hightower on distributed computing.
In this week’s episode, O’Reilly’s Mac Slocum talks to Astrid Atkinson, director of software engineering at Google, about the delicate balance of managing complexity in distributed systems and her experience working on-call rotations at Google.
Here are a few snippets from their chat:
I think it’s often really hard for organizations that are scaling quickly to find time to manage complexity in their systems. That can be really a trap, because if you’re really always just focused on the next deadline or whatever, and never planning for what you’re going to live with when you’re done, then you might never find the time.
You can only optimize what you pay attention to, and so if you can’t see what your system is doing, if you can’t see whether it’s working, it’s not working.
I used to get paged awake at two in the morning. You go from zero to Google is down. That’s a lot to wake up to.