"distributed systems" entries

Swarm v. Fleet v. Kubernetes v. Mesos

Comparing different orchestration tools.

Buy Using Docker Early Release.

Most software systems evolve over time. New features are added and old ones pruned. Fluctuating user demand means an efficient system must be able to quickly scale resources up and down. Demands for near-zero downtime require automatic failover to pre-provisioned backup systems, normally in a separate data centre or region.

On top of this, organizations often have multiple such systems to run, or need to run occasional tasks such as data-mining that are separate from the main system, but require significant resources or talk to the existing system.

When using multiple resources, it is important to make sure they are used efficiently, not sitting idle, but can still cope with spikes in demand. Balancing cost-effectiveness against the ability to quickly scale is a difficult task that can be approached in a variety of ways.

All of this means that the running of a non-trivial system is full of administrative tasks and challenges, the complexity of which should not be underestimated. It quickly becomes impossible to look after machines on an individual level; rather than patching and updating machines one-by-one they must be treated identically. When a machine develops a problem it should be destroyed and replaced, rather than nursed back to health.

Various software tools and solutions exist to help with these challenges. Let’s focus on orchestration tools, which help make all the pieces work together, working with the cluster to start containers on appropriate hosts and connect them together. Along the way, we’ll consider scaling and automatic failover, which are important features.
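As a rough illustration of what "start containers on appropriate hosts" and automatic failover can mean, here is a minimal sketch. It is not how Swarm, Fleet, Kubernetes, or Mesos actually schedule work; the `Host` class, the `place` and `fail_over` functions, the slot-based capacity model, and the "most free slots" rule are all hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int                     # hypothetical unit: container "slots"
    healthy: bool = True
    containers: list = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - len(self.containers)


def place(container: str, hosts: list) -> Host:
    """Put the container on the healthy host with the most free slots."""
    candidates = [h for h in hosts if h.healthy and h.free() > 0]
    if not candidates:
        raise RuntimeError(f"no capacity for {container}")
    target = max(candidates, key=Host.free)
    target.containers.append(container)
    return target


def fail_over(dead: Host, hosts: list) -> None:
    """Mark a host as dead and reschedule its containers elsewhere."""
    dead.healthy = False
    orphans, dead.containers = dead.containers, []
    for container in orphans:
        place(container, hosts)


hosts = [Host("a", 2), Host("b", 2), Host("c", 2)]
for name in ["web-1", "web-2", "db-1"]:
    place(name, hosts)

# Host "a" develops a problem: replace it rather than nurse it back to health.
fail_over(hosts[0], hosts)
print({h.name: h.containers for h in hosts})
```

The failed host is never repaired in place; its work is simply moved to the remaining healthy hosts, which is the "destroy and replace" attitude described above.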

Four short links: 8 October 2015

Mystery Machine, Emotional Effect, Meeting Hacks, and Energy Consumption

  1. The Mystery Machine (A Paper a Day) — rundown of Facebook’s Mystery Machine, which can measure end-to-end performance from the initiation of a page load in a Web browser, all the way through the server-side infrastructure, and back out to the point where the page has finished rendering. Doing this requires a causal model of the relationships between components (happens-before). How do you get that? And especially, how do you get that if you can’t assume a uniform environment for instrumentation?
  2. Network Effect — hypnotic and emotional. (via Flowing Data)
  3. Cultivating Great Distributed Teams (Liza Daly) — updates and refinements on her awesome meeting hack/system.
  4. Smartphone Energy Consumption (Pete Warden) — I love new ways of looking at familiar things. Looking at code and features through the lens of power consumption is another such lens. (I remember Craig from Craigslist talking at OSCON about using power as the denominator in your data center, changing how I saw the Web). The article is full of surprising numbers and fascinating factoids. Active cell radio might use 800 mW. Bluetooth might use 100 mW. Accelerometer is 21 mW. Gyroscope is 130 mW. Microphone is 101 mW. GPS is 176 mW. Using the camera in ‘viewfinder’ mode, focusing and looking at a picture preview, might use 1,000 mW. Actually recording video might take another 200 to 1,000 mW on top of that.
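To make the power figures in item 4 concrete, the sketch below converts them into energy for a given duration (milliwatts times seconds, divided by 1,000, gives joules). The power values are the ones quoted above; the dictionary keys, the `energy_joules` function, and the one-minute duration are assumptions made for illustration.

```python
# Power figures quoted in the item above, in milliwatts.
POWER_MW = {
    "cell_radio": 800,
    "bluetooth": 100,
    "accelerometer": 21,
    "gyroscope": 130,
    "microphone": 101,
    "gps": 176,
    "camera_viewfinder": 1000,
}


def energy_joules(components, seconds, extra_mw=0):
    """Energy used by the named components running together for `seconds`."""
    total_mw = sum(POWER_MW[c] for c in components) + extra_mw
    return total_mw * seconds / 1000  # mW * s = mJ; divide by 1000 for J


# One minute of video recording: viewfinder power plus the quoted
# extra 200 to 1,000 mW for actually recording.
low = energy_joules(["camera_viewfinder"], 60, extra_mw=200)
high = energy_joules(["camera_viewfinder"], 60, extra_mw=1000)
print(low, high)  # 72.0 120.0
```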

How to experience OSCON Amsterdam 2015

Find your way through OSCON with these four learning paths.

Paths by Francesca Gallo on Flickr. Used under a Creative Commons License.

The open source movement has been with us for almost two decades, and it’s clear that open source is now a de facto choice for software engineers across the globe. The content that you’ll find at OSCON is a reflection of that fact.

The open source world and OSCON itself are vast. With 48 sessions over two days and a bonus day with 11 workshops to choose from, you’ll no doubt have some tough choices to make when you attend the event. Keeping that in mind, I put together four learning paths that encompass the hot topics and important transitions we’re covering at OSCON.

I’m looking forward to seeing you at OSCON in Amsterdam in October!

Managing complexity in distributed systems

The O'Reilly Radar Podcast: Astrid Atkinson on optimization, and Kelsey Hightower on distributed computing.

Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.


In this week’s episode, O’Reilly’s Mac Slocum talks to Astrid Atkinson, director of software engineering at Google, about the delicate balance of managing complexity in distributed systems and her experience working on-call rotations at Google.

Here are a few snippets from their chat:

I think it’s often really hard for organizations that are scaling quickly to find time to manage complexity in their systems. That can be really a trap, because if you’re really always just focused on the next deadline or whatever, and never planning for what you’re going to live with when you’re done, then you might never find the time.

You can only optimize what you pay attention to, and so if you can’t see what your system is doing, if you can’t see whether it’s working, it’s not working.

I used to get paged awake at two in the morning. You go from zero to Google is down. That’s a lot to wake up to.

Four short links: 2 September 2015

Hard Problems in Distributed Systems, Engineering Bootcamp, Scripted TV, and C Guidelines

  1. There Are Only Two Hard Problems in Distributed Systems — the best tweet ever. (via Tim Bray)
  2. Building LinkedIn’s New Engineering Bootcamp — transmitting cultural and practical knowledge in a structured format.
  3. Soul-Searching in TV Land Over the Challenges of a New Golden Age (NY Times) — The number of scripted shows produced by networks, cable networks and online services ballooned to 371 last year, according to statistics compiled by FX. Mr. Landgraf believes that figure will pass 400 this year, which would nearly double the 211 shows made in 2009. […] predicted that the number of shows would slowly return to about 325 over the next few years, in large part because scripted television is expensive.
  4. C Programming Substance Guidelines: This document is mainly about avoiding problems specific to the C programming language.

Four short links: 21 August 2015

Web Experiments, Virtual Time, Reading Postmortem, and Chinese Robot Companies

  1. Doing Science on the Web (Alex Russell) — Minimizing harm to the ecosystem from experiments-gone-wrong […] This illustrates what happens when experiments inadvertently become critical infrastructure. It has happened before. Over, and over, and over again. Imma need therapy for the flashbacks. THE HORROR.
  2. Virtual Time (Adrian Colyer) — applying special relativity to distributed systems. Contains lines like: All messages sent explicitly by user programs have a positive (+) sign; their antimessages have a negative (-) sign. Whenever a process sends a message, what actually happens is that a faithful copy of the message is transmitted to the receiver’s input queue, and a negative copy, the antimessage, is retained in the sender’s output queue for use in case the sender rolls back. Curl up with your intoxicant of choice and prepare to see the colour of infinity.
  3. Lessons Learned from Reading Postmortems — (of the software kind) Except in extreme emergencies, risky code changes are basically never simultaneously pushed out to all machines because of the risk of taking down a service company-wide. But it seems that every company has to learn the hard way that seemingly benign config changes can also cause a company-wide service outage.
  4. 194 Chinese Robot Companies (Robohub) — Overall, 107 Chinese companies are involved in industrial robotics. Many of these new industrial robot makers are producing products that, because of quality, safety, and design regulations, will only be acceptable to the Chinese market. Many interesting numbers about the Chinese robotics biz.
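The antimessage bookkeeping quoted in item 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Message` and `Process` classes and their method names are hypothetical, and the sketch leaves out state saving and the receiver's own rollback when a cancelled message has already been processed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_time: int   # sender's virtual time when the message was sent
    payload: str
    sign: int = 1    # +1 for a message, -1 for its antimessage

    def anti(self) -> "Message":
        """Return the same message with the opposite sign."""
        return Message(self.sender, self.receiver, self.send_time,
                       self.payload, -self.sign)


class Process:
    def __init__(self, name):
        self.name = name
        self.input_queue = []    # messages waiting to be processed
        self.output_queue = []   # antimessages retained in case of rollback

    def send(self, other, time, payload):
        msg = Message(self.name, other.name, time, payload)
        other.receive(msg)                     # faithful copy to the receiver
        self.output_queue.append(msg.anti())   # negative copy kept by sender

    def receive(self, msg):
        twin = msg.anti()
        if twin in self.input_queue:
            self.input_queue.remove(twin)      # message and antimessage annihilate
        else:
            self.input_queue.append(msg)

    def roll_back(self, to_time, peers):
        """Cancel every message this process sent after virtual time `to_time`."""
        keep = []
        for antimsg in self.output_queue:
            if antimsg.send_time > to_time:
                peers[antimsg.receiver].receive(antimsg)
            else:
                keep.append(antimsg)
        self.output_queue = keep


a, b = Process("a"), Process("b")
peers = {"a": a, "b": b}
a.send(b, 10, "x")
a.send(b, 20, "y")
a.roll_back(15, peers)   # undoes the message sent at virtual time 20
print([m.payload for m in b.input_queue])
```

After the rollback, only the message sent at virtual time 10 remains in the receiver's input queue, and the sender still holds that message's antimessage in case it has to roll back further.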

Four short links: 29 July 2015

Mobile Medical Scanner, Amazon Hardware Showcase, Consistency Challenges, and Govt Alpha Geeks

  1. Cellphone-Based Hand-Held Microplate Reader for Point-of-Care Testing of Enzyme-Linked Immunosorbent Assays: we created a hand-held and cost-effective cellphone-based colorimetric microplate reader that implements a routine hospital test used to identify HIV and other conditions. (via RtoZ)
  2. Amazon Launchpad — a showcase for new hardware startups, who might well be worried about Amazon’s “watch what sells and sell a generic version of it” business model.
  3. Challenges to Adopting Stronger Consistency at Scale (PDF) — It is not obvious that a system that trades stronger consistency for increased latency or reduced availability would be a net benefit to people using Facebook, especially when compared against a weakly consistent system that resolves many inconsistencies with ad hoc mechanisms.
  4. The White House’s Alpha Geeks — Megan Smith for President. I realize now there’s two things we techies should do — one is go where there are lots of us, like MIT or Silicon Valley or whatever, because you can move really fast and do extraordinary things. The other is, go where you’re rare. It’s almost like you’re a frog in boiling water; you don’t really realize how un-diverse it is until you’re in a normal diverse American innovative community like the President’s team. And then you go back and you’re like, wow. You feel, “Man, this industry is so awesome and yet we’re missing all of this talent.”

Four short links: 21 July 2015

Web Future, GCE vs Amazon, Scammy eBooks, and Container Clusters

  1. Web Design: The First 100 Years (Maciej Ceglowski) — There’s a William Gibson quote that Tim O’Reilly likes to repeat: “the future is here; it’s just not evenly distributed yet.” O’Reilly takes this to mean that if we surround ourselves with the right people, it can give us a sneak peek at coming attractions. I like to interpret this quote differently, as a call to action. Rather than waiting passively for technology to change the world, let’s see how much we can do with what we already have. Let’s reclaim the Web from technologists who tell us that the future they’ve imagined is inevitable, and that our role in it is as consumers.
  2. Comparing Cassandra Write Performance on Google Compute Engine and AWS: tl;dr – We achieved better Cassandra performance on GCE vs. Amazon, at close to half the cost. Also interesting for how they built the benchmark.
  3. The Scammy Underground World of Kindle eBooks: The biggest issue here isn’t that scammers are raking in cash from low-quality content; it’s that Amazon is allowing this to happen. Publisher brand value is the reliable expectation that buyers have of the book quality. Amazon’s publishing arm is spending the good brand value built by its distribution arm.
  4. Empire: a 12-factor-compatible, Docker-based container cluster built on top of Amazon’s robust EC2 Container Service (ECS), complete with a full-featured command line interface. Open source.

Four short links: 17 July 2015

Smalltalky Web, Arduino Speech, Testing Distributed Systems, and Dataflow for FP

  1. Project Journal: Objects (Ian Bicking) — a view askew at the Web, inspired by Alan Kay’s History of Smalltalk.
  2. Speech Recognition for Arduino (Kickstarter) — for all your creepy toy hacking needs!
  3. Conductor (github) — a framework for testing distributed systems.
  4. Dataflow Syntax for Functional Programming? — two great tastes that will make your head hurt together!

Four short links: 6 July 2015

DeepDream, In-Flight WiFi, Computer Vision in Preservation, and Testing Distributed Systems

  1. DeepDream — the software that’s been giving the Internet acid-free trips.
  2. In-Flight WiFi Business — numbers and context for why some airlines (JetBlue) have fast free in-flight wifi while others (Delta) have pricey slow in-flight wifi. Four years ago ViaSat-1 went into geostationary orbit, putting all other broadband satellites to shame with 140 Gbps of total capacity. This is the Ka-band satellite that JetBlue’s fleet connects to, and while the airline has to share that bandwidth with homes across North America that subscribe to ViaSat’s Excede residential broadband service, it faces no shortage of capacity. That’s why JetBlue is able to deliver 10-15 Mbps speeds to its passengers.
  3. British Library Digitising Newspapers (The Guardian) — as well as photogrammetry methods used in the Great Parchment Book project, Terras and colleagues are exploring the potential of a host of techniques, including multispectral imaging (MSI). Inks, pencil marks, and paper all reflect, absorb, or emit particular wavelengths of light, ranging from the infrared end of the electromagnetic spectrum, through the visible region and into the UV. By taking photographs using different light sources and filters, it is possible to generate a suite of images. “We get back this stack of about 40 images of the [document] and then we can use image-processing to try to see what is in [some of them] and not others,” Terras explains.
  4. Testing a Distributed System (ACM) — This article discusses general strategies for testing distributed systems as well as specific strategies for testing distributed data storage systems.