"aws" entries

How to create a Swarm cluster with Docker

Using Docker Machine to create a Swarm cluster across cloud providers.

Editor’s note: this is an Early Release excerpt from Chapter 7 of Docker Cookbook by Sébastien Goasguen. The recipes in this book will help developers go from zero knowledge to distributed applications packaged and deployed within a couple of chapters. One of the key value propositions of Docker is app portability. The following will show you how to use Docker Machine to create a Swarm cluster across cloud providers.


You understand how to create a Swarm cluster manually (see Recipe 7.3), but you would like to create one with nodes in multiple public Cloud Providers and keep the UX experience of the local Docker CLI.


Use Docker Machine to start Docker hosts in several Cloud providers and bootstrap them automatically to create a swarm cluster.


This is an experimental feature in Docker Machine and is subject to change.

The first thing to do is to obtain a swarm discovery token. This will be used during the bootstrapping process when starting the nodes of the cluster. As explained in Recipe 7.3, swarm features multiple discovery process. In this recipe, we used the service hosted by Docker, Inc. A discovery token is obtained by running a container based on the swarm image and running the create command. Assuming we do not have access to a Docker host already, we use docker-machine to create one solely for this purpose.

With the token in hand, we can use docker-machine and multiple public Cloud drivers to start worker nodes. We can start a swarm head node on VirtualBox, a worker on DigitalOcean and another one on Azure.


Do not start a swarm head in a public cloud and a worker on your localhost with VirtualBox. Chances are the head will not be able to route network traffic to your local worker node. It is possible to do, but you would have to open ports on your local router.

Your swarm cluster is now ready. Your swarm head node is running locally in a Virtualbox VM, one worker node is running in DigitalOcean and another one in Azure. You can set the local docker-machine binary to use the head node running in VirtualBox and start using the swarm subcommands:


If you start a container, swarm will schedule it in round-robin fashion on the cluster. For example, starting three nginx container in a for loop with:

Will lead to three nginx container on the three nodes in your cluster. Remember that you will need to open port 80 on the instances running in the Cloud to access the container.


Do not forget to remove the machine you started in the Cloud.

See Also

  • Using Docker machine with Docker swarm.

Editor’s note: if you’re interested in learning more about networking at scale, you’ll want to check out Jay Edwards’ Distributed Systems training session at Velocity in Santa Clara May 27-29, 2015.


What containers can do for you

Docker, Rocket, and big industry changes are making it a great time to seriously consider using containers.

Container Image: CC BY-SA 2.0 Photocapy https://www.flickr.com/photos/photocapy/252737232/in/photostream/

If you read any IT news these days it’s hard to miss a headline about “the container revolution.” Docker’s year-and-a-half-old engine had a monopoly on the buzz until CoreOS launched its own project, Rocket, in December.

The technology behind containers can seem esoteric, but the advantages of bringing containers to your organization are more compelling than ever. And containers’ inherent portability opens up exciting new opportunities for how organizations host their applications.

Containerization is having its moment and there’s never been a better time to check it out for yourself.

Read more…

Comments: 5

Ins and Outs of Running MySQL on AWS

Laine Campbell on why AWS is a good platform option for running MySQL at scale

In the following interview, PalominoDB owner and CEO Laine Campbell discusses advantages and disadvantages of using Amazon Web Services (AWS) as a platform for running MySQL. The solution provides a functional environment for young startups who can’t afford a database administrator (DBA), Campbell says, but there are drawbacks to be aware of, such as a lack of access to your database’s file system, and troubleshooting “can get quite hairy.” This interview is a sneak preview to Campbell’s upcoming Velocity session, “Using Amazon Web Services for MySQL at Scale.”

Why is AWS a good platform for scaling MySQL?

Laine Campbell

Laine Campbell

Laine Campbell: The elasticity of Amazon’s cloud service is key to scaling on most tiers in an application’s infrastructure, and this is true with MySQL as well. Concurrency is a recurring pattern with MySQL’s scaling capabilities, and as traffic and concurrent queries grow, one has to introduce some fairly traditional scaling patterns. One such pattern is adding replicas to distribute read I/O and reduce contention and concurrency, which is easy to do with rapid deployment of new instances and Elastic Block Storage (EBS) snapshots.

Additionally, sharding can be done with less impact via EBS snapshots being used to recreate the dataset, and then data that is not part of the new shard is removed. Amazon’s relational database service for MySQL—RDS—is also a new, rather compelling scaling pattern for the early stages of a company’s life, when resources are scarce and administrators have not been hired. RDS is a great pattern for people to emulate in terms of rapid deployment of replicas, ease of master failovers, and the ability to easily redeploy hosts when errors occur, rather than spending extensive time trying to repair or clean up data.

Read more…


What Is the Risk That Amazon Will Go Down (Again)?

Velocity 2013 Speaker Series

Why should we at all bother about notions such as risk and safety in web operations? Do web operations face risk? Do web operations manage risk? Do web operations produce risk? Last Christmas Eve, Amazon had an AWS outage affecting a variety of actors, including Netflix, which was a service included in many of the gifts shared on that very day. The event has introduced the notion of risk into the discourse of web operations, and it might then be good timing for some reflective thoughts on the very nature of risk in this domain.

What is risk? The question is a classic one, and the answer is tightly coupled to how one views the nature of the incident occurring as a result of the risk.

One approach to assessing the risk of Amazon going down is probabilistic: start by laying out the entire space of potential scenarios leading to Amazon going down, calculate their probability, and multiply the probability for each scenario by their estimated severity (likely in terms of the costs connected to the specific scenario depending on the time of the event). Each scenario can then be plotted in a risk matrix showing their weighted ranking (to prioritize future risk mitigation measures) or calculated as a collective sum of the risks for each scenario (to judge whether the risk for Amazon going down is below a certain acceptance criterion).

This first way of answering the question of what the risk is for Amazon to go down is intimately linked with a perception of risk as energy to be kept contained (Haddon, 1980). This view originates from more recent times of increased development of process industries in which clearly graspable energies (fuel rods at nuclear plants, the fossil fuels at refineries, the kinetic energy of an aircraft) are to be kept contained and safely separated from a vulnerable target such as human beings. The next question of importance here becomes how to avoid an uncontrolled release of the contained energy. The strategies for mitigating the risk of an uncontrolled release of energy are basically two: barriers and redundancy (and the two combined: redundancy of barriers). Physically graspable energies can be contained through the use of multiple barriers (called “defenses in depth”) and potentially several barriers of the same kind (redundancy), for instance several emergency-cooling systems for a nuclear plant.

Using this metaphor, the risk of Amazon going down is mitigated by building a system of redundant barriers (several server centers, backup, active fire extinguishing, etc.). This might seem like a tidy solution, but here we run into two problems with this probabilistic approach to risk: the view of the human operating the system and the increased complexity that comes as a result of introducing more and more barriers.

Controlling risk by analyzing the complete space of possible (and graspable) scenarios basically does not distinguish between safety and reliability. From this view, a system is safe when it is reliable, and the reliability of each barrier can be calculated. However there is one system component that is more difficult to grasp in terms of reliability than any other: the human. Inevitably, proponents of the energy/barrier model of risk end up explaining incidents (typically accidents) in terms of unreliable human beings not guaranteeing the safety (reliability) of the inherently safe (risk controlled by reliable barriers) system. I think this problem—which has its own entire literature connected to it—is too big to outline in further detail in this blog post, but let me point you towards a few references: Dekker, 2005; Dekker, 2006; Woods, Dekker, Cook, Johannesen & Sarter, 2009. The only issue is these (and most other citations in this post) are all academic tomes, so for those who would prefer a shorter summary available online, I can refer you to this report. I can also reassure you that I will get back to this issue in my keynote speech at the Velocity conference next month. To put the critique short: the contemporary literature questions the view of humans as the unreliable component of inherently safe systems, and instead advocates a view of humans as the only ones guaranteeing safety in inherently complex and risky environments.
Read more…

Comment: 1

Data sharing drives diagnoses and cures, if we can get there (part 1)

Observations from Sage Congress and collaboration through its challenge

The glowing reports we read of biotech advances almost cause one’s brain to ache. They leave us thinking that medical researchers must command the latest in all technological tools. But the engines of genetic and pharmaceutical innovation are stuttering for lack of one key fuel: data. Here they are left with the equivalent of trying to build skyscrapers with lathes and screwdrivers.

Sage Congress, held this past week in San Francisco, investigated the multiple facets of data in these field: gene sequences, models for finding pathways, patient behavior and symptoms (known as phenotypic data), and code to process all these inputs. A survey of efforts by the organizers, Sage Bionetworks, and other innovations in genetic data handling can show how genetics resembles and differs from other disciplines.

An intense lesson in code sharing

At last year’s Congress, Sage announced a challenge, together with the DREAM project, intended to galvanize researchers in genetics while showing off the growing capabilities of Sage’s Synapse platform. Synapse ties together a number of data sets in genetics and provides tools for researchers to upload new data, while searching other researchers’ data sets. Its challenge highlighted the industry’s need for better data sharing, and some ways to get there.

Read more…


Amazon improves EC2 (by embracing failure)

Amazon just announced two big improvements to EC2: Multiple LocationsAmazon EC2 now provides the ability to place instances in multiple locations. Amazon EC2 locations are composed of regions and Availability Zones. Regions are geographically dispersed and will be in separate geographic areas or countries. Currently, Amazon EC2 exposes only a single region. Availability Zones are distinct locations that are engineered…

Comments: 5