"ops" entries

Boost your career with new levels of automation

Elevate automation through orchestration.


As sysadmins, we have been responsible for running applications for decades. We have done everything to meet demanding SLAs, including “automating all the things” and even trading sleep cycles to rescue applications from production fires. While we have earned many battle scars and can step back and admire fully automated deployment pipelines, it feels like there has always been something missing. Our infrastructure still feels like an accident waiting to happen and somehow, no matter how much we manage to automate, the expense of infrastructure continues to increase.

The root of this feeling comes from the fact that many of our tools don’t provide the proper insight into what’s really going on and require us to reverse engineer applications in order to effectively monitor them and recover from failures. Today many people bolt on monitoring solutions that attempt to probe applications from the outside and report “health” status to a centralized monitoring system, which seems to be riddled with false alarms or a list of alarms that are not worth looking into because there is no clear path to resolution.

What makes this worse is how we typically handle common failure scenarios such as node failures. Today many of us are forced to statically assign applications to machines and manage resource allocations on a spreadsheet. It’s very common to assign a single application to a VM to avoid dependency conflicts and ensure proper resource allocations. Many of the tools in our tool belt have been optimized for this pattern, and the results are less than optimal. Sure, this is better than doing it manually, but current methods are resulting in low resource utilization, which means our EC2 bills continue to increase, because the more you automate, the more things people want to do.

How do we reverse course on this situation? Read more…

The case for continuous delivery

Building functionality that really delivers the expected customer value

By now, many of us are aware of the wide adoption of continuous delivery within companies that treat software development as a strategic capability that provides competitive advantage. Amazon is on record as making changes to production every 11.6 seconds on average in May of 2011. Facebook releases to production twice a day. Many Google services see releases multiple times a week, and almost everything in Google is developed on mainline. Still, many managers and executives remain unconvinced as to the benefits, and would like to know more about the economic drivers behind CD.

First, let’s define continuous delivery. Martin Fowler provides a comprehensive definition on his website, but here’s my one-sentence version: Continuous delivery is a set of principles and practices to reduce the cost, time, and risk of delivering incremental changes to users.

Read more…


Why feedback?

Maintaining a desired behavior

In two previous posts (Part 1 and Part 2) we introduced the idea of feedback control. The basic idea is that we can keep a system (any system!) on track, by constantly monitoring its actual behavior, so that we can apply corrective actions to the system’s input, to “nudge” it back on target, if it ever begins to go astray.
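
As a minimal sketch of the loop described above (not taken from the posts themselves), the following Python code measures a system's output, compares it with the desired value, and nudges the input in proportion to the error. The names `run_feedback_loop` and `toy_system`, the gain of 0.2, and the "output is twice the input" behavior are all hypothetical choices made for illustration.

```python
def run_feedback_loop(system, setpoint, gain, steps, initial_input=0.0):
    """Repeatedly measure the output and nudge the input toward the setpoint.

    `system` is any callable mapping a control input to an observed output.
    Returns the list of outputs observed at each step.
    """
    control_input = initial_input
    history = []
    for _ in range(steps):
        output = system(control_input)   # monitor the actual behavior
        error = setpoint - output        # how far off target are we?
        control_input += gain * error    # corrective action on the input
        history.append(output)
    return history


def toy_system(control_input):
    # Hypothetical system for illustration: output is twice the input.
    return 2.0 * control_input


# Ask the loop to hold the toy system's output at 10.
history = run_feedback_loop(toy_system, setpoint=10.0, gain=0.2, steps=30)
```

With these particular numbers, the observed output starts at 0 and climbs steadily toward the target of 10. A gain that is too large for the system being controlled can make such a loop overshoot or oscillate instead of settling.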

This raises the question: Why should we, as programmers, software engineers, and system administrators, care? What’s in it for us?

Read more…

Four short links: 2 July 2013

Microvideos for Microhelp, Organic Search, Probabilistic Programming, and Cluster Management

  1. How to Make Help Microvideos For Your Site (Alex Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
  2. How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results, actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
  3. Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
  4. mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)

Happy SysAdmin Appreciation Day!

Did you do anything on the web today? Thank a SysAdmin.

If you are reading this (or any other) page, sending a message, watching a video, reading an email, or doing anything else that touches the web, you can thank a SysAdmin.

Velocity 2011 retrospective

Resilience engineering and data's role in performance are key trends in web ops.

A number of emerging themes are defining the web operations world, including: resilience engineering, new approaches to failure, and the role data plays in boosting performance.

Velocity 2011 debrief

Steve Souders weighs in on Velocity 2011 and looks ahead to upcoming Velocity events.

This was Velocity's fourth year, and while every year has seen significant growth, the 2011 conference felt like a tremendous step forward in all areas.

Velocity 2011

A tribe of web performance and operations pros is pushing the web forward.

As we approach the fourth Velocity conference, here's a look at how the web performance and operations communities came together, what they've done to improve the web experience, and the work that lies ahead.

The state of speed and the quirks of mobile optimization

Steve Souders on browser wars, site speed, and the HTTP Archive.

In this interview, Google performance evangelist and Velocity co-chair Steve Souders discusses browser competition, the differences between mobile and desktop optimization, and his hopes for the HTTP Archive.

Four short links: 20 May 2011

Digital Forex, Blasts from the Past, Mobile Web Performance, Skype at Conferences

  1. BitCoin Watch — news and market analysis for this artificial currency. (If you’re outside the BitCoin world wondering wtf the fuss is all about try We Use Coins for a gentle primer and then Is BitCoin a Good Idea? for the case against) (via Andy Baio)
  2. Time Capsule — send your Flickr photos from a year ago. I love that technology helps us connect not just with other people right now, but with ourselves in the future. Compare TwitShift and Foursquare and Seven Years Ago. (via Really Interesting Group)
  3. HTTP Archive Mobile — mobile performance data. The top 100 web pages average out at 271kb vs 401kb for their desktop incarnations, which still seems unjustifiably high to me.
  4. Skype at Conferences — The two editors of the book were due to lead the session but were at the wrong ends of a skype three way video conference which stuttered into a dalekian half life without really quite making the breakthrough into comprehensibility. After various attempts to rewire, reconfigure and reboot, we gave up and had what turned into a good conversation among the dozen people round the table in London. Conference organizers, take note: Skype at conferences is a recipe for fail.