Operations
More on how web performance impacts revenue...
by Jesse Robbins | @jesserobbins | comments: 9
At Velocity this year Microsoft, Google and Shopzilla each presented data on how web performance directly impacts revenue.
Their data showed that slow sites get fewer search queries per user, less revenue per visitor, fewer clicks, fewer searches, and lower search engine rankings. They found that in some cases even after site performance was improved users continued to interact as if it was slow. Bad experiences have a lasting influence on customer behavior.
What about smaller websites that aren't yet at this scale?
Alistair Croll and Sean Power, the authors of the new book Complete Web Monitoring, have continued this research for sites at smaller scale.
They used a Strangeloop Networks web acceleration appliance to optimize half the sessions to a smaller production website, tagging optimized and unoptimized visitors so they could be analyzed in Google Analytics. The Strangeloop device applies many of Steve Souders' performance rules to an existing site automatically (a kind of "Steve-in-a-Box" ;-).
The results of their analysis show how significant a reduction in page latency can be. In addition to reducing bounce rates, and increasing pages per visit & time on site, they found a 16.07% increase in conversion rates and a 5.50% increase in average order value.
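As a rough illustration of how a relative increase such as the 16.07% conversion figure is derived from tagged sessions, here is a minimal Python sketch. The visit and order counts are hypothetical and are not taken from the study, and `relative_lift` is an illustrative helper name.

```python
def relative_lift(control_rate: float, treated_rate: float) -> float:
    """Return the relative change of treated over control, in percent."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treated_rate - control_rate) / control_rate * 100.0


# Hypothetical session counts (assumptions, not data from the study).
control_visits, control_orders = 10_000, 200   # unoptimized sessions
treated_visits, treated_orders = 10_000, 232   # optimized sessions

control_rate = control_orders / control_visits
treated_rate = treated_orders / treated_visits

lift = relative_lift(control_rate, treated_rate)
print(f"conversion lift: {lift:.2f}%")
```

Note that the lift is relative to the control group's rate, so a move from a 2.00% to a 2.32% conversion rate is a 16% increase, not a 0.32% one.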
Check out the full post on the Watching Websites blog.
tags: alistair croll, book related, operations, performance, velocity, velocityconf, watching websites, web monitoring
Four short links: 4 September 2009
Flood Maps, Govt Permalinks, Ops, and Security
by Nat Torkington | @gnat | comments: 1
- Flood Maps -- what the world will look like when the oceans rise. Interactive, so you can dial up your preferred level of environmental horror. (via Hans Nowak)
- Citability -- making government accessible, reliable, and transparent with advanced permalinks, as Government websites are ever changing and cannot be cited. Content changes without notice or accountability.
- Bootstrapping EC2 Images as Puppet Clients -- This is a post on how to get to the point of using Puppet in an EC2 environment, by automatically configuring EC2 instances as Puppet clients once they're launched. I've been learning that if you're using a cloud hosting service, you need an automated admin tool. (via Grig Gheorghiu). See also the APT repository for Chef.
- USB Snoop Stick -- Trojan in a convenient form factor, malware on a stick, back doors in your pocket ... and best of all, it's sold to consumers.
tags: climate change, environment, gov 2.0, operations, security, web, web monitoring
Is intimate personal information a toxic asset in cloud datacenters?
by Carl Hewitt | comments: 15
Guest blogger Carl Hewitt, Emeritus at MIT in the Electrical Engineering and Computer Science department, is known for his research on strongly paraconsistent logic, privacy-friendly client cloud computing, norms and commitments for organizational computing, and concurrent programming languages, models, and theories.
Aggregators (Google, Yahoo, Microsoft, Facebook, etc.) tend to believe that personal information is a valuable asset for several reasons. It is valuable to advertisers because it enables greater relevance for their ads. It is valuable to users because it can be used to enrich their lives. And it is valuable to aggregators because they can use personal information to make more money by selling (anonymous?) versions and by using it to bring together advertisers and customers. Recency and intimacy can add value to information. Current and recent information tends to be more relevant than older information. Intimate psychological, physiological, sociological, geographical, medical, etc. information can be used to personalize interactions.
Intimate current personal information is also valuable for government security because it can be critical to taking security counter measures. Already in the UK, the previous two years of everyone's email, web browsing, and telephone calls are becoming available to government officials at varying levels of detail. For example, detectives will be required to consider accessing telephone and internet records during every investigation under new plans to increase police use of communications data.
But that's only the beginning. As Jim Gray noted in "Distributed Computing Economics" (MSR-TR-2003-24) there is a growing imbalance between the computation power of billions of cores in aggregator datacenters and the relatively feeble fiber optic communications coming out of aggregator datacenters. This problem has now become so severe that Amazon has been forced to introduce a commercial service that lets users of their cloud import and export data through the post--as in, put it on storage devices and ship it by land, sea, or air. Soon even this stopgap will become impractical for government security agencies because whole shipping containers would have to be transferred--the functional equivalent of shipping large pieces of an aggregator datacenter. Consequently, to be effective, future government security software will have to be tightly integrated with aggregator datacenters. The most effective security measures will require aggregator datacenters to be heavily regulated, i.e., analogous to nuclear power plants.
Semantic Integration, an emerging technological capability to bring together all kinds of information in a semantic engine, will greatly intensify all of the above issues (see "A historical perspective on developing foundations for privacy-friendly client cloud computing: The Paradigm Shift from 'Inconsistency Denial' to 'Practical Semantic Integration™'" ArXiv 0901.4934). The following kinds of information can be semantically integrated: calendars and to-do lists, email, SMS and Twitter archives, presence information (including physical, psychological and social), maps (including firms, points of interest, traffic, parking, and weather), events (including alerts and status), documents (including presentations, spreadsheets, proposals, job applications, health records, photos, videos, gift lists, memos, purchasing, contracts, articles), contacts (including social graphs and reputation) and search results (including rankings and ratings).
Two critical technologies are the foundation of Practical Semantic Integration: The first is Lightly Structured Natural Language™ interfaces that allow information to be easily found and organized. The second is many-core semantic engines (see "ActorScript™: Industrial strength integration of local and nonlocal concurrency for Client-cloud Computing"; ArXiv 0907.3330) that rapidly process information in ways that are tolerant of inconsistency (see "Common sense for concurrency and strong paraconsistency using unstratified inference and reflection" ArXiv 0812.4852).
To be effective, government security Semantic Integration systems will need to be joined with those of aggregators. Thus Semantic Integration of personal information on aggregator datacenters will require additional government regulation of aggregators. Will government regulation prove toxic to the ability of aggregators to innovate?
This is a future that we expect most readers would find distasteful. There is an alternative: A client cloud is a local cloud controlled by a client, e.g., a family cloud might consist of the cell phones, computers, security cameras, home entertainment centers, Wi-Fi access points, etc. of a family. Semantic Integration could be performed in clients' clouds so that clients by default store their information in cloud datacenters in a way that it can be decrypted only by using a client's secret key.
Semantic Integration using clients' clouds has some important advantages. Client responsiveness can be faster by not requiring communication with datacenters. Aggregator capital, operating and communication costs can be lower because Semantic Integration is performed in clients' clouds instead of aggregator datacenters.
By performing Semantic Integration in clients' clouds, aggregators can make far more money than they do now by doing an even better job of matching up customers with merchants in a way that is more pleasing to both. Aggregators can provide software that runs in the clients' clouds (although it may have to be audited by 3rd parties). The aggregator's software can volunteer high level information to the aggregator's datacenters about the kind of merchant information that might be relevant. Within clients' clouds, the merchant information can then be tailored to the specific requirements of clients.
For reasons above, an aggregator can do better by performing clients’ Semantic Integration using their clouds rather than relying entirely on the aggregator's cloud. And using clients’ clouds could lessen the degree of government regulation because the government would have to subpoena clients to obtain their most intimate personal information. If the information in an aggregator’s datacenters is sufficiently anonymous, then it would not become necessary for government security agencies to regulate them so heavily.
The question is: "What are the aggregators going to do about intimate personal information?" If one of them initiates a project to develop a Semantic Integration product that operates in clients' clouds, then the others will rapidly follow suit.
tags: emerging tech, operations
John Adams on Fixing Twitter: Improving the Performance and Scalability of the World's Most Popular Micro-blogging Site
by Jesse Robbins | @jesserobbins | comments: 2
Twitter is suffering outages today as they fend off a Denial of Service attack, and so I thought it would be helpful to post John Adams’ exceptional Velocity session about Operations at Twitter.
Good luck today John & team… I know it’s going to be a long day!
Update: Apparently Facebook & Livejournal have had similar attacks today. Rich Miller from Data Center Knowledge reminds us that this is just the latest in a series of major attacks.
tags: attacks, critical infrastructure, infrastructure, operations, performance, security, twitter, velocity, velocity09, velocityconf, video, web2.0, webops
Velocity and the Bottom Line
by Steve Souders | comments: 3
Velocity 2009 took place last week in San Jose, with Jesse Robbins and me serving as co-chairs. Back in November 2008, while we were planning Velocity, I said I wanted to highlight "best practices in performance and operations that improve the user experience as well as the company's bottom line." Much of my work focuses on the how of improving performance - tips developers use to create even faster web sites. What's been missing is the why. Why is it important for companies to focus on performance?
That question was answered at Velocity last week by speakers from AOL, Google, Microsoft, and Shopzilla.
- Eric Schurman (Bing) and Jake Brutlag (Google Search) co-presented results from latency experiments conducted independently on each site. Bing found that a 2 second slowdown changed queries/user by -1.8% and revenue/user by -4.3%. Google Search found that a 400 millisecond delay resulted in a -0.59% change in searches/user. What's more, even after the delay was removed, these users still had 0.21% fewer searches, indicating that a slower user experience affects long term behavior. (video, slides)
- Dave Artz from AOL presented several performance suggestions. He concluded with statistics that show page views drop off as page load times increase. Users in the top decile of page load times view ~7.5 pages/visit. This drops to ~6 pages/visit in the 3rd decile, and bottoms out at ~5 pages/visit for users with the slowest page load times. (slides)
- Marissa Mayer shared several performance case studies from Google. One experiment increased the number of search results per page from 10 to 30, with a corresponding increase in page load times from 400 milliseconds to 900 milliseconds. This resulted in a 25% dropoff in first result page searches. Adding the checkout icon (a shopping cart) to search results made the page 2% slower with a corresponding 2% drop in searches/user. (Watch the video to see the clever workaround they found.) Image optimizations in Google Maps made the page 2-3x faster, with a significant increase in user interaction with the site. (video, slides)
- Phil Dixon, from Shopzilla, had the most takeaway statistics about the impact of performance on the bottom line. A year-long performance redesign resulted in a 5 second speed up (from ~7 seconds to ~2 seconds). This resulted in a 25% increase in page views, a 7-12% increase in revenue, and a 50% reduction in hardware. This last point shows the win-win of performance improvements, increasing revenue while driving down operating costs. (video, slides)
These case studies provide real world numbers that show the benefits of making your site faster. Other Velocity sessions share techniques for implementing performance improvements, including sessions from me, Doug Crockford, and the Facebook and Google frontend teams. But what about the user experience? In his session, Matt Mullenweg (of WordPress fame) makes sure we remember the importance of how the user feels while interacting with our site:
That's why [performance] is important and why we should be obsessed and not be discouraged when it doesn't change the funnel. My theory here is when an interface is faster, you feel good. And ultimately what that comes down to is you feel in control. The web app isn't controlling me, I'm controlling it. Ultimately that feeling of control translates to happiness in everyone. In order to increase the happiness in the world, we all have to keep working on this.
Thanks to the Velocity speakers & their organizations for overcoming the many challenges required to present this data for the first time. We're now equipped with the financial justification, the technical know-how, and the visceral motivation to go out and make the Web a faster place. We'll have more performance success stories next year. Your company could be one of them! Capture your performance improvements and bottom line impact. We'd love to hear from you at Velocity 2010.
tags: operations, velocity09, velocityconf, web2.0, webops
Four short links: 29 June 2009
Sysadmin Wiki, Physics, National Archives, and Reinventing the British Government
by Nat Torkington | @gnat | comments: 1
- Server Fault -- Wikipedia-like sysadmin guide, built by the Stack Overflow team, who are branching out to reach a more general IT Professional audience. (via Brady in email)
- Sixty Symbols -- 5m videos about the symbols of physics and astronomy. Great stuff! (via Glutnix on Twitter)
- US National Archives launches YouTube Channel -- a mixture of archives-nerd stuff (directors of Presidential Libraries talking about their favourite items) and wider-interest collections (such as Touring 1930s America).
- Open House in Westminster -- the ever-insightful Tom Steinberg from MySociety has an article in the Independent about British plans to reinvent government. Now the talk of Westminster is all about democratic reform. By my count there are over 50 different ideas for changing the way our democracy works being touted by different pundits at the moment. [...] What all these ideas, though, have in common is that they propose structural reforms that could have been achieved any time in the last 200 years.[...] My view is that these proposals are all interesting, and some may be quite critical for a better democracy. But I am also concerned that they do not see Parliament and the process of making laws as a native to the internet would. They don’t ask: “What reforms are possible that just weren’t conceivable ten years ago?”
tags: gov2.0, government, mysociety, operations, science, science education
Jonathan Heiliger on Web Performance, Operations, and Culture
by Jesse Robbins | @jesserobbins | comments: 0
We were honored to have Jonathan Heiliger, Facebook’s VP of Technology Operations, as our opening keynote speaker at Velocity. Jonathan is one of the most accomplished leaders in our field, and is a master of the craft.
Here is his keynote in its entirety:
Note: Other videos from Velocity are being posted to VelocityConference.blip.tv
tags: development, executive, facebook, jonathan heiliger, leadership, operations, performance, velocity, velocityconf, web2.0, webops
Announcing: Spike Night at Velocity
by Scott Ruthfield | @scottru | comments: 5
Guest blogger Scott Ruthfield is a Program Committee member of the O'Reilly Velocity: Web Performance & Operations Conference.
- Chris Bissell, Chief Software Architect at MySpace, and members of the MySpace team will demonstrate a massive, real increase in traffic, and will manage it on-stage. MySpace already deals with tens of thousands of hits each second - we can't throw enough traffic at them to cause any harm - so they'll cause their own harm and then show how they work through it.
- Ryan Nelson, Operations Director for MLB Advanced Media and MLB.com, will walk us through a combination of war stories and live traffic management to show what happens when millions of baseball fans all want to see what's happened after the commercial break at the exact same time. Between their very popular desktop apps and their newly-announced iPhone game streaming, the MLB is a true leader in technology innovation with a rabid fan base that goes well beyond the Web 2.0 echo chamber.
tags: cloud, infrastructure, operations, performance, scalability, scale, spikenight, velocity, velocity09, velocityconf, web2.0, webops
Ignite! comes to San Jose June 22nd - Submit your talks now!
by Jesse Robbins | @jesserobbins | comments: 0
Ignite! is coming to San Jose on Monday June 22, 2009 at 8:00 pm, attached to the Velocity Conference. Admission is free, open to all, and there will be a cash bar.
The deadline for talks is May 11th, so submit your talks now!
As with all Ignites each speaker will only get 20 slides that each auto-advance every 15 seconds for a total of five minutes. We'll be looking for fun geek topics like hacks, how-to's, and insights. (Talks don't have to be Velocity-related!) If you're not sure what an Ignite talk looks like check out the Ignite Show.
tags: events, ignite, operations, san jose, velocity, velocityconf, web2.0, webops
Velocity Preview - The Greatest Good for the Greatest Number at Microsoft
by James Turner | comments: 4
You may also download this file. Running time: 00:20:26
Subscribe to this podcast series via iTunes. Or, visit the O'Reilly Media area at iTunes to find other podcasts from O'Reilly.
The psychology of engineering user experiences on the web can be difficult. How much rich content can you place up on a page before the load time drives away your visitors? Get the answer wrong, and you can end up with a ghost town; get it right and you're a star. Eric Schurman knows this well, since he is responsible for just those kinds of trade-off decisions on some of Microsoft's highest traffic pages. He'll be speaking at O'Reilly's Velocity Conference in June, and he recently talked with us about how Microsoft tests different user experiences on small groups of visitors.
James Turner: Why don't you start by describing what your gig at Microsoft is now and what your career path has been there?
Eric Schurman: I'm a principal dev lead for Live Search, what used to be MSN Search. And I started at Microsoft back in the late 90s working in Microsoft's Press organization, where we actually were developing training software that would emulate new Microsoft products, but didn't require those products to be on a user's machine. So, for example, if you had an organization that was running Windows 95, we would have a training system for Windows 98 that would emulate a bunch of the functionality of Windows 98 so that you could deploy it to your people. They could train their people on how to use Windows 98 before they actually deployed it.
I then moved on to the Microsoft Press website, where I became the dev lead for it. I made a few other moves and ended up going to Microsoft.com, where I ran the download center, the Microsoft.com homepage, the product catalog, and a bunch of other places from a dev perspective.
I then moved to what was then MSN Search, back in about 2005, and was there through the MSN to Live transition. At the time, I wasn't working on performance; I was just working on the Live Search application. And it became very obvious that we had some major performance problems. Performance has always been one of my really strong interests, so I took on addressing a lot of those. And when we addressed them, we had very significant improvements in our business metrics. That really surfaced how important performance was to the organization, and I moved into a role where I was really focusing just on performance. I've been in that role now for about two years.
JT: You've worked on at least three very different parts of the Microsoft website. The homepage has lots of hits, fairly static. The download page is a lot of data for long periods of time. Live Search is high volume, but there's also a lot of backend on that. In what ways do you need to architect them differently? And where can you reuse the same lessons?
ES: That's a great question. On the web, you've got different concerns from what you have for client apps. The main things that tend to impact end-user perceived performance on the web are often things about how you've designed your application from a network perspective. So how many different HTTP GET requests are you making? How are those GET requests structured? So, for example, are they serialized? Did you have a JavaScript file that then gets returned to the browser that requests another JavaScript file and another JavaScript file and then some content and then it finally gets rendered? So the number of assets that you request, that's going to be something that's important no matter what product you're doing.
There are other things, like how much script do you have on the page, how much CSS you have on the page, how much actual content you're rendering to the page, etcetera. There are tricks that you can use like combining many different graphics into a single tiled image and sending that down to the browser. It's much faster to send one image to the browser than, say, 20 images. Even if you end up sending the same overall graphics, but combined into one, it's still much faster to send it as one request.
There are also different data volume concerns. They're also different from a business perspective. A lot of what we were sending out from the download center was extremely time critical. We would have an update go out, and we needed to make sure that update was going to be available anywhere in the world within a certain time frame, which required us to handle very high bandwidth, and a very high volume of requests coming into the site that were transferring lots of bits. So that required something totally different than something like the Microsoft.com homepage.
It's also interesting looking at the volume of traffic and how that traffic reflects real users. So, for example, one of the problems that you end up with on both the Microsoft homepage and Live Search is that we have a huge number of bots that are trying to hit the system, lots of people trying to do SEO work are trying to hit search engines to gather information about their site, about competitor sites, about all sorts of things. On the Microsoft.com homepage, it's always under distributed denial of service attacks. It's not a question of how frequently does it happen; it's just what is the rate right now? Also, the Microsoft.com homepage has historically had such a high up-time rate that it's actually hit by a lot of hardware devices simply to check for connectivity to the internet. And so you'd want to treat a request from that kind of "user" very differently from a request that's coming from a real user.
So that's kind of a long, rambling answer to your question. Do you have any areas that you want me to drill in or maybe talk about something else?
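The point in this answer about serialized requests and combined images can be made concrete with a deliberately crude latency model. The sketch below is an editorial illustration, not something from the interview: the round-trip time, per-image transfer time, and connection limit are all assumed values, and the model ignores bandwidth contention, connection setup, and caching.

```python
import math


def serialized_ms(n_requests: int, rtt_ms: float, transfer_ms: float) -> float:
    """Each request waits for the previous one: pay a round trip every time."""
    return n_requests * (rtt_ms + transfer_ms)


def parallel_ms(n_requests: int, rtt_ms: float, transfer_ms: float,
                max_connections: int) -> float:
    """Requests go out in batches limited by the browser's connection cap."""
    rounds = math.ceil(n_requests / max_connections)
    return rounds * (rtt_ms + transfer_ms)


def combined_ms(n_requests: int, rtt_ms: float, transfer_ms: float) -> float:
    """One tiled image: a single round trip, but all of the bytes."""
    return rtt_ms + n_requests * transfer_ms


# Assumed values for illustration only.
images, rtt, transfer, connections = 20, 100.0, 5.0, 2

print("serialized:", serialized_ms(images, rtt, transfer), "ms")
print("parallel:  ", parallel_ms(images, rtt, transfer, connections), "ms")
print("combined:  ", combined_ms(images, rtt, transfer), "ms")
```

Under these assumptions the round-trip cost dominates, which is why sending the same graphics as one request comes out well ahead even though the total number of bytes is unchanged.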
tags: interviews, microsoft, operations, velocity09, velocityconf, web2.0, webops
Velocity 2009 - Big Ideas (early registration deadline)
by Jesse Robbins | @jesserobbins | comments: 7
My favorite interview question to ask candidates is: "What happens when you type www.(amazon|google|yahoo).com in your browser and press return?"
While the actual process of serving and rendering a page takes seconds to complete, describing it in real detail can take an hour. A good answer spans every part of the Internet from the client browser & operating system, DNS, through the network, to load balancers, servers, services, storage, down to the operating system & hardware, and all the way back again to the browser. It requires an understanding of TCP/IP, HTTP, & SSL deep enough to describe how connections are managed, how load-balancers work, and how certificates are exchanged and validated... and that's just the first request!
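For readers who want to see the first few stages of that answer spelled out, here is a minimal Python sketch using only the standard library. It is a simplification under stated assumptions: `build_request` and `fetch_status_line` are illustrative names, the request is a bare HTTP/1.1 GET, and load balancers, redirects, caching, and rendering are left out entirely.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_request(url: str):
    """Split a URL and build a minimal HTTP/1.1 GET request for it."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    return parts.scheme, host, port, request.encode("ascii")


def fetch_status_line(url: str, timeout: float = 5.0) -> str:
    """Walk through DNS, TCP, optional TLS, and HTTP; return the status line."""
    scheme, host, port, request = build_request(url)

    # 1. DNS: resolve the hostname to one or more socket addresses.
    addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, socktype, proto, _canonname, sockaddr = addrinfo[0]

    # 2. TCP: open a connection (three-way handshake) to the first address.
    conn = socket.socket(family, socktype, proto)
    try:
        conn.settimeout(timeout)
        conn.connect(sockaddr)

        # 3. TLS: for https, exchange and validate certificates.
        if scheme == "https":
            context = ssl.create_default_context()
            conn = context.wrap_socket(conn, server_hostname=host)

        # 4. HTTP: send the request and read the start of the response.
        conn.sendall(request)
        first_chunk = conn.recv(4096)
    finally:
        conn.close()

    return first_chunk.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `fetch_status_line("https://www.example.com/")` requires network access and returns whatever status line the server sends; `build_request` can be inspected offline.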
Web Performance & Operations is an emerging discipline which requires incredible breadth, focusing less on specific technologies and more on how the entire system works together. While people often specialize on particular components, great engineers always think of that component in relation to the whole. The best engineers are able to fly to the 50,000 foot view and see the entire system in motion and then zoom in to microscopic levels and examine the tiny movements of an individual part.
John Allspaw recently described this interconnectedness on his blog:
With websites, the introduction of change (for example, a bad database query) can affect (in a bad way) the entire system, not just the component(s) that saw the change. Adding handfuls of milliseconds to a query that’s made often, and you’re now holding page requests up longer. The same thing applies to optimizations as well. Break that [bad] query into two small fast ones, and watch how usage can change all over the system pretty quickly. Databases respond a bit faster, pages get built quicker, which means users click on more links, etc. This second-order effect of optimization is probably pretty familiar to those of us running sites of decent scale.
Working with these systems requires an understanding not only of the way technology interacts, but the way that people do as well. The structure, operation, and development of a website mirrors the organization that creates it, which is why so many people in WebOps focus on understanding and improving management culture & process.
Organizing a conference like Velocity is a wonderful challenge because it requires the same sort of thinking. We focus on the big concepts that everyone needs to know and then go deep into the technologies that change our understanding of the system. We find ways to share the unique experience that can only be gained by operating at scale. We make it safe to share as much of the "Secret Sauce" as we can.
Please join us at Velocity this year, we have an amazing lineup of speakers & participants. Early registration ends on Monday, May 11th at 11:59 PM Pacific. (Radar readers can use "vel09cmb" for an additional 15% discount.)
tags: cloud, data, infrastructure, operations, scale, velocity, velocity09, velocityconf, web, web2.0
Velocity Preview - Keeping Twitter Tweeting
by James Turner | comments: 3
You may also download this file. Running time: 00:10:46
Subscribe to this podcast series via iTunes. Or, visit the O'Reilly Media area at iTunes to find other podcasts from O'Reilly.
If there's a site that exemplifies explosive growth, it has to be Twitter. It seems like everywhere you look, someone is Tweeting, or talking about Tweeting, or Tweeting about Tweeting. Keeping the site responsive under that type of increase is no easy job, but it's one that John Adams has to deal with every day, working in Twitter Operations. He'll be talking about that work at O'Reilly's Velocity Conference, in a session entitled Fixing Twitter: Improving the Performance and Scalability of the World's Most Popular Micro-blogging Site, and he spent some time with us to talk about what is involved in keeping the site alive.
James Turner: Can you start by describing the platforms and technologies that make Twitter run today?
John Adams: Twitter currently runs on Ruby on Rails. And we also use a combination of Java and Scala, and a number of homegrown scripts that run the site. We also use a lot of open-source tools like Apache, MySQL, memcached.
JT: What type of hardware are you running on?
JA: It's all Linux, so a lot of x86 hardware. I can't tell you the brands or how many.
JT: Do you make any kind of attempt to stay homogeneous in that?
JA: Yes, we do. All of our hardware is very consistent. It makes deployment of new software very easy. And we also use a number of configuration management tools like Puppet to deliver software to those machines.
JT: As anyone can see, Twitter has had a pretty explosive growth, especially recently. Were you prepared for this kind of ramp up?
JA: I don't think so. I mean we're growing week over week in enormous numbers. And we spend a lot of time calculating the growth and scalability of the site to make sure that we can handle the upcoming load.
JT: I mean obviously there are events like Oprah decides she's going to Tweet that are going to be spikes. Do you try to get warning of that stuff?
JA: Yeah. And frequently we know of major events happening. Major events are very predictable like Macworld, even any massive amount of media interaction, we have some fair warning beforehand.
tags: interviews, operations, twitter, velocity, velocity09, velocityconf, web2.0, webops
Recent Posts
- AT&T Fiber cuts remind us: Location is a Basket too! | by Jesse Robbins on April 10, 2009
- Karmic Koalas Love Eucalyptus | by Simon Wardley on February 26, 2009
- Cloud Computing defined by Berkeley RAD Labs | by Artur Bergman on February 12, 2009
- Understanding Web Operations Culture - the Graph & Data Obsession | by Jesse Robbins on February 5, 2009
- Data Center Power Efficiency | by Jesse Robbins on November 29, 2008
- My Web Doesn't Like Your Enterprise, at Least While it's More Fun | by Jim Stogdill on November 25, 2008
- Velocity 2009: Themes, ideas, and call for participation... | by Jesse Robbins on November 20, 2008
- DisasterTech: "Decisions for Heroes" | by Jesse Robbins on November 1, 2008
- Sprint blocking Cogent network traffic... | by Jesse Robbins on October 31, 2008
- Amazon's new EC2 SLA | by Jesse Robbins on October 24, 2008
- Kaminsky DNS Patch Visualization | by Jesse Robbins on August 7, 2008
- The new internet traffic spikes | by Jesse Robbins on June 28, 2008