Defining clouds, web services, and other remote computing

Part 2 of the series, "What are the chances for a free software cloud?"

Previous section:
Introduction: Resolving the contradictions between web services, clouds, and open source

Technology commentators are a bit trapped by the term
“cloud,” which has been kicked and slapped around enough
to become truly shapeless. Time for a confession: I stuck the term into
this article’s title because I thought it would help attract
readers’ attention. But what else should I do? To run away from
“cloud” and substitute any other term (“web
services” is hardly more precise, nor is the phrase
“remote computing” I use from time to time) just creates
new confusions and ambiguities.

So in this section I’ll offer a history of services that have
led up to our cloud-obsessed era, hoping to help readers distinguish
the impacts and trade-offs created by all the trends that lie in the
“cloud.”

Computing and storage

The basic notion of cloud computing is simply this: one person uses a
computer owned by another in some formal, contractual manner. The
oldest precedent for cloud computing is therefore timesharing, which
was already popular in the 1960s. With timesharing, programmers could
enter their programs on teletype machines and transmit them over
modems and phone lines to central computer facilities that rented out
CPU time in units of one-hundredth of a second.

Some sites also purchased storage space on racks of large magnetic
tapes. The value of storing data remotely was to recover from flood,
fire, or other catastrophe.

The two major, historic cloud services offered by Amazon.com, the
Elastic Compute Cloud (EC2) and the Simple Storage Service (S3), are
the descendants of timesharing and remote backup, respectively.

EC2 provides complete computer systems to clients, who can request any
number of systems and dismiss them again when they are no longer
needed. Pricing is quite flexible (even including an option for an
online auction) but is essentially a combination of hourly rates and
data transfer charges.

S3 is a storage system that lets clients reserve as much or as little
space as needed. Pricing reflects the amount of data stored and the
amount of data transferred in and out of Amazon’s storage. EC2 and S3
complement each other well, because EC2 provides processing but no
persistent storage.
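
To make the pairing concrete, here is a minimal sketch written against
the current boto3 library for Python; the bucket name, object key, and
machine image ID are hypothetical, and AWS credentials are assumed to
be configured already.

    # Sketch only: uses the boto3 library; all names are hypothetical.
    import boto3

    # S3: store and retrieve as much or as little data as needed,
    # paying for the space used and the data transferred in and out.
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-backup-bucket",        # assumed to exist
                  Key="reports/2010-11.csv",
                  Body=b"date,sales\n2010-11-01,1532\n")
    report = s3.get_object(Bucket="example-backup-bucket",
                           Key="reports/2010-11.csv")["Body"].read()

    # EC2: request a complete computer system, then dismiss it when it
    # is no longer needed; billing is hourly plus data transfer.
    ec2 = boto3.client("ec2")
    started = ec2.run_instances(ImageId="ami-12345678",  # hypothetical image
                                InstanceType="t2.micro",
                                MinCount=1, MaxCount=1)
    instance_id = started["Instances"][0]["InstanceId"]
    ec2.terminate_instances(InstanceIds=[instance_id])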

Timesharing and EC2-style services work a bit like renting a community
garden. Just as community gardens let apartment dwellers without
personal back yards grow fruits and vegetables, timesharing in the
1960s brought programming within reach of people who couldn’t
afford a few hundred thousand dollars to buy a computer. All the
services discussed in this section provide hardware to people who run
their own operations, and therefore are often called
Infrastructure as a Service or IaaS.

We can also trace cloud computing back in another direction, as the
commercially viable expression of grid computing, an idea
developed through the first decade of the 2000s but whose
implementations stayed among researchers. The term “grid”
evokes regional systems for delivering electricity, which hide the
origin of electricity so that I don’t have to strike a deal with
a particular coal-burning plant, but can simply plug in my computer
and type away. Similarly, grid computing combined computing power from
far-flung systems to carry out large tasks such as weather modeling.
These efforts were an extension of earlier cluster technology
(computers plugged into local area networks), and effectively
scattered the cluster geographically. Such efforts were also inspired
by the well-known
SETI@home program,
an early example of Internet crowdsourcing; millions of people have
downloaded it to help process signals collected from telescopes.

Another form of infrastructure became part of modern life in the 1990s
when it seemed like you needed your own Web site to be anybody.
Internet providers greatly expanded their services, which had
previously amounted to bare connectivity and an email account, to
include individualized Web sites and related offerings. Today you can find a
wealth of different hosting services at different costs depending on
whether you want a simple Web presence, a database, a full-featured
content management system, and so forth.

These hosting services keep costs low by packing multiple users onto
each computer. A tiny site serving up occasional files, such as my own
praxagora.com, needs nothing that
approaches the power of a whole computer system. Thanks to virtual
hosting, I can use a sliver of a web server that dozens of other sites
share and enjoy my web site for very little cost. But praxagora.com
still looks and behaves like an independent, stand-alone web server.
We’ll see more such legerdemain as we explore virtualization and
clouds further.

The glimmer of the cloud in the World Wide Web

The next great breakthrough in remote computing was the concept of an
Application Service Provider. This article started with one
contemporary example, Gmail. Computing services such as payroll
processing had been outsourced for some time, but in the 1990s, the
Web made it easy for a business to reach right into another
organization’s day-to-day practice, running programs on central
computers and offering interfaces to clients over the Internet. People
used to filling out forms and proceeding from one screen to the next
on a locally installed program could do the same on a browser with
barely any change in behavior.

Using an Application Service Provider is a little like buying a house
in the suburbs with a yard and garden, but hiring a service to
maintain them. Just as the home-owner using a service doesn’t
have to get his hands dirty digging holes for plants, worry about the
composition of the lime, or fix a broken lawnmower, companies who
contract with Application Service Providers don’t have to
wrestle with libraries and DLL hell, rush to upgrade software when
there’s a security breach, or maintain a license server. All
these logistics are on the site run by the service, hidden away from
the user.

Early examples of Application Service Providers for everyday personal
use include blogging sites such as blogger.com and wordpress.com.
These sites offer web interfaces for everything from customizing the
look of your pages to putting up new content (although advanced users
have access to back doors for more complex configuration).

Interestingly, many companies had recognized the potential of web
browsers to deliver services by the early 2000s. But browsers’ and
JavaScript’s capabilities were then too limited for rich interaction.
These companies had to try to coax users into downloading plugins that
provided special functionality. The only plugin that ever caught on
was Flash (which, of course, enables many other applications). True
web services had to wait for the computer field to evolve along
several dimensions.

As broadband penetrated to more and more areas, web services became a
viable business model for delivering software to individual users.
First of all, broadband connections are “always on,” in
contrast to dial-up. Second, the XMLHttpRequest extension allows browsers
to fetch and update individual snippets of a web page, a practice that
programmers popularized under the acronym AJAX.

Together, these innovations allow web applications to provide
interfaces almost as fast and flexible as those of native applications
running on your computer, and a new version of HTML takes the process
even further. This movement of applications to the web is called
Software as a Service, or SaaS.

The pinned web site feature introduced in Internet Explorer 9
encourages users to create menu items or icons representing web sites,
making them as easy to launch as common applications on their
computer. This feature is a sign of the shift of applications from
the desktop to the Web.

Every trend has its logical conclusion, even if it’s further
than people are willing to go in reality. The logical conclusion of
SaaS is a tiny computer with no local storage and no software beyond
a minimal operating system and the networking software needed to reach
the servers that host the user’s applications.

Such thin clients were already prominent in the work world
before Web services became popular: terminals made by companies such
as Wyse, connected to local servers over cables. (Naturally, Wyse has
recently latched on to the cloud hype.)
The Web equivalent of thin clients is mobile devices such as iPhones
with data access, or
Google Chrome OS,
which Google is hoping will wean people away from popular software
packages in favor of Web services like Google Docs. Google is planning
to release a netbook running Chrome OS in about six months. Ray
Ozzie, chief software architect of Microsoft, also speaks of an
upcoming reality of continuous cloud services delivered to thin
appliances. The public hasn’t followed the Web services revolution
this far, though; most are still lugging laptops.

Data, data everywhere

Most of the world’s data is now in digital form, probably in some
relational database such as Oracle, IBM’s DB2, or MySQL. If the
storage of the data is anything more formal than a spreadsheet on some
clerical worker’s PC (and a shameful amount of critical data is still
on those PCs), it’s probably already in a kind of cloud.

Database administrators know better than to rely on a single disk to
preserve those millions upon millions of bytes, because tripping over
an electric cable can lead to a disk crash and critical information
loss. So they not only back up their data on tape or some other
medium, but duplicate it on a series of servers in a strategy called
replication. They often transmit data second by second over
hundreds of miles of wire so that flood or fire can’t lead to
permanent loss.

Replication strategies can get extremely complex (for instance, code
that inserts the “current time” can insert different
values as the database programs on various servers execute it), and
they are supplemented by complex caching strategies. Caches are
necessary because public-facing systems should have the most commonly
requested data, such as current pricing information for company
products, loaded right into memory. An extra round-trip over the
Internet for each item of data can leave users twiddling their thumbs in
annoyance. Loading or “priming” these caches can take
hours, because primary memories on computers are so large.
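
As a toy sketch of the idea (every name below is hypothetical, and real
deployments usually lean on dedicated cache servers such as memcached),
priming is essentially one long loop that copies data from the
replicated database into memory:

    # Toy sketch of a primed in-memory cache; every name is hypothetical.
    product_prices = {}   # product ID -> current price, held in memory

    def load_price_from_database(product_id):
        # Placeholder for a query against the replicated back-end database.
        return 0.0

    def prime_cache(all_product_ids):
        # Walk the whole catalog once so that public-facing requests never
        # wait on a database round trip; with a large memory and a large
        # catalog, this loop is what can run for hours.
        for product_id in all_product_ids:
            product_prices[product_id] = load_price_from_database(product_id)

    def get_price(product_id):
        # Serve from memory when possible; fall back to the database on a miss.
        if product_id not in product_prices:
            product_prices[product_id] = load_price_from_database(product_id)
        return product_prices[product_id]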

The use of backups and replication can be considered a kind of private
cloud, and if a commercial service becomes competitive in reliability
or cost, we can expect businesses to relax their grip and entrust
their data to such a service.

We’ve seen how Amazon.com’s S3 allowed people to store
data on someone else’s servers. But as a primary storage area,
S3 isn’t cost-effective. It’s probably most valuable when
used in tandem with an IaaS service such as EC2: you feed your data
from the data cloud service into the compute cloud service.

Some people also use S3, or one of many other data storage services,
as a backup to their local systems. Although it may be hard to get
used to trusting some commercial service over a hard drive you can
grasp in your hand, the service has some advantages. It is actually
less likely than you are to drop the hard drive on the floor and break
it, or to let the data go up in smoke when a malfunctioning electrical
system starts a fire.

But data in the cloud has a much more powerful potential. Instead of
Software as a Service, a company can offer its data online for others
to use.

Probably the first company to try this radical exposure of data was
Amazon.com, which can also be credited with starting the cloud services
mentioned earlier. Amazon.com released a service that let programmers
retrieve data about its products, so that instead of having to visit
dozens of web pages manually and view the data embedded in the text,
someone could retrieve statistics within seconds.

Programmers loved this. Data is empowering, even if it’s just
sales from one vendor, and developers raced to use the application
programming interface (API) to create all kinds of intriguing
applications using data from Amazon. Effectively, these developers
leave it up to Amazon to collect, verify, maintain, search through, and
correctly serve up the data on which their applications depend. Seen
through the lens of trust, web APIs represent a remarkable shift in the
computer industry.

Amazon’s API was a hack of the Web, which had been designed to
exchange pages of information. Like many other Internet services, the
Web’s HTTP protocol offers a few basic commands: GET, PUT, POST,
and DELETE. The API used the same HTTP protocol to get and put
individual items of data. And because it used HTTP, it could easily be
implemented in any language. Soon there were libraries of programming
code in all popular languages to access services such as
Amazon.com’s data.
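
As an illustration of the pattern (the URL, parameters, and field names
below are invented for this sketch, not taken from Amazon’s actual
API), Python’s widely used requests library shows how little code it
takes to fetch or update a single item of data over HTTP:

    # Hypothetical REST-style calls; the URL and field names are invented.
    import requests

    # GET a single item of data instead of scraping a whole web page.
    product = requests.get("https://api.example.com/products/B000123",
                           params={"fields": "title,price"}).json()
    print(product["title"], product["price"])

    # The same resource can be updated in place with PUT
    # (POST creates a new resource, DELETE removes one).
    requests.put("https://api.example.com/products/B000123",
                 json={"price": 19.99})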

Another early adopter of Web APIs was Google. Because its
Google Maps service exposed
data in a program-friendly form, programmers started to build useful
services on top of it. One famous example combined Google Maps with a
service that published information on properties available for rent;
users could quickly pull up a map showing where to rent a room in
their chosen location. Such combinations of services were called
mash-ups, with interesting cultural parallels to the
practices of musicians and artists in the digital age who combine
other people’s work from many sources to create new works.

The principles of using the Web for such programs evolved over several
years in the late 1990s, but the most popular technique was codified
in a 2000 PhD thesis by HTTP designer Roy Thomas Fielding, who
invented the now-famous term REST (standing for Representational State
Transfer) to cover the conglomeration of practices for defining URLs
and exchanging messages. Different services adhere to these principles
to a greater or lesser extent. But any online service that wants to
garner serious, sustained use now offers an API.

A new paradigm for programmers

SaaS has proven popular for programmers. In 1999, a company named VA
Linux created a site called
SourceForge
with the classic SaaS goal of centralizing the administration of
computer systems and taking that burden off programmers’ hands. A
programmer could upload his program there and, as is typical for free
software and open source, accept code contributions from anyone else
who chose to download the program.

VA Linux at that time made its money selling computers that ran the
GNU/Linux operating system. It set up SourceForge as a donation to the
free software community, to facilitate the creation of more free
software and therefore foster greater use of Linux. Eventually the
hardware business dried up, so SourceForge became the center of the
company’s business: corporate history anticipated cloud
computing history.

SourceForge became immensely popular, quickly coming to host hundreds
of thousands of projects, some quite heavily used. It has also
inspired numerous other hosting sites for programmers, such as
Github. But these sites don’t
completely take the administrative hassle out of being a programmer.
You still need to run development software, such as a compiler
and debugger, on your own computer.

Google leapt up to the next level of programmer support with
Google App Engine,
a kind of programmer equivalent to Gmail or
Google Docs.
App Engine is a cocoon within which you can plant a software larva and
carry it through to maturity. As with SaaS, the programmer does the
coding, compilation, and debugging all on the App Engine site. Also as
with SaaS, the completed program runs on the site and offers a web
interface to the public. But in terms of power and flexibility, App
Engine is like IaaS because the programmer can use it to offer any
desired service. This new kind of development paradigm is called
Platform as a Service or PaaS.
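
As a rough sketch of what this looked like (based on the webapp
framework bundled with App Engine’s Python SDK around the time of
writing; details of the current platform differ), a complete
application could be as small as this:

    # Sketch of a minimal App Engine application in the Python runtime
    # of that era, using the webapp framework that shipped with the SDK.
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class MainPage(webapp.RequestHandler):
        def get(self):
            # Runs on Google's servers and answers public web requests.
            self.response.headers["Content-Type"] = "text/plain"
            self.response.out.write("Hello from the platform")

    application = webapp.WSGIApplication([("/", MainPage)], debug=True)

    def main():
        # App Engine invokes this entry point; Google handles the
        # machines, scaling, and administration behind it.
        run_wsgi_app(application)

    if __name__ == "__main__":
        main()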

Microsoft offers both IaaS and PaaS in its
Windows Azure
project.

I hope you now see how the various types of remote computing are
alike, as well as how they differ.

Next section:

Why clouds and web services will continue to take over computing.
