5 PaaS anti-patterns

Common behavior to watch out for when transitioning to a PaaS

Today I am going to cover 5 ways developers may be running on a Platform as a Service (PaaS) without having really embraced the new platform effectively. If you have done any of the things below while building your application on a PaaS, like OpenShift, Heroku, or Google App Engine, don’t feel bad:

  • PaaS is a relatively new concept in the development world and I think some of these patterns are only recently coming to light
  • I have seen veteran developers making these mistakes as they move to the new paradigm

One piece of terminology I will use throughout the article is container. When I use this word, I am referring to the piece of the PaaS that hosts the application and does the work. An application can be composed of multiple containers, and the PaaS will probably have a method to add your favorite server-side tech to a container. On OpenShift this is called a gear, while on Heroku it is called a dyno.

So without further ado, let’s dig into some of the code smells in the cloud.

Trying to Modify Code on the Server

You may be used to SSH’ing into your server and modifying code or configuration settings there. On all the PaaS platforms I know of, this is a no-no. For example, with OpenShift, the only changes that persist between builds or restarts of the container are those made through the git repository. OpenShift creates a git repository in your container and then, if you use the command line or Eclipse tools, it clones that repository to your local machine.

This repository, along with environment variables, controls almost all aspects of your application. When you want to make a code change, you make it in your local repository, git commit, and then git push your changes to the container. Once the container receives the code, it builds and deploys it for you. Inside an OpenShift git repository there are .openshift/config and .openshift/markers directories which contain files that help set your server behavior. Think of the server and infrastructure as ephemeral and disposable: if it doesn’t live in source control, it doesn’t exist.
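As a sketch, the day-to-day loop looks like this (the application name and file here are placeholders):

```shell
# Clone the git repository that OpenShift created for the application
rhc git-clone myapp
cd myapp

# Make a change locally, then commit and push it back to the container
echo "print('hello')" > app.py
git add app.py
git commit -m "Update app.py"
git push   # the container rebuilds and redeploys when it receives the push
```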

Hard Coding Usernames, Passwords, and Other System Information

When you work on a single server or VPS, you are used to system values being permanently assigned. You probably also assume your code would need rewriting if someone else wanted to use it, and you were likely told that hard coding passwords was frowned upon. In cloud land, hard coding passwords or usernames will outright break your application. As part of normal operations on a PaaS, your application may be moved from one server to another as the admins adjust the load. When this happens, the IP address of your database server might change, and if you hardcoded that address, your application is now broken.

Instead, you use environment variables, which are either provided by the PaaS or inserted into the container by you. For example, on OpenShift, an application with Python, a database, and cron will have the following environment variables inserted by the OpenShift platform:
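The exact list depends on your cartridges, but a sampling looks roughly like this (the variable names are real OpenShift ones; the values are made up for illustration):

```shell
# Variables injected by the platform for the app itself
OPENSHIFT_APP_NAME=myapp
OPENSHIFT_APP_DNS=myapp-mydomain.rhcloud.com
OPENSHIFT_DATA_DIR=/var/lib/openshift/abc123/app-root/data/
OPENSHIFT_REPO_DIR=/var/lib/openshift/abc123/app-root/repo/

# Variables for the Python cartridge
OPENSHIFT_PYTHON_IP=127.4.88.1
OPENSHIFT_PYTHON_PORT=8080

# Variables for a PostgreSQL database cartridge
OPENSHIFT_POSTGRESQL_DB_HOST=127.4.88.2
OPENSHIFT_POSTGRESQL_DB_PORT=5432
OPENSHIFT_POSTGRESQL_DB_USERNAME=admin
OPENSHIFT_POSTGRESQL_DB_PASSWORD=s3cret
```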

Now, instead of hard coding file paths, URLs, or passwords, you just use these environment variables. If you want to add your own environment variables, say an API key, OpenShift has command line tools to handle that:
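A sketch of what that looks like (the variable name, value, and app name are placeholders):

```shell
# Set a custom environment variable on the application's containers
rhc env set MY_API_KEY=abc123 --app myapp

# Confirm it is there
rhc env list --app myapp
```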

rhc is the command line tool you use to interact with OpenShift for infrastructure-type commands. If you have a whole file of environment variables, you can load them in one command:
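One way to sketch this, assuming a local file with one VAR=value pair per line and no spaces in the values (the file name is a placeholder):

```shell
# Expand every VAR=value line from a local file into a single rhc call
rhc env set $(cat myapp.env) --app myapp
```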

One of the other great side effects of using environment variables is that your application becomes totally portable between developers. Since you haven’t hardcoded DB settings, if you give your code to someone, they can create the same infrastructure on OpenShift and just deploy your code. This is great if you build a standard project for developers in your group. When you have a new hire, just have them clone the git repo and push to OpenShift, and they are good to go! Red Hat uses this functionality with OpenShift to produce quickstarts so you can get going quickly with your favorite FOSS projects.
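In application code, this pattern is just a dictionary lookup. A minimal Python sketch, assuming a PostgreSQL cartridge’s variables (the database name is a placeholder):

```python
import os

# Read connection details injected by the platform instead of hardcoding them.
# Fall back to local defaults so the same code also runs on a developer machine.
db_host = os.environ.get("OPENSHIFT_POSTGRESQL_DB_HOST", "127.0.0.1")
db_port = os.environ.get("OPENSHIFT_POSTGRESQL_DB_PORT", "5432")
db_user = os.environ.get("OPENSHIFT_POSTGRESQL_DB_USERNAME", "postgres")
db_pass = os.environ.get("OPENSHIFT_POSTGRESQL_DB_PASSWORD", "")

# Build the connection URL from the environment, never from literals
db_url = f"postgresql://{db_user}:{db_pass}@{db_host}:{db_port}/mydb"
print(db_url)
```

Because every value comes from the environment, the same code deploys unchanged to any gear the platform moves it to.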

Manually Adding Project Dependencies and Builds

Still to this day, many developers manually add libraries to their projects and do hand builds before deploying. This won’t work in a cloud environment. Again, you want your application build and deployment to be as automated and repeatable as possible. Manual build steps mean people have to touch the infrastructure whenever the app needs to scale or the code migrates to another server. It also means the application is not completely self-contained for a new developer to pick up and build.

For all of our application languages on OpenShift we support automated installation of libraries using the most popular dependency managers:

Table 1. Dependency mechanisms used by OpenShift, by language

  Language               Dependency mechanism
  Java                   Maven
  PHP                    PEAR
  Python                 setuptools / pip
  Ruby                   Bundler (RubyGems)
  Perl                   CPAN
  Node.js (JavaScript)   npm
For each of these languages we provide a file in the project git repository where you can declare dependencies and other build information:

Table 2. Dependency files in the application git repository, by language

  Language               Dependency file
  Java                   pom.xml
  PHP                    deplist.txt
  Python                 setup.py
  Ruby                   Gemfile
  Perl                   deplist.txt
  Node.js (JavaScript)   package.json
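For instance, in a Node.js application the declaration file is package.json; a minimal sketch (the name and versions here are illustrative):

```json
{
  "name": "myapp",
  "version": "1.0.0",
  "dependencies": {
    "express": "~4.0.0",
    "pg": "~3.0.0"
  }
}
```

When you git push, the build step runs the dependency manager for you, so the libraries never need to be committed to the repository or installed by hand.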

Trying to Put All the Things in One Container

This is another practice that was a bad idea in pre-cloud days and is an even worse idea now. Before, you might have bought a beefy server and then thrown your web server, database server, cron jobs, and a whole bunch of applications on that one machine. If you had the money or an ops team, perhaps you could put your database on a separate server, but then that server would handle the DBs for all your applications.

With a PaaS, the unit of currency is the application, and an application is composed of many services. You want to isolate those services in their own containers or groups of containers. You achieve this on OpenShift by creating a scalable application, where each cartridge is placed in its own container. So if you have a scalable Ruby application with MongoDB, your Apache with Passenger would be in one container and your MongoDB server in another. With this configuration, your app server and your DB do not compete for resources such as CPU, memory, or disk. The additional benefit is that your services can be managed independently: you can restart individual pieces or swap out services as you need them. The idea is to get away from a monolithic server stack and monolithic applications.
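As a sketch, creating such a scalable application from the command line might look like this (the app name and cartridge versions are illustrative):

```shell
# -s makes the application scalable: each cartridge gets its own gear
rhc app create myapp ruby-1.9 -s

# The MongoDB cartridge lands in a separate gear from the Ruby app server
rhc cartridge add mongodb-2.4 --app myapp
```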

Building for Vertical Scalability

Vertical scalability is the idea that to handle more requests you get a bigger server: more RAM, more CPU cores, faster disks. Until about five years ago, this was the way almost everyone handled scaling. When things started to crumple under load, you would call your ops team and tell them you needed a bigger server. They would eventually procure one, you would migrate your application to it, and hopefully the new hardware would handle the load. If you had the money, you could pay for expensive DB and app server software that knew how to cluster across multiple machines, but that brought all sorts of other complications.

Recently though, things have changed in the hardware world that have made developers tackle the scaling problem differently. Between the rise of cheap commodity servers and Infrastructure as a Service (IaaS), the development landscape has started to shift to horizontal scalability. With horizontal scalability, the answer to increased load is to add more small, replaceable, cheap servers to the existing pool and have the application start using the new servers.

This newer paradigm is the model you should have in mind when developing applications on a PaaS. For most of your apps, you should stop thinking in terms of big monolithic applications (unless you are designing flight control systems). In the same way that loosely coupling your code helps with the creation, maintenance, and improvement of that code, the same can be said of the services that make up your application. For example, when I now think about building applications on OpenShift, there is a larger meta-application that the end user interacts with, composed of smaller applications that act as services: a messaging application, perhaps using ActiveMQ, solely responsible for messaging between the other pieces; a REST service built with Flask or JAX-RS for sending data to the client side; an application that just serves up the web content; and a database service that stores the data for just this meta-application.

This style of developing applications has started to earn the name microservices. One of the early pioneers of this style was Netflix, which was also one of the largest companies to put its applications in “the cloud”. Out of the box, this style of application development is geared toward horizontal scalability. Another company that showed the way in its data centers was Google. It published numerous papers on the servers it builds, the way its ops teams manage those servers, and how it teaches its developers to think of scalability as adding another server to the swarm rather than needing a bigger boat.

Other Reading

I hope I have given you a good start on best practices for building an application for PaaS deployment. I tried to point out the most common anti-patterns I see in the field, but there are certainly more out there, and I have only briefly introduced the corresponding patterns. There is a lot more reading you can do:

  • Our friends at Heroku have advocated the twelve-factor application. These types of applications will do well in the cloud and on a PaaS.
  • As I mentioned in the last section, adopting a microservices approach to your application can also help you avoid anti-patterns. There is a whole site now devoted to discussing and advancing microservices.
  • There is a nice article in Linux Journal that will help introduce some of the concepts of PaaS.
  • If you choose to work with OpenShift, you can download our O’Reilly e-book for free. It is short but sweet and discusses more of these details.
