Dead Batteries Included

Recharging the Python standard library

It’s unfortunate that the official About Python page still describes Python’s standard library as having “batteries included.” Sure, some of those old standbys will keep your project going and going, but many of them are leaking acid all over the place. Guido van Rossum, head developer of Python, has said “the stdlib offerings … are not very convenient and may not support popular idioms very well.” Five years ago, I assumed the Python library contained the “best of breed” for all packages. These days, I tend to think the opposite.

To counteract this minor flaw, I keep a small “personal standard library”: a pip requirements file listing the packages I use in every project. A simple script installs that file automatically whenever I create a virtualenv for a new project. With the pip download cache enabled, this is a near-painless process.
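As a rough illustration of such a bootstrap script, here is a minimal sketch. It assumes the standard library's venv module in place of virtualenv, and the function names and the requirements file name are hypothetical, not taken from my actual script.

```python
import os
import subprocess
import venv


def env_python(env_dir):
    """Return the path of the interpreter inside a virtual environment."""
    # Windows keeps executables in Scripts/, other platforms use bin/.
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def install_command(env_dir, requirements):
    """Build the pip command that installs a requirements file into env_dir."""
    return [env_python(env_dir), "-m", "pip", "install", "-r", requirements]


def bootstrap(env_dir, requirements):
    """Create the environment, then install the personal standard library."""
    venv.create(env_dir, with_pip=True)
    subprocess.check_call(install_command(env_dir, requirements))
```

Calling bootstrap("myenv", "personal-requirements.txt") would create the environment and then run pip against the shared requirements file. Keeping the command in its own function makes it easy to inspect without touching the network.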

My favorite third-party standby is the relatively unknown path.py module. It provides a beautiful object-oriented interface to file manipulation operations. This example illustrates several of path.py's nifty features:

Notice the division operator overloading and how makedirs returns the path object so it can be used in further path construction. This syntax is much more readable than the equivalent calls to the older os.path module in the standard library. I like to have all my projects depend on path.py from the start so that I don’t have to make a development-time decision as to whether to add a dependency. Third-party dependencies are very easy to include if you’re working with setuptools.
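The two features called out above, an overloaded division operator and a makedirs that returns the path, can be sketched with a toy class. SketchPath is a hypothetical stand-in written only for this illustration; it is not path.py's real class and covers none of the rest of its API.

```python
import os
import tempfile


class SketchPath(str):
    """Hypothetical stand-in for a path object; NOT path.py's real class."""

    def __truediv__(self, child):
        # `parent / "child"` joins path components, like os.path.join.
        return SketchPath(os.path.join(self, child))

    def makedirs(self):
        # Create the directory tree, then return self so calls can be chained.
        os.makedirs(self, exist_ok=True)
        return self


root = SketchPath(tempfile.mkdtemp())
# Build a nested directory, create it, and keep building from the result.
config_file = (root / "project" / "settings").makedirs() / "app.cfg"
```

With os.path alone, the last line becomes a call to os.path.join, a separate call to os.makedirs, and a second os.path.join for the file name.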

Speaking of setuptools, it’s another required module in my personal standard library. Last summer’s merging of the distribute / setuptools fork makes it the obvious best way to do package management in Python. In contrast to its history, the project is now well-documented and easy to use.

The famous requests module is probably the most obvious package to be included in any web developer’s personal standard library. Its author, Kenneth Reitz, prefers to keep it as a third-party module; he boldly states that the Python standard library is where Python modules go to die.

Another key dependency is the pytz library. The standard datetime library has support for timezones, but accidentally encourages use of so-called naive datetimes, which do not include timezone information. Naive datetimes are a heinous contrivance, yet it is impossible, using just the standard library, to work effectively with timezones other than UTC.
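To make the naive versus aware distinction concrete, here is a small sketch using only the standard datetime module. The dates and the five-hour offset are arbitrary values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime carries no timezone information at all.
naive = datetime(2014, 3, 9, 12, 0)
# An aware datetime is explicitly pinned to UTC.
aware = datetime(2014, 3, 9, 12, 0, tzinfo=timezone.utc)

# Ordering a naive value against an aware one raises TypeError.
comparison_error = None
try:
    naive < aware
except TypeError as exc:
    comparison_error = str(exc)

# A fixed offset can be attached, but it knows nothing about
# daylight saving rules or historical boundary changes.
fixed_offset = timezone(timedelta(hours=-5))
local = aware.astimezone(fixed_offset)
```

The fixed offset converts the instant correctly, yet it stays at five hours behind UTC all year. Rules for when a region shifts its clocks are the part that a timezone database such as pytz supplies.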

There is, however, a very good reason that pytz is not included in the standard library. Worldwide timezone information changes on a regular basis. Considering the slower release cycle of Python itself, it is important to keep pytz as a third-party library so it can react quickly to new decisions about daylight saving time or changing timezone boundaries. Regardless, it is vital to include this library in any project that manipulates dates.

And then there’s testing and documentation. Python ships the robust unittest module with its standard distribution, but I prefer to use the much more agile py.test library. The Python language has built-in support for inline documentation, but without the incredible Sphinx documentation engine, the feature is seriously crippled. Even the reference Python interpreter and debugger have superhero third-party versions, and I use IPython and ipdb everywhere I work.
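The difference in testing style is easiest to see side by side. In this sketch, slugify is a hypothetical function invented for the example; the same check is written first as a unittest test case and then as the plain function that py.test collects.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lowercase and hyphenate a title."""
    return "-".join(title.lower().split())


# unittest style: a TestCase subclass and assertion methods.
class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(
            slugify("Dead Batteries Included"), "dead-batteries-included"
        )


# py.test style: a plain function and a bare assert statement.
def test_spaces_become_hyphens():
    assert slugify("Dead Batteries Included") == "dead-batteries-included"
```

The py.test version needs no class, no inheritance, and no special assertion methods, which is most of what I mean by agile.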

The good news is that the Python developers are aware of the shortcomings of the Python Standard Library and are actively discussing them. They wrote PEP 411 to address the dead and dying batteries issue. The proposal introduces a provisional stage that allows packages to go into the standard library without enforcing the hard guarantees of backwards compatibility and API stability that previously made standard library inclusion undesirable. Further, Python has had a solid deprecation policy for many years, even longer than it’s had a style guideline!

A robust standard library allows new developers to get up and running quickly without having to understand the intricacies of packaging. However, Python packages can easily specify, download, and install their own dependencies, so there is little call for packages to be included in the standard library. Therefore, in production systems, Python programmers should never restrict themselves to the standard library and should be open, even eager, to depend on third-party packages that provide the APIs and functionality they need.
