Python 3: threat or menace?

I wish I still had my copy of this: a Harvard Lampoon parody of Life magazine from the ’60s, displaying a picture of a flying saucer and the ominous headline: “Flying Saucers: Threat or Menace?”.

I was reminded of this when reading some recent articles worrying about the slow transition from Python 2 to Python 3, such as Python 3 is Killing Python. The authors of such articles, and Python developers in general, really like Python, and for the most part like Python 3. Their main concern is that the protracted 2-3 straddle will hurt Python’s popularity.

About five years ago, I started writing an introductory Python book for O’Reilly. It featured Python 2, which was dominant then. Unfortunately, the tides of business went out and took the book with them. Two years ago, the tides returned and the book was revived. Introducing Python: Modern Computing in Simple Packages is finally in production and early release.

When we rebooted the book, there was now a serious question of whether to feature Python 2 or 3. The other version might merit some sidebars or an appendix, but we really needed to pick just a single base for the code examples. And by now it seemed that Python 3 had become the right choice. If you’re wondering why the editors and I thought Python 3 was best for this book, let me give some of the reasons, more or less in order of importance.

First, the book is aimed at beginning programmers, or beginning Python programmers. If you’re learning Python, why not learn the best and most up-to-date version? Then you won’t need to unlearn some of the Python 2 misfeatures that Python 3 was designed to correct.

If you’re not a new programmer, it’s likely that you don’t have a choice: you’re stuck with whatever version of Python is on the machines you’re working on. This is likely to be Python 2. Although Python 2 will be supported until 2020, most operating systems that include Python will switch to Python 3 long before then.

One argument against Python 3 has been of the chicken-and-egg variety: I need package X and it hasn’t been converted, so I won’t convert my application. Python 3 has been around for over five years now. Any version from 3.3 on will do anything 2.7 did, and more. So, how many packages are moving from Python 2 to 3?

Two sites track how many of the most popular packages at the main third-party Python site PyPI have been converted from 2 to 3: 165 of the top 200 at the Python 3 Wall of Superpowers (formerly called the Python 3 Wall of Shame, indicating a perceived tipping point), and 270 of the top 360 at Python 3 Readiness. The Can I Use Python 3? site will check a project for any blockers to Python 3 use.

While writing the book, I did cover some third-party packages that have not yet been converted from Python 2 to 3: scapy and scrapy (unrelated projects, also good names for pet chipmunks), gevent, and a few others. I felt they were unusually useful in some way, and would expect them to be ported to Python 3 before very long.

Python 3 isn’t a radical change (except for Unicode handling, which we’ll get to shortly). The visible changes are fairly small. It still has that minty Python scent, that art house whitespace framing, and that lovely, readable syntax. Python has always been — and this is not faint praise — a nice language. 3 is just a bit nicer. The developers chipped away at the technical debt that every project accumulates: using more consistent naming and behavior, dropping obsolete pieces, and of course fixing many bugs.

Finally — those were carrots, so here comes the stick — Python 2 is a dead end. It is the Norwegian Blue, pining for the fjords. Although goodies continue to be back-ported from 3 to 2, new development belongs to the 3 line. There will be no hybrid beast called Python 2.8.

Writing Compatible Code

If you need to use Python 2 now and want to move more easily to Python 3 in the future when it becomes the standard version on your system, use one of the bridge packages. A good strategy is to import features from the standard __future__ module. These make Python 2 work like Python 3, and leave Python 3 code unharmed:

print_function — parenthesized arguments, redirection, and output separators.
unicode_literals — quoted strings are Unicode sequences, not bytes.
division — dividing integers with / makes a float, with // makes an integer.
absolute_import — ensure you’re importing what you expect when multiple modules have the same name.
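
As a rough illustration of why these imports leave Python 3 code unharmed, the sketch below asks the standard __future__ module when each of the four features became available and when it became standard behavior. The FEATURES and releases names are mine, made up for this example.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import __future__

# The four features discussed in this article.
FEATURES = ("print_function", "unicode_literals", "division", "absolute_import")

# Each feature records the release where it became available as an opt-in
# and the release where it became standard behavior.
releases = {}
for name in FEATURES:
    feature = getattr(__future__, name)
    optional = feature.getOptionalRelease()[:2]
    mandatory = feature.getMandatoryRelease()[:2]
    releases[name] = (optional, mandatory)
    print(name, "opt-in since", optional, "standard since", mandatory)
```

All four report Python 3.0 as the release where the behavior became standard, so on a Python 3 interpreter the import line at the top changes nothing.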

Print

The most obvious language change is the syntax for print. It’s a function in Python 3, so its arguments need to be in parentheses. In Python 2, it’s a bare statement like return, and should not use parentheses. Here’s an example of what happens in both versions of Python, and how a simple from __future__ import print_function lets you use the same code:

Python 2:

    >>> print "Greetings", "Earthling"
    Greetings Earthling

Python 3:

    >>> print "Greetings", "Earthling"
      File "<stdin>", line 1
        print "Greetings", "Earthling"
                        ^
    SyntaxError: invalid syntax

Python 3 barfs with the old `print` style.

Python 2:

    >>> print ("Greetings", "Earthling")
    ('Greetings', 'Earthling')

Python 3:

    >>> print ("Greetings", "Earthling")
    Greetings Earthling

You’d think using parentheses with Python 2 would solve portability. Nope. Python 2 prints a tuple.

Python 2 and Python 3:

    >>> from __future__ import print_function
    >>> print ("Greetings", "Earthling")
    Greetings Earthling

Finally, a solution that works with both versions.
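
The print function also covers the output separators and redirection mentioned in the feature list. Here is a minimal sketch. I have only checked it against Python 3, and the out and captured names are just for illustration.

```python
from __future__ import print_function, unicode_literals

import io
import sys

# Parenthesized arguments, joined by the default separator (a space).
print("Greetings", "Earthling")

# A custom output separator and line ending.
print("Greetings", "Earthling", sep=", ", end="!\n")

# Redirection: the file argument replaces the Python 2 statement form
#   print >>sys.stderr, "Greetings", "Earthling"
print("Greetings", "Earthling", file=sys.stderr)

# Any object with a write() method will do, such as an in-memory text buffer.
out = io.StringIO()
print("Greetings", "Earthling", sep="-", file=out)
captured = out.getvalue()
```

The keyword arguments sep, end, and file only exist on the function form, which is one more reason to add the import to Python 2 code.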

Unicode

Unicode was the main reason for Python 3, and it turned out to be a thornier problem than most people expected.

For a long time, characters fit into a single byte, using the seven-bit ASCII encoding. This was designed by Americans, who had no need for things like accents, diacritical marks, symbols, or other suspect foreign stuff. (If the French had got there first, we’d probably have FRESCII and more stylish keyboards, possibly with flower holders.)

But the world and its exotic non-ASCII characters intruded, and for a long time they were handled by defining alternative eight-bit character sets like Latin-1 and Windows-1252. These character sets stuffed characters into the slots that ASCII left unused. But you still needed to specify how to shift between these character sets, and none of the solutions worked that well. The same character might have a different byte value in different character sets. You’ve seen this if you cut and paste among web sites, databases, and Word documents.
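
To make the clash concrete, here is a small sketch, assuming Python 3 and the codecs that ship with it. It shows the same character landing on different byte values in different eight-bit character sets, and the same byte turning into a different character depending on which set you assume. The variable names are mine.

```python
import unicodedata

# Assumes Python 3, where str is Unicode text.
euro = "\u20ac"  # EURO SIGN

# The same character gets a different byte value in different character sets.
euro_cp1252 = euro.encode("cp1252")
euro_latin9 = euro.encode("iso8859_15")
print(euro_cp1252, euro_latin9)

# The same byte is a different character, depending on the set you assume.
mystery = b"\xe9"
readings = {}
for codec in ("latin-1", "cp437"):
    char = mystery.decode(codec)
    readings[codec] = unicodedata.name(char)
    print(codec, "reads it as", readings[codec])
```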

So the Unicode consortium decided to define a unique integer value for every character in every language in the world, as well as various types of symbols. Now a Latin capital A was uniquely defined, even if some character in another language happened to look like it.

Computers store and transmit only bytes. You need to encode a character to one (or more) bytes, and decode byte sequences to characters. You can’t tell the encoding of a byte stream for sure just by looking at it; you need prior agreement on what encoding is used. Many Unicode encodings exist, but the most popular one is UTF-8. This is a variable-length encoding with some nice properties: a valid ASCII byte stream is a valid UTF-8-encoded byte stream, using only one byte per character. If you need an accent, a symbol, or Lao Tsu’s limericks, then UTF-8 uses certain bits to indicate how many bytes are needed to encode each character.
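
Here is a quick sketch of that variable-length behavior, assuming Python 3 (where indexing a bytes object gives an integer). The sample characters are my own picks: a plain letter, an accented letter, the euro sign, and a snake.

```python
# Assumes Python 3, where indexing bytes yields an integer.
samples = ["A", "\u00e9", "\u20ac", "\U0001F40D"]  # A, e-acute, euro sign, snake

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    # The high bits of the first byte signal the length of the sequence:
    # 0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4.
    first_byte_bits = format(encoded[0], "08b")
    print("U+%04X" % ord(ch), len(encoded), "byte(s), first byte", first_byte_bits)

# Plain ASCII text produces identical bytes under both encodings.
ascii_text = "Greetings, Earthling"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```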

Python 2 strings are byte sequences; you may or may not know the correct encoding to extract characters. Python 3 made a clean break by redefining str as a Unicode character sequence, and bytes as a sequence of eight-bit integers. Imagine bytes as beads, and str as a charm bracelet.

You encode str to bytes and decode bytes to str, with your chosen encoding. UTF-8 is the most common and preferred, but there are others. The internal Python 3 string machinery hides the messy details. For example, finding a “lower-case letter” in a regular expression looks for any Unicode character defined as a lower-case letter, not just ASCII’s a through z. You don’t worry about how it handles the byte lengths internally.
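
A minimal round trip, assuming Python 3. It encodes a str to bytes, decodes it back, shows what happens when the decoding side assumes the wrong encoding, and finishes with a regular expression that matches Unicode letters by default. The sample text and variable names are mine.

```python
import re

# Assumes Python 3. The text is "naive cafe" with two accented letters.
text = "na\u00efve caf\u00e9"

data = text.encode("utf-8")       # str -> bytes
restored = data.decode("utf-8")   # bytes -> str
print(restored == text)
print(len(text), "characters,", len(data), "bytes")

# Decoding with the wrong encoding fails loudly instead of guessing.
try:
    data.decode("ascii")
    decode_failed = False
except UnicodeDecodeError as error:
    decode_failed = True
    print("not ASCII:", error.reason)

# In Python 3, \w matches Unicode word characters unless you ask for ASCII.
unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(len(unicode_words), len(ascii_words))
```

With the re.ASCII flag the accented letters stop counting as word characters, so the two words break into three fragments.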

For most developers, most of the time, the Python 3 separation of byte and character strings works well. Ned Batchelder’s Pragmatic Unicode has great usage tips, including the Unicode sandwich: use Unicode inside your program, and bytes on the outside. But this separation can work less well for people who deal with low level plumbing like network protocols, as described in detail by Armin Ronacher.

Nick Coghlan’s response, in Python 3 and ASCII Compatible Binary Protocols, was basically: yes, we broke some things in Python 3, we broke them on purpose, and here’s why. He expands on this in Python 3 Q & A, especially in the section Is Python 3 more convenient than Python 2 in every respect? If you’re a systems-level Python developer, these posts are very helpful.

Division

This one’s pretty straightforward: if your program contains from __future__ import division, then you’ll get Python 3-style division, whether you’re using a Python 2 or Python 3 interpreter.
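
A short sketch of what that looks like in practice. The variable names are just for illustration.

```python
from __future__ import division

# With the import, / is true division on both Python 2 and Python 3.
true_quotient = 7 / 2      # 3.5
# // is floor division: it rounds down, toward negative infinity.
floor_quotient = 7 // 2    # 3
negative_floor = -7 // 2   # -4
# Floor division of a float still floors, but the result stays a float.
float_floor = 7.0 // 2     # 3.0

print(true_quotient)
print(floor_quotient)
print(negative_floor)
print(float_floor)
```

Without the import, a Python 2 interpreter would give 3 for 7 / 2, which is the kind of quiet difference that bites during a port.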

Absolute Imports

Sometimes you might want to import a “standard” module (within one of the directories in the sys.path list), and other times you might want to import a module relative to some of your own code. In Python 2, if two modules have the same name, there’s a chance that you’ll accidentally import the wrong one, or “shadow” a standard library module. In Python 3, the normal import syntax is always absolute: it only looks in sys.path. If you have a package and want module a to import a sibling module b (in the same package directory), you need to use the explicit relative import syntax: from . import b. This lets you use a local module with the same name as a standard module. Use from __future__ import absolute_import to get this behavior in your Python 2 or Python 3 code.
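
To see both kinds of import side by side, the sketch below writes a throwaway package to a temporary directory and imports it. The package name demo_pkg, its files, and the build_and_import helper are all hypothetical, invented for this example. The package holds a local module named string, the same name as a standard library module.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Hypothetical package layout, written to a temporary directory:
#   demo_pkg/__init__.py
#   demo_pkg/string.py   (a local module sharing a name with the stdlib)
#   demo_pkg/a.py        (imports both the stdlib module and the sibling)
FILES = {
    "__init__.py": "",
    "string.py": "WHO = 'local sibling'\n",
    "a.py": (
        "from __future__ import absolute_import\n"
        "import string  # plain import: found via sys.path, so the stdlib\n"
        "from . import string as local_string  # explicit relative import\n"
    ),
}


def build_and_import():
    """Create the package on disk, import demo_pkg.a, then clean up."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    for name, source in FILES.items():
        with open(os.path.join(pkg_dir, name), "w") as handle:
            handle.write(source)
    sys.path.insert(0, root)
    try:
        return importlib.import_module("demo_pkg.a")
    finally:
        sys.path.remove(root)
        shutil.rmtree(root)


module_a = build_and_import()
print(hasattr(module_a.string, "ascii_lowercase"))  # the stdlib string module
print(module_a.local_string.WHO)                    # the sibling module
```

Inside a.py, the plain import string reaches the standard library, while the from . import form reaches the sibling file, so the two can coexist without shadowing.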

Recommendations

Python is a pragmatic language. Use the version that solves your problems; as the plastic surgeon said, it’s no skin off my, um, nose. Some free advice, which is worth the price:

If you have a choice, use Python 3. Use the official core distribution, or one like Anaconda, which bundles the core with many scientific packages.
If you only have Python 2 and don’t need to port to Python 3, use plain Python 2.
If you want a common code base, use from __future__ imports or one of the other portability packages mentioned below.
Watch out for UFOs.

More info

Porting Python 2 Code to Python 3 — good tips and more details
2to3 – Automated Python 2 to 3 code translation — this library fixes some things in your Python 2 code, like print parentheses
Six: Python 2 and 3 Compatibility Library — six is a low-level bridge module
Porting to Python 3: An in-depth guide — a book with many examples
python-modernize — extends 2to3
Easy, clean, reliable Python 2/3 compatibility — the third-party python future extends the standard __future__
Porting to Python 3 Redux — more tips from the author of Flask and Jinja2.

Public domain snake illustration courtesy of Internet Archive.

Update, Sept. 30, 2014 — “UFOs: Threat or Menace?” was changed to “Flying Saucers: Threat or Menace?”