On Teaching Programming With Python 3.0

Author: Nick Efford
Contact: nde@comp.leeds.ac.uk
Status: Final
Revision: 7
Last revised: 2008-04-08

Abstract

In this paper, we explore some of the ways in which Python 3.0 improves on earlier versions of the language for the purpose of teaching programming. We highlight simplification of the type system and object model as improvements of particular significance, along with changes to some of Python's operators and its built-in capabilities for console I/O. We also examine what adopters will need to do to make existing code examples run with Python 3.0, and discuss other obstacles to early adoption such as the lack of suitable textbooks and delays in introducing support for Python 3.0 into third-party tools and frameworks.

1   Introduction

"The language has two choices: either continue to bear the burden of what are now considered poor design decisions... or suck it up and let us try and fix some of these problems. It's like going to the dentist; it may hurt, but if that minor toothache goes untreated and develops into an abscess, you will wish you were dead."

Blog entry by Collin Winter

Like any programming language, Python has accumulated its own fair share of design flaws since its creation. Unlike many programming languages, Python is changing to address these problems. Two new versions of the language are currently in development: version 2.6, which retains backwards compatibility with previous releases; and version 3.0, which breaks backwards compatibility to the extent that even that simplest of programs, the classic 'Hello, World', will no longer work in its current form.

The changes being made in Python 3.0 are drastic and may cause pain for some, but they will improve the language significantly as a vehicle for the teaching of programming in schools and universities. This paper explains why this is so and discusses some of the obstacles that educators might face in making an early move to version 3.0 from earlier versions.

2   Why Version 3.0?

There are many improvements being made to Python in version 3.0 to make the language cleaner and more consistent. We discuss five of these improvements here.

2.1   Simpler Built-in Types

The type system in Python 2.x lacks the redundancy of the type systems in other languages, in keeping with Python's nature as a very high level language (VHLL). For example, there is just one built-in type to represent floating-point numbers, whereas many other languages have two or more; and there is no separate character type, on the basis that characters can be represented simply as strings of length 1.

And yet there are inconsistencies. For example, integers can be represented either by a fixed-size int type (32-bit or 64-bit, depending on your machine architecture) or by a long type that can represent an effectively unlimited range of values [1]. Strings also have two possible representations: the str type (used, confusingly, for both ASCII character strings and strings of arbitrary bytes) and the unicode type (used for strings composed of Unicode characters).

Python 3.0 removes these inconsistencies. It has a single int type, with behaviour equivalent to the old long type. Similarly, there is now a single character string type, called str but with the behaviour and implementation of the old unicode type. Sequences of bytes can be represented by the new bytes type, different from str but supporting conversion to and from str via the decode and encode methods. The net result of these changes is a simpler and more intuitive type system.
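The unified types can be seen in a few lines. The following is a minimal sketch, assuming a current Python 3 interpreter; the variable names and the sample text are ours and are chosen purely for illustration.

```python
# A single int type: values grow as needed, with no separate long type.
big = 2 ** 100
assert type(big) is int

# str holds Unicode text; bytes holds raw bytes. Conversion is explicit.
text = 'caf\u00e9'                     # 'cafe' with an acute accent on the e
data = text.encode('utf-8')            # str -> bytes
assert isinstance(data, bytes)
assert len(text) == 4                  # four characters
assert len(data) == 5                  # the accented e needs two bytes in UTF-8
assert data.decode('utf-8') == text    # bytes -> str
```

Note that the length of the text and the length of its encoded form differ, which is a useful way of showing students that characters and bytes are not the same thing.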

2.2   Unsurprising Arithmetic

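In common with other popular languages like Java, C and C++, Python 2.x discards the fractional part when one integer is divided by another. Python's version of this is 'floor division', which rounds down (Java and modern C and C++ truncate towards zero instead, which differs only for negative results):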

>>> 1 / 2
0

This is an unwelcome surprise for most programming novices. In our experience of teaching Python 2.x (and before that, Java and C++), most students make mistakes with integer division more than once before they learn to cope with this counterintuitive behaviour.

Python 3.0 changes int division so that it now returns a float value. The // operator can be used to obtain the old behaviour:

>>> 1 / 2
0.5
>>> 1 // 2
0

For programmers whose expectations have been shaped by exposure to the C family of languages, this change has been the source of much anguish ever since it was first proposed in PEP 238 back in 2001 (as can be seen in the comp.lang.python newsgroup discussion on the topic). Nevertheless, it removes what was clearly a significant hurdle for those new to programming.
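The behaviour of the two operators, including the case of negative operands that often catches students out, can be checked with a short script. This is only a sketch of the rules described in PEP 238, assuming a Python 3 interpreter; the operand values are arbitrary.

```python
# True division always yields a float, even when the result is whole.
assert 1 / 2 == 0.5
assert isinstance(4 / 2, float)

# Floor division rounds towards negative infinity...
assert 7 // 2 == 3
assert -7 // 2 == -4

# ...and pairs with % so that (a // b) * b + a % b == a.
quotient, remainder = divmod(-7, 2)
assert (quotient, remainder) == (-4, 1)
```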

2.3   Cleaner Comparisons

Python 2.x allows comparison of incompatible types using <, >, etc:

>>> 42 < 'hello'
True

Newcomers to programming obviously have difficulty understanding why this expression should evaluate to True -- or why it should evaluate at all, for that matter. Thankfully, in Python 3.0, such operations trigger an exception:

>>> 42 < 'hello'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()

A consequence of this is that it is no longer possible to sort lists that contain a mixture of incompatible types.
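Where a mixed list really does need to be sorted, the ordering has to be stated explicitly. The sketch below assumes a Python 3 interpreter; the sample list and the choice of str as a key function are ours, and other keys may suit a given exercise better.

```python
mixed = [42, 'hello', 7, 'abc']

# Sorting directly fails, because int and str cannot be ordered.
try:
    sorted(mixed)
except TypeError:
    direct_sort_failed = True
else:
    direct_sort_failed = False

# An explicit key makes the intended ordering unambiguous; here we
# (arbitrarily) compare every item by its string form.
by_text = sorted(mixed, key=str)
```

Comparing by string form places 42 before 7, which is itself a helpful talking point about the difference between numeric and textual ordering.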

A minor enhancement to comparisons is the removal of the long-deprecated <> as a 'not equal to' operator. In Python 2.x, this or the more usual != could be used, but in Python 3.0 != is the only operator available for this purpose. The <> operator is but one of a number of obsolete or little-used features that are being removed from the language [2].

2.4   Greater Consistency in Console I/O

2.4.1   Printing

A long-standing design wart has been the asymmetry that exists in console I/O. In Python 2.x, input from the console is handled by functions, whereas output to the console is typically handled by a print statement:

import math

number = float(raw_input('Enter a value: '))
print 'Square-root of', number, 'is', math.sqrt(number)

We have observed more than one student attempting to put brackets around the list of things to be printed in a program like this, presumably in the belief that they are bundling up the arguments to a function call. Unfortunately, this does not result in an error; instead, it creates and prints a tuple of objects, like so:

('Square-root of', 2.0, 'is', 1.4142135623730951)

Students unfamiliar with the concept of a tuple can easily form the impression that Python prints things in a rather odd way.

Python 3.0 fixes this asymmetry by making print a built-in function:

print('Square-root of', number, 'is', math.sqrt(number))

Making print a function also removes the ugly 'redirection' syntax for printing to stderr or a previously opened file, added in Python 2.0. Thus a line like

print >> sys.stderr, 'Error! Number required'

becomes, in Python 3.0,

print('Error! Number required', file=sys.stderr)

The new print function supports keyword arguments sep and end, for specifying the strings used to separate arguments and terminate the printed line. (These strings default to a space and a newline, respectively.) This feature makes it possible to print multiple values on separate lines with one function call:

print(x, y, z, sep='\n')
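The three keyword arguments can be combined in one call. The following sketch assumes a Python 3 interpreter and writes into an in-memory io.StringIO buffer (our choice, made only so that the exact output can be inspected); the values of x, y and z are placeholders.

```python
import io
import sys

x, y, z = 1, 2, 3

# Capture output in memory so the effect of sep and end is visible.
buffer = io.StringIO()
print(x, y, z, sep=', ', end='!\n', file=buffer)
captured = buffer.getvalue()

# The same function writes to stderr; no special syntax is needed.
print('Error! Number required', file=sys.stderr)
```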

2.4.2   Keyboard Input

For a long time, Python has had two built-in functions that read from the console: input and raw_input.

input reads a string of characters and attempts to evaluate them, such that a sequence of digits yields an integer value, characters enclosed in quotes yield a string, etc. Unfortunately, this is less useful than it sounds. Consider the following simple Python 2.x program:

# hello.py - a program that greets you

name = input('Enter your name: ')
print 'Hello,', name

Here are two attempts to run the program:

$ python hello.py
Enter your name: nick
Traceback (most recent call last):
  File "hello.py", line 1, in <module>
    name = input('Enter your name: ')
  File "<string>", line 1, in <module>
NameError: name 'nick' is not defined
$ python hello.py
Enter your name: max
Hello, <built-in function max>

In both cases, the student running the program has forgotten that string input should be enclosed in quotes, with the result that Python treats the inputs as names of objects in the global namespace. The first attempt fails because there is no object named nick, but the second attempt succeeds because Python has a built-in function named max. Both results are confusing to programming novices.

As another example, consider how we might read a number. Code like

number = input('Enter a value: ')

will do the right thing for inputs such as 42 or 7.5, but will not prevent input of 'hello' (which makes number a reference to a string object) or max (which, as we have already seen, makes it a reference to a built-in function). An obvious solution is to force conversion to the required numeric type, using the appropriate factory function:

number = float(input('Enter a value: '))

But if we are doing this, why bother evaluating the input at all? Why not simply read characters from the keyboard and rely on the float function to convert them into a number?

Both of the problems discussed above are solved in Python 2.x by using raw_input instead of input. The raw_input function returns console input as a string object and allows the programmer to decide exactly how this string should be handled. The string can be left alone in cases where text is expected (as in hello.py) or it can be converted explicitly to the required type (as in our second example):

number = float(raw_input('Enter a value: '))

We have seen cases where students have used input in their Python 2.x programs, having failed to recognise that raw_input is a safer, less confusing alternative. Python 3.0 prevents such confusion by providing a single function named input, equivalent to the raw_input function of earlier versions. Evaluation of the input string is still possible, but it must be done explicitly, with code like eval(input()).
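With a single input function that always returns a string, validation becomes a matter of attempting the conversion and handling failure. The function below is a sketch of that pattern for Python 3, not code from our teaching material; read_number and its reader parameter are hypothetical names introduced here, the latter so that the function can be exercised without a keyboard.

```python
def read_number(prompt, reader=input):
    """Ask repeatedly until the user types something float() accepts.

    The reader parameter is a hypothetical hook added for this sketch
    so that the function can be tested without a keyboard.
    """
    while True:
        text = reader(prompt)
        try:
            return float(text)
        except ValueError:
            # Inputs such as hello or max are rejected, never evaluated.
            print('Please enter a number, not', repr(text))
```

In a lecture this would be called simply as read_number('Enter a value: '), with the default reader taking input from the console.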

2.5   A Single Object Model

Before Python 2.2 arrived in December 2001, there were significant differences between Python's built-in types and user-defined classes, such that it was impossible to create a user-defined subclass of a built-in type. Version 2.2 went a long way towards healing the class/type split by introducing new-style classes alongside the existing classic classes, and by turning most of the built-in types into these new-style classes. Since version 2.2, it has therefore been possible to begin class definitions in two ways:

# Old-style
class Foo:
    ...

# New-style
class Foo(object):
    ...

Students sometimes fail to recognise that these examples are two different kinds of class. This can be a particular problem for those coming to Python from Java, where class Foo and class Foo extends Object are entirely equivalent ways of beginning a class definition.

Unfortunately, many of the books and online tutorials on Python published since the release of version 2.2 have done little to clarify the distinction between old- and new-style classes or provide adequate guidance to novice programmers on which type of class should be used. Some [3] [4] [5] have ignored new-style classes altogether and others [6] have gone so far as to acknowledge their existence whilst using old-style classes almost exclusively in example code. In some cases [7] there is a balanced discussion of the two class types, whereas a few titles [8] [9] have encouraged a more modern approach by concentrating almost exclusively on new-style classes. This lack of consistency can be very confusing for students.

Python 3.0 solves the problem by removing old-style classes entirely, leaving us with a single object model based on new-style classes. In version 3.0, class definitions start off much as they did before, but with the essential difference that object is implicitly a superclass (as is the case in Java and C#). Thus, there is no longer any difference between the two examples shown below.

# Python 3.0 class, inheriting implicitly from object
class Foo:
    ...

# Python 3.0 class, explicit syntax
class Foo(object):
    ...
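The equivalence of the two forms can be demonstrated by inspecting the classes themselves. This is a minimal sketch assuming a Python 3 interpreter; the class names Implicit, Explicit and Celsius are hypothetical and exist only for this illustration.

```python
class Implicit:
    pass

class Explicit(object):
    pass

# Both classes have object as their only base class.
assert Implicit.__bases__ == (object,)
assert Explicit.__bases__ == (object,)

# Built-in types belong to the same hierarchy and can be subclassed.
class Celsius(float):
    """A hypothetical float subclass, used only as an illustration."""
    def to_fahrenheit(self):
        return self * 9 / 5 + 32

assert isinstance(Celsius(100.0), object)
boiling = Celsius(100.0).to_fahrenheit()
```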

3   Barriers to Adoption

We have suggested above that there is a strong case for switching to Python 3.0 for teaching, on the grounds that it is a cleaner and more consistent language than version 2.5. However, there are three obvious barriers to adoption:

  1. Suitable, 3.0-based textbooks might not be available.
  2. Most example code will need to be altered in some way to be compatible with Python 3.0, even if all that is required in most cases is translation of a print statement into a function call.
  3. Where there are dependencies on third-party modules or packages, it will be necessary to wait for these packages to be ported to 3.0.

Problem 1 will resolve itself eventually; in the meantime, a short-term solution might be to update existing, free material such as Downey's How to Think Like a (Python) Programmer. Problem 2 is minimised by tools provided with Python 3.0 to assist with conversion, as discussed below. Problem 3 is more troubling. In teaching programming to first-year students at Leeds, we have used Pygame, wxPython and Django to stimulate and motivate students with examples of how Python can be used for game, GUI and web programming. If these packages are not ported to version 3.0 soon after its release, then we will have to use 3.0-compatible alternatives (if they exist), abandon these examples entirely or (reluctantly) delay adoption of version 3.0.

4   Converting Code to Python 3.0

The Python 3.0 distribution comes with a refactoring tool called 2to3, intended to assist with the translation of Python 2.x code to Python 3.0. The tool operates on stdin, individual files or an entire directory tree. It writes a unified diff patch for each .py file to stdout, and a summary of which files needed changes to stderr. With the -w command line option, it will create a back-up of each file and then apply the patch to the file.

As an experiment, we tried running the 2to3 tool provided in the third alpha release of Python 3.0 on the collection of example programs used in our first-year programming lectures [10], using the -w option to apply the patches. The tool took 47.2 seconds to process 147 .py files [11] and changed 77 of them (52% of the total). The table below gives a breakdown of the changes made.

=================  =====
Feature            Files
=================  =====
raw_input only         1
print only            46
raw_input & print     15
Other                 15
=================  =====

In 80% of cases, the changes involved console I/O only and no manual intervention was required to produce satisfactory code. In 5% of cases, 2to3 produced code that ran correctly but was redundant in some way. An example is this:

def factorial(n):
    """Returns the factorial of a non-negative integer n."""
    if not isinstance(n, (int, long)):
        raise TypeError('argument must be an integer')
    elif n < 0:
        raise ValueError('argument cannot be negative')
    elif n <= 1:
        return 1
    else:
        return n*factorial(n-1)

2to3 performs a simple substitution, changing the first line of the function body to

if not isinstance(n, (int, int)):

rather than

if not isinstance(n, int):

Another example is this:

class Rainfall(object):
    """A set of monthly rainfall measurements."""
    ...
    def maximum(self):
        """Returns the maximum rainfall in this dataset."""
        return max(self.data.values())

    def total(self):
        """Returns the total rainfall over the entire dataset."""
        return sum(self.data.values())

2to3 changes the return statement of the maximum method to

return max(list(self.data.values()))

It does this because, in Python 3.0, the values method of a dict object returns a lightweight, read-only view of that dictionary's values, rather than a list. However, in this particular context, the change is unnecessary; the max function requires an iterable argument and views satisfy this requirement. Curiously, 2to3 does not make a similar change to the return statement of the total method.
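The properties of views that matter here can be checked directly. The sketch below assumes a Python 3 interpreter; the rainfall figures are made up and the variable names are ours.

```python
rainfall = {'Jan': 80.5, 'Feb': 62.0, 'Mar': 71.25}   # made-up data

values = rainfall.values()

# A view is iterable, so max and sum accept it directly.
wettest = max(values)
total = sum(values)

# A view is not a list: it cannot be indexed...
try:
    values[0]
except TypeError:
    indexable = False
else:
    indexable = True

# ...and it reflects later changes to the dictionary.
rainfall['Apr'] = 90.0
assert max(values) == 90.0
```

Wrapping the view in list() is therefore needed only when code relies on list behaviour such as indexing, or needs a snapshot that will not change.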

In three of our .py files (2% of the entire set), 2to3 failed to make a change that was required for code to run correctly. One of these cases involved integer division:

def median(numbers):
    """Returns the median of a sequence of numbers."""
    size = len(numbers)
    data = sorted(numbers)
    n = size / 2
    if size % 2 == 0:
        return 0.5 * (data[n-1] + data[n])
    else:
        return data[n]

The division in n = size / 2 causes the problem. In Python 2.x, n is guaranteed to be an int because size is always an int; in Python 3.0, however, n is guaranteed to be a float, and float values cannot be used to index a list. The fix is simply to do floor division with //, but 2to3 is unable to deduce the need for this.
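For completeness, a corrected version of the function is sketched below. It differs from the original only in the use of // and assumes, as the original does, that the sequence is not empty.

```python
def median(numbers):
    """Returns the median of a non-empty sequence of numbers."""
    size = len(numbers)
    data = sorted(numbers)
    n = size // 2    # floor division keeps n an int, usable as an index
    if size % 2 == 0:
        return 0.5 * (data[n-1] + data[n])
    else:
        return data[n]
```

This version behaves identically under Python 2.2 and later, since // is available there too.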

Another problem missed by 2to3 involved the use of range in the following lottery simulation program:

import random

numbers = range(1, 50)
chosen = []

while len(chosen) < 6:
    number = random.choice(numbers)
    numbers.remove(number)
    chosen.append(number)

chosen.sort()
print("This week's numbers are", chosen)
print("The bonus ball is", random.choice(numbers))

In Python 2.x, numbers is a list, so the attempt to invoke its remove method inside the while loop succeeds; in Python 3.0, however, numbers is an iterable object that doesn't have a remove method, so we get an AttributeError exception. We can fix this easily enough, by initialising numbers like so:

numbers = list(range(1, 50))
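The complete corrected program might look something like the sketch below. Wrapping the logic in a function named draw, and giving it an rng parameter, are our own additions for this illustration (they make the draw repeatable in tests); the algorithm itself is unchanged.

```python
import random

def draw(rng=random):
    """Returns six sorted main numbers and a bonus ball from 1..49."""
    numbers = list(range(1, 50))   # a real list, so remove() is available
    chosen = []
    while len(chosen) < 6:
        number = rng.choice(numbers)
        numbers.remove(number)
        chosen.append(number)
    chosen.sort()
    # The bonus ball is picked from the numbers that remain.
    return chosen, rng.choice(numbers)

chosen, bonus = draw()
print("This week's numbers are", chosen)
print("The bonus ball is", bonus)
```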

5   Conclusions

We have argued that there are good reasons for switching to Python 3.0 for the teaching of programming, and that there are also reasons to be cautious about making the switch immediately. Existing code examples can be converted automatically for the most part, although manual intervention will be necessary to check the correctness of the resulting Python 3.0 code. Of more serious concern to early adopters will be the initial shortage of suitable textbooks and delays in porting important third-party libraries and frameworks to version 3.0 of the language. These obstacles are temporary only; once they have disappeared, Python 3.0 will have a bright future for the teaching of programming.


[1] Python 2.2 began the process of unifying int and long by introducing automatic conversion from int to long where this would prevent integer overflow.
[2] Other features being removed include: backticks (use repr instead); the iterkeys, itervalues and iteritems methods of dictionaries (use keys, values and items instead); built-ins apply, callable, coerce, execfile, file, reduce and reload.
[3] P Norton, A Samuel, D Aitel, E Foster-Johnson, L Richardson, J Diamond, A Parker & M Roberts, Beginning Python, Wrox Press, 2005
[4] Mark Lutz, Programming Python (3rd ed.), O'Reilly, 2006
[5] A B Downey, How to Think Like a (Python) Programmer, http://www.greenteapress.com/thinkpython/ (last visited 2008-03-25)
[6] S Mount, J Shuttleworth & R Winder, Python for Rookies, Thomson Learning, 2008
[7] Magnus Lie Hetland, Beginning Python: From Novice to Professional, Apress, 2005
[8] Wesley Chun, Core Python Programming (2nd ed.), Prentice Hall, 2007
[9] Matt Telles, Python Power, Thomson Learning, 2007
[10] A small amount of code was omitted -- e.g., our Django examples.
[11] This was on a 2 GHz AMD Athlon 64 machine with 1 GB of RAM, running Ubuntu 7.10, using Python 2.5.1 compiled with GCC 4.1.3.