=========================================
 On Teaching Programming With Python 3.0
=========================================

:Author: Nick Efford
:Contact: nde@comp.leeds.ac.uk
:Status: Final
:Revision: 7
:Last revised: 2008-04-08

.. admonition:: Abstract

    In this paper, we explore some of the ways in which Python 3.0 improves
    on earlier versions of the language for the purpose of teaching
    programming. We highlight simplification of the type system and object
    model as improvements of particular significance, along with changes to
    some of Python's operators and its built-in capabilities for console I/O.
    We also examine what adopters will need to do to make existing code
    examples run with Python 3.0, and discuss other obstacles to early
    adoption such as the lack of suitable textbooks and delays in introducing
    support for Python 3.0 into third-party tools and frameworks.

.. sectnum::

.. contents::

Introduction
============

    "The language has two choices: either continue to bear the burden of
    what are now considered poor design decisions... or suck it up and let us
    try and fix some of these problems. It's like going to the dentist; it
    may hurt, but if that minor toothache goes untreated and develops into an
    abscess, you will wish you were dead."

    -- `Blog entry`_ by Collin Winter

Like any programming language, Python has accumulated its fair share of
design flaws since its creation. Unlike many programming languages, Python is
changing to address these problems. Two new versions of the language are
currently in development: version 2.6, which retains backwards compatibility
with previous releases; and version 3.0, which breaks backwards compatibility
to the extent that even the simplest of programs, the classic 'Hello, World',
will no longer work in its current form.

The changes being made in Python 3.0 are drastic and may cause pain for some,
but they will improve the language significantly as a vehicle for the
teaching of programming in schools and universities.
This paper explains why this is so and discusses some of the obstacles that
educators might face in making an early move to version 3.0 from earlier
versions.

.. _Blog entry: http://oakwinter.com/code/on-python-3000-whinging/

Why Version 3.0?
================

There are many improvements being made to Python in version 3.0 to make the
language cleaner and more consistent. We discuss five of these improvements
here.

Simpler Built-in Types
----------------------

The type system in Python 2.x lacks the redundancy of the type systems in
other languages, in keeping with Python's nature as a very high level
language (VHLL). For example, there is just one built-in type to represent
floating-point numbers, whereas many other languages have two or more; and
there is no separate character type, on the basis that characters can be
represented simply as strings of length 1.

And yet there are inconsistencies. For example, integers can be represented
either by a fixed-size ``int`` type (32-bit or 64-bit, depending on your
machine architecture) or by a ``long`` type that can represent an effectively
unlimited range of values [#]_. Strings also have two possible
representations: the ``str`` type (used, confusingly, for both ASCII
character strings and strings of arbitrary bytes) and the ``unicode`` type
(used for strings composed of Unicode characters).

Python 3.0 removes these inconsistencies. It has a single ``int`` type, with
behaviour equivalent to the old ``long`` type. Similarly, there is now a
single character string type, called ``str`` but with the behaviour and
implementation of the old ``unicode`` type. Sequences of bytes can be
represented by the new ``bytes`` type, which is distinct from ``str``;
``str.encode`` converts a string to ``bytes`` and ``bytes.decode`` converts
it back.

The net result of these changes is a simpler and more intuitive type system.
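
To make the unified types concrete, the following minimal sketch shows a single ``int`` holding a value far beyond 64 bits, and a round trip between ``str`` and ``bytes``. It is our own illustration rather than one of the course examples; the variable names and the sample string are invented, and a Python 3 interpreter is assumed.

```python
# Illustrative sketch only: names and sample data are hypothetical.

# Python 3 has one integer type; it grows as needed instead of overflowing.
big = 2 ** 100
print(type(big).__name__, big)

# Text is str (Unicode); raw bytes are a separate type.
text = 'caf\u00e9'            # four characters, the last one non-ASCII
data = text.encode('utf-8')   # str -> bytes
print(type(data).__name__, len(text), len(data))

# Decoding with the same encoding recovers the original string.
round_trip = data.decode('utf-8')
print(round_trip == text)
```

The differing lengths of ``text`` and ``data`` are a useful talking point with students: characters and bytes are not the same thing, and Python 3.0 makes that distinction explicit in the type system.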

Unsurprising Arithmetic
-----------------------

In common with other popular languages like Java, C and C++, Python 2.x
implements 'floor division' for integers:

.. sourcecode:: pycon

    >>> 1 / 2
    0

This is an unwelcome surprise for most programming novices. In our experience
of teaching Python 2.x (and before that, Java and C++), most students make
mistakes with integer division more than once before they learn to cope with
this counterintuitive behaviour.

Python 3.0 changes ``int`` division so that it now returns a ``float`` value.
The ``//`` operator can be used to obtain the old behaviour:

.. sourcecode:: pycon

    >>> 1 / 2
    0.5
    >>> 1 // 2
    0

For programmers whose expectations have been shaped by exposure to the C
family of languages, this change has been the source of much anguish ever
since it was first proposed in `PEP 238`_ back in 2001 (as can be seen in
`comp.lang.python newsgroup discussion on the topic`_), but it removes what
was clearly a significant hurdle for those new to programming.

.. _PEP 238: http://www.python.org/dev/peps/pep-0238/
.. _comp.lang.python newsgroup discussion on the topic: http://groups.google.com/group/comp.lang.python/browse_thread/thread/fb904331950b1ea0/979ba69ca0884695?q=integer+division&lnk=ol&

Cleaner Comparisons
-------------------

Python 2.x allows comparison of incompatible types using ``<``, ``>``, etc:

.. sourcecode:: pycon

    >>> 42 < 'hello'
    True

Newcomers to programming obviously have difficulty understanding why this
expression should evaluate to ``True`` -- or why it should evaluate at all,
for that matter. Thankfully, in Python 3.0, such operations trigger an
exception:

.. sourcecode:: pycon

    >>> 42 < 'hello'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: int() < str()

A consequence of this is that it is no longer possible to sort lists that
contain a mixture of incompatible types.
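
The effect on sorting can be demonstrated with a short sketch. This is our own illustration, with invented sample data; the use of a ``key`` function to sort on a common representation is just one possible workaround, not something prescribed by the language.

```python
# Illustrative sketch only: the list contents are hypothetical.
mixed = [3, 'apple', 1, 'banana']

# In Python 3, ordering an int against a str raises TypeError,
# so sorting a mixed list fails as soon as two such items are compared.
try:
    sorted(mixed)
    sortable = True
except TypeError as error:
    sortable = False
    print('Cannot sort:', error)

# One possible workaround: compare every item by its string form.
by_text = sorted(mixed, key=str)
print(by_text)
```

Sorting by string form places ``1`` and ``3`` before the words here, but it is a deliberate choice by the programmer rather than an accident of the implementation, which is arguably the point of the change.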

A minor enhancement to comparisons is the removal of the long-deprecated
``<>`` as a 'not equal to' operator. In Python 2.x, this or the more usual
``!=`` could be used, but in Python 3.0 ``!=`` is the only operator available
for this purpose. The ``<>`` operator is but one of a number of obsolete or
little-used features that are being removed from the language [#]_.

Greater Consistency in Console I/O
----------------------------------

Printing
~~~~~~~~

A long-standing design wart has been the asymmetry that exists in console
I/O. In Python 2.x, input from the console is handled by **functions**,
whereas output to the console is typically handled by a ``print``
**statement**:

.. sourcecode:: python

    import math

    number = float(raw_input('Enter a value: '))
    print 'Square-root of', number, 'is', math.sqrt(number)

We have observed more than one student attempting to put brackets around the
list of things to be printed in a program like this, presumably in the belief
that they are bundling up the arguments to a function call. Unfortunately,
this does not result in an error; instead, it creates and prints a tuple of
objects, like so::

    ('Square-root of', 2.0, 'is', 1.4142135623730951)

Students unfamiliar with the concept of a tuple can easily form the
impression that Python prints things in a rather odd way.

Python 3.0 fixes this asymmetry by making ``print`` a built-in function:

.. sourcecode:: python3

    print('Square-root of', number, 'is', math.sqrt(number))

Making ``print`` a function also removes the ugly 'redirection' syntax, added
in Python 2.0, for printing to ``stderr`` or a previously opened file. Thus a
line like

.. sourcecode:: python

    print >> sys.stderr, 'Error! Number required'

becomes, in Python 3.0,

.. sourcecode:: python3

    print('Error! Number required', file=sys.stderr)

The new ``print`` function supports keyword arguments ``sep`` and ``end``,
for specifying the strings used to separate arguments and terminate the
printed line.
(These strings default to a space and a newline, respectively.) This feature
makes it possible to print multiple values on separate lines with one
function call:

.. sourcecode:: python3

    print(x, y, z, sep='\n')

Keyboard Input
~~~~~~~~~~~~~~

For a long time, Python has had *two* built-in functions that read from the
console: ``input`` and ``raw_input``. ``input`` reads a string of characters
and attempts to evaluate them, such that a sequence of digits yields an
integer value, characters enclosed in quotes yield a string, etc.
Unfortunately, this is less useful than it sounds. Consider the following
simple Python 2.x program:

.. sourcecode:: python

    # hello.py - a program that greets you
    name = input('Enter your name: ')
    print 'Hello,', name

Here are two attempts to run the program::

    $ python hello.py
    Enter your name: nick
    Traceback (most recent call last):
      File "hello.py", line 1, in <module>
        name = input('Enter your name: ')
      File "<string>", line 1, in <module>
    NameError: name 'nick' is not defined

::

    $ python hello.py
    Enter your name: max
    Hello, <built-in function max>

In both cases, the student running the program has forgotten that string
input should be enclosed in quotes, with the result that Python treats the
inputs as names of objects in the global namespace. The first attempt fails
because there is no object named ``nick``, but the second attempt succeeds
because Python has a built-in function named ``max``. Both results are
confusing to programming novices.

As another example, consider how we might read a number. Code like

.. sourcecode:: python

    number = input('Enter a value: ')

will do the right thing for inputs such as ``42`` or ``7.5``, but will not
prevent input of ``'hello'`` (which makes ``number`` a reference to a string
object) or ``max`` (which, as we have already seen, makes it a reference to a
built-in function). An obvious solution is to force conversion to the
required numeric type, using the appropriate factory function:

.. sourcecode:: python

    number = float(input('Enter a value: '))

But if we are doing this, why bother evaluating the input at all? Why not
simply read characters from the keyboard and rely on the ``float`` function
to convert them into a number?

Both of the problems discussed above are solved in Python 2.x by using
``raw_input`` instead of ``input``. The ``raw_input`` function returns
console input as a string object and allows the programmer to decide exactly
how this string should be handled. The string can be left alone in cases
where text is expected (as in ``hello.py``) or it can be converted explicitly
to the required type (as in our second example):

.. sourcecode:: python

    number = float(raw_input('Enter a value: '))

We have seen cases where students have used ``input`` in their Python 2.x
programs, having failed to recognise that ``raw_input`` is a safer, less
confusing alternative. Python 3.0 prevents such confusion by providing a
single function named ``input``, equivalent to the ``raw_input`` function of
earlier versions. Evaluation of the input string is still possible, but it
must be done explicitly, with code like ``eval(input())``.

A Single Object Model
---------------------

Before Python 2.2 arrived in December 2001, there were significant
differences between Python's built-in types and user-defined classes, such
that it was impossible to create a user-defined subclass of a built-in type.
Version 2.2 went a long way towards healing the class/type split by
introducing **new-style classes** alongside the existing **classic classes**,
and by turning most of the built-in types into these new-style classes. Since
version 2.2, it has therefore been possible to begin class definitions in two
ways:

.. sourcecode:: python

    # Old-style
    class Foo:
        ...

    # New-style
    class Foo(object):
        ...

Students sometimes fail to recognise that these examples are **two different
kinds of class**.
This can be a particular problem for those coming to Python from Java, where
``class Foo`` and ``class Foo extends Object`` are entirely equivalent ways
of beginning a class definition.

Unfortunately, many of the books and online tutorials on Python published
since the release of version 2.2 have done little to clarify the distinction
between old- and new-style classes or provide adequate guidance to novice
programmers on which type of class should be used. Some [#]_ [#]_ [#]_ have
ignored new-style classes altogether and others [#]_ have acknowledged their
existence whilst using old-style classes almost exclusively in example code.
In some cases [#]_ there is a balanced discussion of the two class types,
whereas a few titles [#]_ [#]_ have encouraged a more modern approach by
concentrating almost exclusively on new-style classes. This lack of
consistency can be very confusing for students.

Python 3.0 solves the problem by removing old-style classes entirely, leaving
us with a single object model based on new-style classes. In version 3.0,
class definitions start off much as they did before, but with the essential
difference that ``object`` is *implicitly* a superclass (as is the case in
Java and C#). Thus, there is no longer any difference between the two
examples shown below.

.. sourcecode:: python3

    # Python 3.0 class, inheriting implicitly from object
    class Foo:
        ...

    # Python 3.0 class, explicit syntax
    class Foo(object):
        ...

Barriers to Adoption
====================

We have suggested above that there is a strong case for switching to Python
3.0 for teaching, on the grounds that it is a cleaner and more consistent
language than version 2.5. However, there are three obvious barriers to
adoption:

1. Suitable, 3.0-based textbooks might not be available.

2. Most example code will need to be altered in some way to be compatible
   with Python 3.0, even if all that is required in most cases is translation
   of a ``print`` statement into a function call.

3. Where there are dependencies on third-party modules or packages, it will
   be necessary to wait for these packages to be ported to 3.0.

Problem 1 will resolve itself eventually; in the meantime, a short-term
solution might be to update existing, free material such as Downey's `How to
Think Like a (Python) Programmer`_.

Problem 2 is mitigated by the tools provided with Python 3.0 to assist with
conversion, `as discussed below`_.

Problem 3 is more troubling. In teaching programming to first-year students
at Leeds, we have used Pygame_, wxPython_ and Django_ to stimulate and
motivate students with examples of how Python can be used for game, GUI and
web programming. If these packages are not ported to version 3.0 soon after
its release, then we will have to use 3.0-compatible alternatives (if they
exist), abandon these examples entirely or (reluctantly) delay adoption of
version 3.0.

.. _How to Think Like a (Python) Programmer: http://www.greenteapress.com/thinkpython/
.. _as discussed below: `Converting Code to Python 3.0`_
.. _Pygame: http://pygame.org/
.. _wxPython: http://wxpython.org/
.. _Django: http://www.djangoproject.com/

Converting Code to Python 3.0
=============================

The Python 3.0 distribution comes with a refactoring tool called **2to3**,
intended to assist with the translation of Python 2.x code to Python 3.0. The
tool operates on ``stdin``, individual files or an entire directory tree. It
writes a unified diff patch for each ``.py`` file to ``stdout``, and a
summary of which files needed changes to ``stderr``. With the ``-w`` command
line option, it will create a back-up of each file and then apply the patch
to the file.

As an experiment, we tried running the 2to3 tool provided in the third alpha
release of Python 3.0 on the collection of example programs used in our
first-year programming lectures [#]_, using the ``-w`` option to apply the
patches.
The tool took 47.2 seconds to process 147 ``.py`` files [#]_ and changed 77
of them (52% of the total). The table below gives a breakdown of the changes
made.

=========================  =====
Feature                    Files
=========================  =====
``raw_input`` only             1
``print`` only                46
``raw_input`` & ``print``     15
Other                         15
=========================  =====

In 80% of cases, the changes involved console I/O only and no manual
intervention was required to produce satisfactory code.

In 5% of cases, 2to3 produced code that ran correctly but was redundant in
some way. An example is this:

.. sourcecode:: python

    def factorial(n):
        """Returns the factorial of a non-negative integer n."""
        if not isinstance(n, (int, long)):
            raise TypeError('argument must be an integer')
        elif n < 0:
            raise ValueError('argument cannot be negative')
        elif n <= 1:
            return 1
        else:
            return n*factorial(n-1)

2to3 performs a simple substitution, changing the first line of the function
body to

.. sourcecode:: python

    if not isinstance(n, (int, int)):

rather than

.. sourcecode:: python

    if not isinstance(n, int):

Another example is this:

.. sourcecode:: python

    class Rainfall(object):
        """A set of monthly rainfall measurements."""

        ...

        def maximum(self):
            """Returns the maximum rainfall in this dataset."""
            return max(self.data.values())

        def total(self):
            """Returns the total rainfall over the entire dataset."""
            return sum(self.data.values())

2to3 changes the ``return`` statement of the ``maximum`` method to

.. sourcecode:: python

    return max(list(self.data.values()))

It does this because, in Python 3.0, the ``values`` method of a ``dict``
object returns a lightweight, read-only **view** of that dictionary's values,
rather than a list. However, in this particular context, the change is
unnecessary; the ``max`` function requires an iterable argument and views
satisfy this requirement. Curiously, 2to3 does not make a similar change to
the return statement of the ``total`` method.
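
That a view is sufficient for ``max`` and ``sum`` can be checked with a short sketch. The dictionary below is invented for illustration and stands in for the ``data`` attribute of the ``Rainfall`` class, whose real contents are not shown in this paper; a Python 3 interpreter is assumed.

```python
# Illustrative sketch only: the rainfall figures are hypothetical.
rainfall = {'Jan': 81, 'Feb': 52, 'Mar': 60}

# values() returns a view, which is iterable, so no list() call is needed
# for functions such as max and sum.
values = rainfall.values()
wettest = max(values)
total = sum(values)
print(wettest, total)

# The view tracks later changes to the dictionary.
rainfall['Apr'] = 90
wettest_after = max(values)
print(wettest_after)

# A view is not a list: it cannot be indexed.
try:
    values[0]
    indexable = True
except TypeError:
    indexable = False

# list() is needed only when list behaviour is actually required.
snapshot = list(values)
print(indexable, snapshot)
```

In other words, the ``list(...)`` wrapper that 2to3 adds is a safe default, but it can be removed wherever the values are simply iterated over.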

In three of our ``.py`` files (2% of the entire set), 2to3 failed to make a
change that was required for code to run correctly. One of these cases
involved integer division:

.. sourcecode:: python

    def median(numbers):
        """Returns the median of a sequence of numbers."""
        size = len(numbers)
        data = sorted(numbers)
        n = size / 2
        if size % 2 == 0:
            return 0.5 * (data[n-1] + data[n])
        else:
            return data[n]

The division in ``n = size / 2`` causes the problem. In Python 2.x, ``n`` is
guaranteed to be an ``int`` because ``size`` is always an ``int``; in Python
3.0, however, ``n`` is guaranteed to be a ``float``, and ``float`` values
cannot be used to index a list. The fix is simply to do floor division with
``//``, but 2to3 is unable to deduce the need for this.

Another problem missed by 2to3 involved the use of ``range`` in the following
lottery simulation program:

.. sourcecode:: python3

    import random

    numbers = range(1, 50)
    chosen = []
    while len(chosen) < 6:
        number = random.choice(numbers)
        numbers.remove(number)
        chosen.append(number)

    chosen.sort()
    print("This week's numbers are", chosen)
    print("The bonus ball is", random.choice(numbers))

In Python 2.x, ``numbers`` is a list, so the attempt to invoke its ``remove``
method inside the ``while`` loop succeeds; in Python 3.0, however,
``numbers`` is an iterable object that doesn't have a ``remove`` method, so
we get an ``AttributeError`` exception. We can fix this easily enough, by
initialising ``numbers`` like so:

.. sourcecode:: python3

    numbers = list(range(1, 50))

Conclusions
===========

We have argued that there are good reasons for switching to Python 3.0 for
the teaching of programming, and that there are also reasons to be cautious
about making the switch immediately.

Existing code examples can be converted automatically for the most part,
although manual intervention will be necessary to check the correctness of
the resulting Python 3.0 code.
Of more serious concern to early adopters will be the initial shortage of
suitable textbooks and delays in porting important third-party libraries and
frameworks to version 3.0 of the language. These obstacles are only
temporary; once they have disappeared, Python 3.0 will have a bright future
for the teaching of programming.

----

.. [#] Python 2.2 began the process of unifying ``int`` and ``long`` by
   introducing automatic conversion from ``int`` to ``long`` where this would
   prevent integer overflow.

.. [#] Other features being removed include: backticks (use ``repr``
   instead); the ``iterkeys``, ``itervalues`` and ``iteritems`` methods of
   dictionaries (use ``keys``, ``values`` and ``items`` instead); built-ins
   ``apply``, ``callable``, ``coerce``, ``execfile``, ``file``, ``reduce``
   and ``reload``.

.. [#] P Norton, A Samuel, D Aitel, E Foster-Johnson, L Richardson,
   J Diamond, A Parker & M Roberts, *Beginning Python*, Wrox Press, 2005

.. [#] Mark Lutz, *Programming Python* (3rd ed.), O'Reilly, 2006

.. [#] A B Downey, *How to Think Like a (Python) Programmer*,
   http://www.greenteapress.com/thinkpython/ (last visited 2008-03-25)

.. [#] S Mount, J Shuttleworth & R Winder, *Python for Rookies*, Thomson
   Learning, 2008

.. [#] Magnus Lie Hetland, *Beginning Python: From Novice to Professional*,
   Apress, 2005

.. [#] Wesley Chun, *Core Python Programming* (2nd ed.), Prentice Hall, 2007

.. [#] Matt Telles, *Python Power*, Thomson Learning, 2007

.. [#] A small amount of code was omitted -- e.g., our Django examples.

.. [#] This was on a 2 GHz AMD Athlon 64 machine with 1 GB of RAM, running
   Ubuntu 7.10, using Python 2.5.1 compiled with GCC 4.1.3.