Thursday, December 31, 2009

This is Time Travel

 

More Google - This is Time Travel
Want to see actual newspapers from the past? Go here to understand it and here to search.



Google Maps Follow Protests in Iran

 


If you are teaching World History and want to add Google Maps to your class, then this site is a great way to do it. The site has been following the protests in Iran.

(Photo from Wired.com)

Google and Texting and other uses for cell phones in class





If you have "Google" as one of the address in your phone, you can text a question and send it to that address and get a response.  are a bunch of other ways to use texting (including several articles). I learned recently that our school district, while banning cell phones, actually leaves it up to the principal. You might want to check out if your school is the same. That is how I got permission for just social studies teachers to use it. The link above also shows one how to use polleverywhere.com which is a way to quickly go over a few multiple choice questions, have kids text the answers and instantly see a chart for the percentage of kids who have answered what question. Finally if you go to the search engine on this site and type in "cell phones," you will find many other suggestions for usage from this summer's postings.


Wednesday, December 30, 2009

Review By Google Wave



2009 in Review as seen by Google Wave
Above is the year in review by Google Wave. If you haven't seen Google Wave, it is real-time conversation (either written or via up to six video links). This is from a Twitter feed from GiseldaSantos.


Google Docs




Quizzes on Google Docs
Perhaps I am getting too far afield with all the "Google stuff," but above is a video that shows you how to create a quiz with all types of questions in Google Docs and then let your students take it.


Kids Making Their Own Cartoons

British Quartering Act

I received this from one of my regular followers (feel free to leave a comment with your tips or to e-mail me at kenhalla@gmail.com if you have any). I always have my students create cartoons, as I find it a good way to memorize something without having to sit down and repeat it over and over until it sticks. With this site, one can make digital cartoons much as we have drawn them by hand in the past.

Tuesday, December 29, 2009

A Brief Timeline of Python

The development of Python occurred at a time when many other dynamic (and open-source) programming languages such as Tcl, Perl, and (much later) Ruby were also being actively developed and gaining popularity. To help put Python in its proper historical perspective, the following list shows the release history of Python. The earliest dates are approximate as I didn't consistently record all events:
Release Date          Version
December, 1989        Implementation started
1990                  Internal releases at CWI
February 20, 1991     0.9.0 (released to alt.sources)
February, 1991        0.9.1
Autumn, 1991          0.9.2
December 24, 1991     0.9.4
January 2, 1992       0.9.5 (Macintosh only)
April 6, 1992         0.9.6
Unknown, 1992         0.9.7beta
January 9, 1993       0.9.8
July 29, 1993         0.9.9
January 26, 1994      1.0.0
February 15, 1994     1.0.2
May 4, 1994           1.0.3
July 14, 1994         1.0.4
October 11, 1994      1.1
November 10, 1994     1.1.1
April 13, 1995        1.2
October 13, 1995      1.3
October 25, 1996      1.4
January 3, 1998       1.5
October 31, 1998      1.5.1
April 13, 1999        1.5.2
September 5, 2000     1.6
October 16, 2000      2.0
April 17, 2001        2.1
December 21, 2001     2.2
July 29, 2003         2.3
November 30, 2004     2.4
September 16, 2006    2.5
October 1, 2008       2.6
December 3, 2008      3.0
I've added hyperlinks to the releases that are still being advertised on python.org at this time. Note that many releases were followed by several micro-releases, e.g. 2.0.1; I haven't bothered to include these in the table as otherwise it would become too long. Source tarballs of very old releases are also still accessible here: http://www.python.org/ftp/python/src/. Various ancient binary releases and other historical artefacts can still be found by going one level up from there.

Python's Design Philosophy

Later blog entries will dive into the gory details of Python's history. However, before I do that, I would like to elaborate on the philosophical guidelines that helped me make decisions while designing and implementing Python.

First of all, Python was originally conceived as a one-person “skunkworks” project – there was no official budget, and I wanted results quickly, in part so that I could convince management to support the project (in which I was fairly successful). This led to a number of timesaving rules:
  • Borrow ideas from elsewhere whenever it makes sense.
  • “Things should be as simple as possible, but no simpler.” (Einstein)
  • Do one thing well (The "UNIX philosophy").
  • Don’t fret too much about performance--plan to optimize later when needed.
  • Don’t fight the environment and go with the flow.
  • Don’t try for perfection because “good enough” is often just that.
  • (Hence) it’s okay to cut corners sometimes, especially if you can do it right later.
Other principles weren’t intended as timesavers. Sometimes they were quite the opposite:
  • The Python implementation should not be tied to a particular platform. It’s okay if some functionality is not always available, but the core should work everywhere.
  • Don’t bother users with details that the machine can handle (I didn’t always follow this rule and some of the disastrous consequences are described in later sections).
  • Support and encourage platform-independent user code, but don’t cut off access to platform capabilities or properties (This is in sharp contrast to Java.)
  • A large complex system should have multiple levels of extensibility. This maximizes the opportunities for users, sophisticated or not, to help themselves.
  • Errors should not be fatal. That is, user code should be able to recover from error conditions as long as the virtual machine is still functional.
  • At the same time, errors should not pass silently (These last two items naturally led to the decision to use exceptions throughout the implementation.)
  • A bug in the user’s Python code should not be allowed to lead to undefined behavior of the Python interpreter; a core dump is never the user’s fault.
Finally, I had various ideas about good programming language design, which were largely imprinted on me by the ABC group where I had my first real experience with language implementation and design. These ideas are the hardest to put into words, as they mostly revolved around subjective concepts like elegance, simplicity and readability. Although I will discuss more of ABC's influence on Python a little later, I’d like to mention one readability rule specifically: punctuation characters should be used conservatively, in line with their common use in written English or high-school algebra. Exceptions are made when a particular notation is a long-standing tradition in programming languages, such as “x*y” for multiplication, “a[i]” for array subscription, or “x.foo” for attribute selection, but Python does not use “$” to indicate variables, nor “!” to indicate operations with side effects. Tim Peters, a long time Python user who eventually became its most prolific and tenacious core developer, attempted to capture my unstated design principles in what he calls the “Zen of Python.” I quote it here in its entirety:
  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it.
  • Although that way may not be obvious at first unless you're Dutch.
  • Now is better than never.
  • Although never is often better than right now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea -- let's do more of those!
Although my experience with ABC greatly influenced Python, the ABC group had a few design principles that were radically different from Python’s. In many ways, Python is a conscious departure from these:
  • The ABC group strived for perfection. For example, they used tree-based data structure algorithms that were proven to be optimal for asymptotically large collections (but were not so great for small collections).
  • The ABC group wanted to isolate the user, as completely as possible, from the “big, bad world of computers” out there. Not only should there be no limit on the range of numbers, the length of strings, or the size of collections (other than the total memory available), but users should also not be required to deal with files, disks, “saving”, or other programs. ABC should be the only tool they ever needed. This desire also caused the ABC group to create a complete integrated editing environment, unique to ABC (There was an escape possible from ABC’s environment, for sure, but it was mostly an afterthought, and not accessible directly from the language.)
  • The ABC group assumed that the users had no prior computer experience (or were willing to forget it). Thus, alternative terminology was introduced that was considered more “newbie-friendly” than standard programming terms. For example, procedures were called “how-tos” and variables “locations”.
  • The ABC group designed ABC without an evolutionary path in mind, and without expecting user participation in the design of the language. ABC was created as a closed system, as flawless as its designers could make it. Users were not encouraged to “look under the hood”. Although there was talk of opening up parts of the implementation to advanced users in later stages of the project, this was never realized.
In many ways, the design philosophy I used when creating Python is probably one of the main reasons for its ultimate success. Rather than striving for perfection, early adopters found that Python worked "well enough" for their purposes. As the user-base grew, suggestions for improvement were gradually incorporated into the language. As we will see in later sections, many of these improvements have involved substantial changes and reworking of core parts of the language. Even today, Python continues to evolve.

Introduction and Overview

Introduction

Python is currently one of the most popular dynamic programming languages, along with Perl, Tcl, PHP, and newcomer Ruby. Although it is often viewed as a "scripting" language, it is really a general purpose programming language along the lines of Lisp or Smalltalk (as are the others, by the way). Today, Python is used for everything from throw-away scripts to large scalable web servers that provide uninterrupted service 24x7. It is used for GUI and database programming, client- and server-side web programming, and application testing. It is used by scientists writing applications for the world's fastest supercomputers and by children first learning to program.
In this blog, I will shine the spotlight on Python's history. In particular, how Python was developed, major influences in its design, mistakes made, lessons learned, and future directions for the language.

Acknowledgment: I am indebted to Dave Beazley for many of the better sentences in this blog. (For more on the origins of this blog, see my other blog.)

A Bird's Eye View of Python

When people are first exposed to Python, they are often struck by the way that Python code looks, at least on the surface, similar to code written in other conventional programming languages such as C or Pascal. This is no accident---the syntax of Python borrows heavily from C. For instance, many of Python's keywords (if, else, while, for, etc.) are the same as in C, Python identifiers have the same naming rules as C, and most of the standard operators have the same meaning as in C. Of course, Python is obviously not C, and one major area where it differs is that instead of using braces for statement grouping, it uses indentation. For example, instead of writing statements in C like this
if (a < b) {
    max = b;
} else {
    max = a;
}
Python just dispenses with the braces altogether (along with the trailing semicolons for good measure) and uses the following structure
if a < b:
    max = b
else:
    max = a
The other major area where Python differs from C-like languages is in its use of dynamic typing. In C, variables must always be explicitly declared and given a specific type such as int or double. This information is then used to perform static compile-time checks of the program as well as for allocating memory locations used for storing the variable’s value. In Python, variables are simply names that refer to objects. Variables do not need to be declared before they are assigned and they can even change type in the middle of a program. Like other dynamic languages, all type-checking is performed at run-time by an interpreter instead of during a separate compilation step.
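For instance, a minimal sketch of what this looks like in practice--the same name can refer to objects of different types at different moments, and the check only happens when an operation is attempted:

x = 42            # x is just a name bound to an integer object
x = "hello"       # now the same name refers to a string; no declaration needed
x = [1, 2, 3]     # and now to a list
print type(x)     # checked at run time: <type 'list'>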

Python’s primitive built-in data types include Booleans, numbers (machine integers, arbitrary-precision integers, and real and complex floating point numbers), and strings (8-bit and Unicode). These are all immutable types, meaning that values are represented by objects that cannot be modified after their creation. Compound built-in data types include tuples (immutable arrays), lists (resizable arrays) and dictionaries (hash tables).
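As a quick illustration (in the Python 2 notation of the time), literals of these built-in types look like this:

flag = True                      # Boolean
n = 2 ** 100                     # arbitrary-precision integer
pi = 3.14159                     # floating point number
z = 1 + 2j                       # complex number
s = "spam"                       # 8-bit string
u = u"caf\xe9"                   # Unicode string
point = (3, 4)                   # tuple: an immutable array
names = ["spam", "eggs"]         # list: a resizable array
counts = {"spam": 1, "eggs": 2}  # dictionary: a hash table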

For organizing programs, Python supports packages (groups of modules and/or packages), modules (related code grouped together in a single source file), classes, methods and functions. For flow control, it provides if/else, while, and a high-level for statement that loops over any “iterable” object. For error handling, Python uses (non-resumable) exceptions. A raise statement throws an exception, and try/except/finally statements specify exception handlers. Built-in operations throw exceptions when error conditions are encountered.
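A small sketch tying these pieces together--a for loop over an iterable, a raise statement, and a try/except/finally handler (Python 2 syntax):

def reciprocal(x):
    if x == 0:
        raise ValueError("x must be nonzero")
    return 1.0 / x

for value in [4, 2, 0]:            # 'for' iterates over any iterable object
    try:
        print reciprocal(value)
    except ValueError, e:          # handle the error; execution continues
        print "skipped:", e
    finally:
        pass                       # cleanup code would go here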

In Python, all objects that can be named are said to be "first class." This means that functions, classes, methods, modules, and all other named objects can be freely passed around, inspected, and placed in various data structures (e.g., lists or dictionaries) at run-time. And speaking of objects, Python also has full support for object-oriented programming including user-defined classes, inheritance, and run-time binding of methods.
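For instance, a short sketch: named objects of very different kinds can be put in a list and inspected uniformly at run time.

import math

def shout(text):
    return text.upper()

things = [shout, math, str, len]    # a function, a module, a type, a built-in
for obj in things:
    print obj.__name__, type(obj)   # every named object can be passed around and inspected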

Python has an extensive standard library, which is one of the main reasons for its popularity. The standard library has more than 100 modules and is always evolving. Some of these modules include regular expression matching, standard mathematical functions, threads, operating systems interfaces, network programming, standard internet protocols (HTTP,FTP, SMTP, etc.), email handling, XML processing, HTML parsing, and a GUI toolkit (Tcl/Tk).

In addition, there is a very large supply of third-party modules and packages, most of which are also open source. Here one finds web frameworks (too many to list!), more GUI toolkits, efficient numerical libraries (including wrappers for many popular Fortran packages), interfaces to relational databases (Oracle, MySQL, and others), SWIG (a tool for making arbitrary C++ libraries available as Python modules), and much more.

A major appeal of Python (and other dynamic programming languages for that matter) is that seemingly complicated tasks can often be expressed with very little code. As an example, here is a simple Python script that fetches a web page, scans it looking for URL references, and prints the first 10 of those.
# Scan the web looking for references

import re
import urllib

regex = re.compile(r'href="([^"]+)"')

def matcher(url, max=10):
    "Print the first several URL references in a given url."
    data = urllib.urlopen(url).read()
    hits = regex.findall(data)
    for hit in hits[:max]:
        print urllib.basejoin(url, hit)

matcher("http://python.org")
This program can easily be modified to make a web crawler, and indeed Scott Hassan has told me that he wrote Google’s first web crawler in Python. Today, Google employs millions of lines of Python code to manage many aspects of its operations, from build automation to ad management (Disclaimer: I am currently a Google employee.)

Underneath the covers, Python is typically implemented using a combination of a bytecode compiler and interpreter. Compilation is implicitly performed as modules are loaded, and several language primitives require the compiler to be available at run-time. Although Python’s de-facto standard implementation is written in C, and available for every imaginable hardware/software platform, several other implementations have become popular. Jython is a version that runs on the JVM and has seamless Java integration. IronPython is a version for the Microsoft .NET platform that has similar integration with other languages running on .NET. PyPy is an optimizing Python compiler/interpreter written in Python (still a research project, being undertaken with EU funding). There’s also Stackless Python, a variant of the C implementation that reduces reliance on the C stack for function/method calls, to allow co-routines, continuations, and microthreads.
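A small demonstration of this design, using the standard dis module: the compiler runs implicitly when a function is defined, dis shows the resulting byte code, and compile() makes the compiler available at run time.

import dis

def add(a, b):
    return a + b

dis.dis(add)                # disassemble the byte code the compiler produced

code = compile("print 6 * 7", "<string>", "exec")
exec code                   # the compiler (and interpreter) are available at run time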

Monday, December 28, 2009

Personal History - part 1, CWI

Python’s early development started at a research institute in Amsterdam called CWI, which is a Dutch acronym for a phrase that translates into English as Centre for Mathematics and Computer Science. CWI is an interesting place; funded by the Dutch government’s Department of Education and other research grants, it conducts academic-level research into computer science and mathematics. At any given time there are plenty of Ph.D. students wandering about and old-timers in the profession may still remember its original name, the Mathematical Centre. Under this name, it was perhaps most famous for the invention of Algol 68.

I started working at CWI in late 1982, fresh out of university, as a programmer in the ABC group led by Lambert Meertens and Steven Pemberton. After 4 or 5 years the ABC project was terminated due to the lack of obvious success and I moved to CWI’s Amoeba group led by Sape Mullender. Amoeba was a micro-kernel-based distributed system being jointly developed by CWI and the Vrije Universiteit of Amsterdam, under leadership of Andrew Tanenbaum. In 1991 Sape left CWI for a professorship at the University of Twente and I ended up in the newly formed CWI multimedia group led by Dick Bulterman.

Python is a direct product of my experience at CWI. As I explain later, ABC gave me the key inspiration for Python, Amoeba the immediate motivation, and the multimedia group fostered its growth. However, so far as I know, no funds at CWI were ever officially earmarked for its development. Instead, it merely evolved as an important tool for use in both the Amoeba and multimedia groups.

My original motivation for creating Python was the perceived need for a higher level language in the Amoeba project. I realized that the development of system administration utilities in C was taking too long. Moreover, doing these in the Bourne shell wouldn’t work for a variety of reasons. The most important one was that as a distributed micro-kernel system with a radically new design, Amoeba’s primitive operations were very different (and finer-grain) than the traditional primitive operations available in the Bourne shell. So there was a need for a language that would “bridge the gap between C and the shell.” For a long time, this was Python’s main catchphrase.

At this point, you might ask "why not port an existing language?" In my view, there weren’t a lot of suitable languages around at that time. I was familiar with Perl 3, but it was even more tied to Unix than the Bourne shell. I also didn’t like Perl’s syntax--my tastes in programming language syntax were strongly influenced by languages like Algol 60, Pascal, Algol 68 (all of which I had learned early on), and last but not least, ABC, on which I’d spent four years of my life. So, I decided to design a language of my own which would borrow everything I liked from ABC while at the same time fixing all its problems (as I perceived them).

The first problem I decided to fix was the name! As it happened, the ABC team had some trouble picking a name for its language. The original name for the language, B, had to be abandoned because of confusion with another language named B that was older and better known. In any case, B was meant as a working title only (the joke was that B was the name of the variable containing the name of the language--hence the italics). The team had a public contest to come up with a new name, but none of the submissions made the cut, and in the end, the internal backup candidate prevailed. The name was meant to convey the idea that the language made programming “as simple as ABC”, but it never convinced me all that much.

So, rather than over-analyzing the naming problem, I decided to under-analyze it. I picked the first thing that came to mind, which happened to be Monty Python’s Flying Circus, one of my favorite comedy troupes. The reference felt suitably irreverent for what was essentially a “skunkworks project”. The word “Python” was also catchy, a bit edgy, and at the same time, it fit in the tradition of naming languages after famous people, like Pascal, Ada, and Eiffel. The Monty Python team may not be famous for their advancement of science or technology, but they are certainly a geek favorite. It also fit in with a tradition in the CWI Amoeba group to name programs after TV shows.

For many years I resisted attempts to associate the language with snakes. I finally gave up when O’Reilly wanted to put a snake on the front of their first Python book "Programming Python". It was an O’Reilly tradition to use animal pictures, and if it had to be an animal, it might as well be a snake.

With the naming issue settled, I started working on Python in late December 1989, and had a working version in the first months of 1990. I didn’t keep notes, but I remember vividly that the first piece of code I wrote for Python’s implementation was a simple LL(1) parser generator I called “pgen." This parser generator is still part of the Python source distribution and probably the least changed of all the code. This early version of Python was used by a number of people at CWI, mostly, but not exclusively in the Amoeba group during 1990. Key developers besides myself were my officemates, programmers Sjoerd Mullender (Sape’s younger brother) and Jack Jansen (who remained one of the lead developers of the Macintosh port for many years after I left CWI).

On February 20, 1991, I first released Python to the world in the alt.sources newsgroup (as 21 uuencoded parts that had to be joined together and uudecoded to form a compressed tar file). This version was labeled 0.9.0, and released under a license that was an almost verbatim copy of the MIT license used by the X11 project at the time, substituting “Stichting Mathematisch Centrum”, CWI’s parent organization, as the responsible legal entity. So, like almost everything I’ve written, Python was open source before the term was even invented by Eric Raymond and Bruce Perens in late 1997.

There was immediately a lot of feedback and with this encouragement I kept a steady stream of releases coming for the next few years. I started to use CVS to track changes and to allow easier sharing of coding responsibilities with Sjoerd and Jack (Coincidentally, CVS was originally developed as a set of shell scripts by Dick Grune, who was an early member of the ABC group). I wrote a FAQ, which was regularly posted to some newsgroup, as was customary for FAQs in those days before the web, started a mailing list, and in March 1993 the comp.lang.python newsgroup was created with my encouragement but without my direct involvement. The newsgroup and mailing list were joined via a bidirectional gateway that still exists, although it is now implemented as a feature of mailman – the dominant open source mailing list manager, itself written in Python.

In the summer of 1994, the newsgroup was buzzing with a thread titled “If Guido was hit by a bus?” about the dependency of the growing Python community on my personal contributions. This culminated in an invitation from Michael McLay for me to spend two months as a guest researcher at NIST, the US National Institute for Standards and Technology, formerly the National Bureau of Standards, in Gaithersburg, Maryland. Michael had a number of “customers” at NIST who were interested in using Python for a variety of standards-related projects and the budget for my stay there was motivated by the need to help them improve their Python skills, as well as possibly improving Python for their needs.

The first Python workshop was held while I was there in November 1994, with NIST programmer Ken Manheimer providing important assistance and encouragement. Of the approximately 20 attendees, about half are still active participants in the Python community and a few have become major open source project leaders themselves (Jim Fulton of Zope and Barry Warsaw of GNU mailman). With NIST’s support I also gave a keynote for about 400 people at the Usenix Little Languages conference in Santa Fe, organized by Tom Christiansen, an open-minded Perl advocate who introduced me to Perl creator Larry Wall and Tcl/Tk author John Ousterhout.

Next installment: how I got a job in the US...

Personal History - part 2, CNRI and beyond

The Python workshop (see previous posting) resulted in a job offer to come work on mobile agents at CNRI (the Corporation for National Research Initiatives). CNRI is a non-profit research lab in Reston, Virginia. I joined in April 1995. CNRI’s director, Bob Kahn, was the first to point out to me how much Python has in common with Lisp, despite being completely different at a superficial (syntactic) level. Python work at CNRI was funded indirectly by a DARPA grant for mobile agent research. Although there was DARPA support for projects that used Python, there was not much direct support for language development itself.

At CNRI, I led and helped hire a small team of developers to build a mobile agent system in pure Python. The initial team members were Roger Masse and Barry Warsaw who were bitten by the Python bug at the Python workshop at NIST. In addition, we hired Python community members Ken Manheimer and Fred Drake. Jeremy Hylton, an MIT graduate originally hired to work on text retrieval, also joined the team. The team was initially managed by Ted Strollo and later on by Al Vezza.

This team helped me create and maintain additional Python community infrastructure such as the python.org website, the CVS server, and the mailing lists for various Python Special Interest Groups. Python releases 1.3 through 1.6 came out of CNRI. For many years Python 1.5.2 was the most popular and most stable version.

GNU mailman was also born here: we originally used a Perl tool called Majordomo, but Ken Manheimer found it unmaintainable and looked for a Python solution. He found out about something written in Python by John Viega and took over maintenance. When Ken left CNRI for Digital Creations, Barry Warsaw took over, and convinced the Free Software Foundation to adopt it as its official mailing list tool. Hence Barry licensed it under the GPL (GNU General Public License).

The Python workshops continued, at first twice a year, but due to the growth and increased logistical efforts they soon evolved into yearly events. These were first run by whoever wanted to host them, like NIST (the first one), USGS (the second and third one) and LLNL (the fourth one, and the start of the yearly series). Eventually CNRI took over the organization, and later (together with the WWW and IETF conferences) this was spun off as a commercial effort, Fortec. Attendance quickly rose to several hundreds. When Fortec faded away a while after I left CNRI, the International Python Conference was folded into O'Reilly's Open Source Conference (OSCON), but at the same time the Python Software Foundation (see below) started a new series of grassroots conferences named PyCon.

We also created the first (loose) organization around Python at CNRI. In response to efforts by Mike McLay and Paul Everitt to create a "Python Foundation", which ended up in the quicksand of bylaw drafting, Bob Kahn offered to create the "Python Software Activity", which would not be an independent legal entity but simply a group of people working under CNRI's legal (non-profit) umbrella. The PSA was successful in rallying the energy of a large group of committed Python users, but its lack of independence limited its effectiveness.

CNRI also used DARPA money to fund the development of JPython (later shortened to Jython), a Python implementation in and for Java. Jim Hugunin initially created JPython while doing graduate work at MIT. He then convinced CNRI to hire him to complete the work (or perhaps CNRI convinced Jim to join -- it happened while I was on vacation). When Jim left CNRI less than two years later to join the AspectJ project at Xerox PARC, Barry Warsaw continued the JPython development. (Much later, Jim would also author IronPython, the Python port to Microsoft's .NET. Jim also wrote the first version of Numeric Python.)

Other projects at CNRI also started to use Python. Several new core Python developers came out of this, in particular Andrew Kuchling, Neil Schemenauer, and Greg Ward, who worked for the MEMS Exchange project. (Andrew had contributed to Python even before joining CNRI; his first major project was the Python Cryptography Toolkit, a third party library that made many fundamental cryptological algorithms available to Python users.)

On the wings of Python's success, CNRI tried to come up with a model to fund Python development more directly than via DARPA research grants. We created the Python Consortium, modeled after the X Consortium, with a minimum entrance fee of $20,000. However, apart from one group at Hewlett-Packard, we didn't get much traction, and eventually the consortium died of anemia. Another attempt to find funding was Computer Programming for Everybody (CP4E), which received some DARPA funding. However, the funding wasn't enough for the whole team, and it turned out that actually getting most of the money, which was spread over several years, involved a whole old-boys network. That was not something I enjoyed, and I started looking for other options.

Eventually, in early 2000, the dot-com boom, which hadn’t quite collapsed yet, convinced me and three other members of the CNRI Python team (Barry Warsaw, Jeremy Hylton, and Fred Drake) to join BeOpen.com, a California startup that was recruiting open source developers. Tim Peters, a key Python community member, also joined us.

In anticipation of the transition to BeOpen.com, a difficult question was the future ownership of Python. CNRI insisted on changing the license and requested that we release Python 1.6 with this new license. The old license used while I was still at CWI had been a version of the MIT license. The releases previously made at CNRI used a slightly modified version of that license, with basically one sentence added where CNRI disclaimed most responsibilities. The 1.6 license however was a long wordy piece of lawyerese crafted by CNRI's lawyers.

We had several long wrestling discussions with Richard Stallman and Eben Moglen of the Free Software Foundation about some parts of this new license. They feared it would be incompatible with the GPL, and hence threaten the viability of GNU mailman, which had by now become an essential tool for the FSF. With the help of Eric Raymond, changes to the CNRI Python license were made that satisfied both the FSF and CNRI, but the resulting language is not easy to understand. The only good thing I can say about it is that (again thanks to Eric Raymond's help) it has the seal of approval of the Open Source Initiative as a genuine open source license. Only slight modifications were made to the text of the license to reflect the two successive changes of ownership, first BeOpen.com and then the Python Software Foundation, but in essence the handiwork of CNRI's lawyers still stands.

Like so many startups at the time, the BeOpen.com business plan failed rather spectacularly. It left behind a large debt, some serious doubts about the role played by some of the company's officers, and some very disillusioned developers besides my own team.

Luckily my team, by now known as PythonLabs, was pretty hot, and we were hired as a unit by Digital Creations, one of the first companies to use Python. (Ken Manheimer had preceded us there a few years before.) Digital Creations soon renamed itself Zope Corporation after its main open source product, the web content management system Zope. Zope’s founders Paul Everitt and Rob Page had attended the very first Python workshop at NIST in 1994, as had its CTO, Jim Fulton.

History could easily have gone very differently: besides Digital Creations, we were also considering offers from VA Linux and ActiveState. VA Linux was then a rising star on the stock market, but eventually its stock price (which had made Eric Raymond a multi-millionaire on paper) collapsed rather dramatically. Looking back I think ActiveState would not have been a bad choice, despite the controversial personality of its founder Dick Hardt, if it hadn't been located in Canada.

In 2001 we created the Python Software Foundation, a non-profit organization, whose initial members were the main contributing Python developers at that time. Eric Raymond was one of the founding members. I'll have to write more about this period another time.

Microsoft Ships Python Code... in 1996

My thanks go to Guido for allowing me to share my own history of Python!

I'll save my introduction to Python for another post, but the end result was its introduction into a startup that I co-founded in 1991 with several people. We were working on a large client/server system to handle Business-to-Consumer electronic shopping. Custom TCP protocols operating over the old X.25 network, and all that. Old school.

In 1995, we realized, contrary to our earlier beliefs, that more consumers actually were on the Internet, and that we needed a system for our customers (the vendors) to reach Internet-based consumers. I was tasked to figure out our approach, and selected Python as my prototyping tool.

Our first problem was moving to an entirely browser-based solution. Our custom client was no longer viable, so we needed a new shopping experience for the consumer, and server infrastructure to support that. At that time, talking to a web browser meant writing CGI scripts for the Apache and Netscape HTTP servers. Using CGI, I connected to our existing server backend to process orders, maintain the shopping basket, and to fetch product information. These CGI scripts produced plain, vanilla HTML (no AJAX in 1995!).

This approach was less-than-ideal since each request took time to spin up a new CGI process. The responsiveness was very poor. Then, in December 1995, while attending the Python Workshop in Washington, DC, I was introduced to some Apache and Netscape modules (from Digital Creations, who are best known for Zope) which ran persistently within the server process. These modules used an RPC system called ILU to communicate with backend, long-running processes. With this system in place, the CGI forking overhead disappeared and the shopping experience was now quite enjoyable! We started to turn the prototype into real code. The further we went with it, the better it looked and more people jumped onto the project. Development moved very fast over the next few months (thanks Python!).

In January 1996, Microsoft knocked on our door. Their internal effort at creating an electronic commerce system was floundering, and they needed people that knew the industry (we'd been doing electronic commerce for several years by that point) and somebody who was nimble. We continued to develop the software during the spring while negotiations occurred, and then the acquisition finalized in June 1996.

Once we arrived at Microsoft with our small pile of Python code, we had to figure out how to ship the product on Windows NT. The team we joined had lots of Windows experience and built an IIS plugin to communicate over named pipes to the backend servers, which were NT Services with our Python server code embedded. With a mad sprint starting in July, we shipped Microsoft Merchant Server 1.0 in October, 1996.

And yes... if you looked under the covers, somewhat hidden, was a Python interpreter, some extension DLLs, and a bunch of .pyc files. Microsoft certainly didn't advertise that fact, but it was there if you knew where to look.

Sunday, December 27, 2009

Python's Use of Dynamic Typing

An important difference between ABC and Python is the general flavor of the type system. ABC is statically typed, which means that the ABC compiler analyzes the use of types in a program and decides whether they are used consistently. If not, the program is rejected and its execution cannot be started. Unlike most statically typed languages of its day, ABC used type inference (not unlike Haskell) instead of the explicit type declarations you find in languages such as C. In contrast, Python is dynamically typed. The Python compiler is blissfully unaware of the types used in a program, and all type checking is done at run time.

Although this might seem like a large departure from ABC, it is not as different as you might imagine. Unlike other statically typed languages, ABC doesn't (didn't? it's pretty much purely historic now :-) exclusively rely on static type checking to keep the program from crashing, but has a run-time library that checks the argument types for all operations again each time they are executed. This was done in part as a sanity check for the compiler’s type-checking algorithms, which were not fully implemented in the initial prototype implementation of the language. The runtime library was also useful as a debugging aid since explicit run time type checking could produce nice error messages (aimed at the implementation team), instead of the core dumps that would ensue if the interpreter were to blindly go ahead with an operation without checking if the arguments make sense.

However, the most important reason why ABC has run time type checking in addition to static type checking is that it is interactive. In an interactive session, the user types ABC statements and definitions which are executed as soon as they are completed. In an interactive session, it is possible to create a variable as a number, delete it, and then to recreate it (in other words, create another variable with the same name) as a string. Inside a single procedure, it would be a static typing error to use the same variable name first as a number and then as a string, but it would not be reasonable to enforce such type checking across different statements entered in an interactive session, as the accidental creation of a variable named x as a number would forever forbid the creation of a variable x with a different type! So as a compromise, ABC uses dynamic type checking for global variables, but static type checking for local variables. To simplify the implementation, local variables are dynamically type checked as well.

Thus, it is only a small step from ABC’s implementation approach to type checking to Python’s approach--Python simply drops the compile-time type checking completely. This is entirely in line with Python’s “corner-cutting” philosophy as it’s a simplification of the implementation that does not affect the eventual safety, since all type errors are caught at run time before they can cause the Python interpreter to malfunction.

However, once you decide on dynamic typing, there is no way to go back. ABC’s built-in operations were carefully designed so that the type of the arguments could be deduced from the form of the operation. For example, from the expression “x^y” the compiler would deduce that variables x and y were strings, as well as the expression result. In Python, such deductions cannot generally be made. For example, the expression “x+y” could be a string concatenation, a numeric addition, or an overloaded operation on user-defined types.
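A tiny sketch of the consequence: the meaning of the same expression is decided by the run-time types of its operands.

def add(x, y):
    return x + y           # numeric addition, concatenation, or a user-defined __add__?

print add(2, 3)            # 5           (numeric addition)
print add("ab", "cd")      # abcd        (string concatenation)
print add([1], [2, 3])     # [1, 2, 3]   (list concatenation)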

Adding Support for User-defined Classes

Believe it or not, classes were added late during Python’s first year of development at CWI, though well before the first public release. However, to understand how classes were added, it first helps to know a little bit about how Python is implemented.

Python is implemented in C as a classic stack-based byte code interpreter or “virtual machine” along with a collection of primitive types also implemented in C. The underlying architecture uses “objects” throughout, but since C has no direct support for objects, they are implemented using structures and function pointers. The Python virtual machine defines several dozen standard operations that every object type may or must implement (for example, “get attribute”, “add” and “call”). An object type is then represented by a statically allocated structure containing a series of function pointers, one for each standard operation. These function pointers are typically initialized with references to static functions. However, some operations are optional, and the object type may leave the function pointer NULL if it chooses not to implement that operation. In this case, the virtual machine either generates a run-time error or, in some cases, provides a default implementation of the operation. The type structure also contains various data fields, one of which is a reference to a list of additional methods that are unique to this type, represented as an array of structures containing a string (the method name) and a function pointer (its implementation) each. Python’s unique approach to introspection comes from its ability to make the type structure itself available at run-time as an object like all others.
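The user-visible consequence is that the type structure itself can be poked at from Python; a minimal sketch:

x = [1, 2, 3]
t = type(x)                  # the type, exposed as an ordinary object
print t                      # <type 'list'>
print t.__name__             # list
print hasattr(t, "append")   # True -- the type carries its own methods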

An important aspect of this implementation is that it is completely C-centric. In fact, all of the standard operations and methods are implemented by C functions. Originally the byte code interpreter only supported calling pure Python functions and functions or methods implemented in C. I believe my colleague Siebren van der Zee was the first to suggest that Python should allow class definitions similar to those in C++ so that objects could also be implemented in Python.

To implement user-defined objects, I settled on the simplest possible design: a scheme where objects were represented by a new kind of built-in object that stored a class reference pointing to a "class object" shared by all instances of the same class, and a dictionary, dubbed the "instance dictionary", that contained the instance variables.

In this implementation, the instance dictionary would contain the instance variables of each individual object whereas the class object would contain stuff shared between all instances of the same class--in particular, methods. In implementing class objects, I again chose the simplest possible design: the set of methods of a class was stored in a dictionary whose keys are the method names. This, I dubbed the class dictionary. To support inheritance, class objects would additionally store a reference to the class objects corresponding to the base classes. At the time, I was fairly naïve about classes, but I knew about multiple inheritance, which had recently been added to C++. I decided that as long as I was going to support inheritance, I might as well support a simple-minded version of multiple inheritance. Thus, every class object could have one or more base classes.

In this implementation, the underlying mechanics of working with objects are actually very simple. Whenever changes are made to instance or class variables, those changes are simply reflected in the underlying dictionary object. For example, setting an instance variable on an instance updates its local instance dictionary. Likewise, when looking up the value of an instance variable of an object, one merely checks its instance dictionary for the existence of that variable. If the variable is not found there, things become a little more interesting. In that case, lookups are performed in the class dictionary and then in the class dictionaries of each of the base classes.

The process of looking up attributes in the class object and base classes is most commonly associated with locating methods. As previously mentioned, methods are stored in the dictionary of a class object which is shared by all instances of the same class. Thus, when a method is requested, you naturally won't find it in the instance dictionary of each individual object. Instead, you have to look it up in the class dictionary, and then ask each of the base classes in turn, stopping when a hit is found. Each of the base classes then implements the same algorithm recursively. This is commonly referred to as the depth-first, left-to-right rule, and has been the default method resolution order (MRO) used in most versions of Python. More modern releases have adopted a more sophisticated MRO, but that will be discussed in a later blog.
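Here is a compact sketch of that lookup chain, using classic classes and the depth-first, left-to-right rule:

class Base:
    def whoami(self):
        return "Base"

class Left(Base):
    def whoami(self):
        return "Left"

class Right(Base):
    pass

class Child(Left, Right):
    pass

c = Child()
print c.__dict__              # {} -- no instance variables yet
c.color = "red"
print c.__dict__              # {'color': 'red'} -- stored in the instance dictionary
print c.whoami()              # not in the instance or Child dictionaries; the
                              # depth-first, left-to-right search finds Left.whoami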

In implementing classes, one of my goals was to keep things simple. Thus, Python performs no advanced error checking or conformance checking when locating methods. For example, if a class overrides a method defined in a base class, no checks are performed to make sure that the redefined method has the same number of arguments or that it can be called in the same way as the original base-class method. The above method resolution algorithm merely returns the first method found and calls it with whatever arguments the user has supplied.

A number of other features also fall out of this design. For instance, even though the class dictionary was initially envisioned as a place to put methods, there was no inherent reason why other kinds of objects couldn't be placed there as well. Thus, if objects such as integers or strings are stored in the class dictionary, they become what are known as class variables---variables shared by all instances of a given class instead of being stored inside each instance.

Although the implementation of classes is simple, it also provides a surprising degree of flexibility. For instance, the implementation not only makes classes “first-class objects”, which are easily introspected at run time, it also makes it possible to modify a class dynamically. For example, methods can be added or modified by simply updating the class dictionary after a class object has already been created! (*) The dynamic nature of Python means that these changes have an immediate effect on all instances of that class or of any of its subclasses. Likewise, individual objects can be modified dynamically by adding, modifying, and deleting instance variables (a feature that I later learned made Python's implementation of objects more permissive than that found in Smalltalk which restricts the set of attributes to those specified at the time of object creation).
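Both effects are easy to see in a short sketch: a class variable shared by every instance, and a method added to the class (by attribute assignment) after instances already exist.

class Counter:
    total = 0                     # a class variable, shared by all instances
    def __init__(self):
        Counter.total += 1

a = Counter()
b = Counter()
print a.total, b.total            # 2 2 -- both instances see the shared value

def reset(self):
    Counter.total = 0

Counter.reset = reset             # add a method to an existing class...
a.reset()                         # ...and existing instances pick it up immediately
print Counter.total               # 0

b.color = "blue"                  # instances can also grow new attributes
print b.__dict__                  # {'color': 'blue'}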

Development of the class Syntax

Having designed the run-time representations for user-defined classes and instances, my next task was to design the syntax for class definitions, and in particular for the method definitions contained therein. A major design constraint was that I didn’t want to add syntax for methods that differed from the syntax for functions. Refactoring the grammar and the byte code generator to handle such similar cases differently felt like a huge task. However, even if I was successful in keeping the grammar the same, I still had to figure out some way to deal with instance variables. Initially, I had hoped to emulate implicit instance variables as seen in C++. For example, in C++, classes are defined by code like this:
class A {
public:
    int x;
    void spam(int y) {
        printf("%d %d\n", x, y);
    }
};

In this class, an instance variable x has been declared. In methods, references to x implicitly refer to the corresponding instance variable. For example, in the method spam(), the variable x is not declared as either a function parameter or a local variable. However, since the class has declared an instance variable with that name, references to x simply refer to that variable. Although I had hoped to provide something similar in Python, it quickly became clear that such an approach would be impossible because there was no way to elegantly distinguish instance variables from local variables in a language without variable declarations.

In theory, getting the value of instance variables would be easy enough. Python already had a search order for unqualified variable names: locals, globals, and built-ins. Each of these were represented as a dictionary mapping variable names to values. Thus, for each variable reference, a series of dictionaries would be searched until a hit was found. For example, when executing a function with a local variable p, and a global variable q, a statement like “print p, q” would look up p in the first dictionary in the search order, the dictionary containing local variables, and find a match. Next it would look up q in the first dictionary, find no match, then look it up in the second dictionary, the global variables, and find a match.

It would have been easy to add the current object’s instance dictionary in front of this search list when executing a method. Then, in a method of an object with an instance variable x and local variable y, a statement like “print x, y” would find x in the instance dictionary (the first dictionary on the search path) and y in the local variable dictionary (the second dictionary).

The problem with this strategy is that it falls apart for setting instance variables. Python’s assignment doesn’t search for the variable name in the dictionaries, but simply adds or replaces the variable in the first dictionary in the search order, normally the local variables. This has the effect that variables are always created in the local scope by default (although it should be noted that there is a “global declaration” to override this on a per-function, per-variable basis.)
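A concrete sketch of the existing behavior I did not want to disturb: reads search locals, then globals, then built-ins, while a plain assignment creates a local unless a global declaration says otherwise.

q = 10                      # a global variable

def show():
    p = 1                   # a local variable
    print p, q, len         # p from locals, q from globals, len from built-ins

def bump():
    global q                # without this, the assignment would make q local to bump()
    q = q + 1

show()                      # prints: 1 10 <built-in function len>
bump()
print q                     # 11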

Without changing this simple-minded approach to assignment, making the instance dictionary the first item in the search order would make it impossible to assign to local variables in a method. For example, if you had a method like this
def spam(y):
    x = 1       # would clobber the instance variable x
    y = 2       # would create an instance variable y, shadowing the parameter y

The assignments to x and y would overwrite the instance variable x and create a new instance variable y that shadowed the local variable y. Swapping instance variables and local variables in the search order would merely reverse the problem, making it impossible to assign to instance variables.

Changing the semantics of assignment to assign to an instance variable if one already existed and to a local otherwise wouldn’t work either, since this would create a bootstrap problem: how does an instance variable get created initially? A possible solution might have been to require explicit declaration of instance variables as was the case for global variables, but I really didn’t want to add a feature like that given that I had gotten this far without any variable declarations at all. Plus, the extra specification required for indicating a global variable was more of a special case that was used sparingly in most code. Requiring a special specification for instance variables, on the other hand, would have to be used almost everywhere in a class. Another possible solution would have been to make instance variables lexically distinct. For example, having instance variables start with a special character such as @ (an approach taken by Ruby) or by having some kind of special naming convention involving prefixes or capitalization. Neither of these appealed to me (and they still don't).

Instead, I decided to give up on the idea of implicit references to instance variables. Languages like C++ let you write this->foo to explicitly reference the instance variable foo (in case there’s a separate local variable foo). Thus, I decided to make such explicit references the only way to reference instance variables. In addition, I decided that rather than making the current object ("this") a special keyword, I would simply make "this" (or its equivalent) the first named argument to a method. Instance variables would just always be referenced as attributes of that argument.

With explicit references, there is no need to have a special syntax for method definitions nor do you have to worry about complicated semantics concerning variable lookup. Instead, one simply defines a function whose first argument corresponds to the instance, which by convention is named "self." For example:
def spam(self,y):
    print self.x, y

This approach resembles something I had seen in Modula-3, which had already provided me with the syntax for import and exception handling. Modula-3 doesn’t have classes, but it lets you create record types containing fully typed function pointer members that are initialized by default to functions defined nearby, and adds syntactic sugar so that if x is such a record variable, and m is a function pointer member of that record, initialized to function f, then calling x.m(args) is equivalent to calling f(x, args). This matches the typical implementation of objects and methods, and makes it possible to equate instance variables with attributes of the first argument.

The remaining details of Python’s class syntax follow from this design or from the constraints imposed by the rest of the implementation. Keeping with my desire for simplicity, I envisioned a class statement as a series of method definitions, which are syntactically identical to function definitions even though by convention, they are required to have a first argument named "self". In addition, rather than devising a new syntax for special kinds of class methods (such as initializers and destructors), I decided that these features could be handled by simply requiring the user to implement methods with special names such as __init__, __del__, and so forth. This naming convention was taken from C where identifiers starting with underscores are reserved by the compiler and often have special meaning (e.g., macros such as __FILE__ in the C preprocessor).

Thus, I envisioned that a class would be defined by code that looked like this:
class A:
    def __init__(self,x):
        self.x = x
    def spam(self,y):
        print self.x, y

Again, I wanted to reuse as much of my earlier code as possible. Normally, a function definition is an executable statement that simply sets a variable in the current namespace referencing the function object (the variable name is the function name). Thus, rather than coming up with an entirely different approach for handling classes, it made sense to me to simply interpret the class body as a series of statements that were executed in a new namespace. The dictionary of this namespace would then be captured and used to initialize the class dictionary and create a class object. Underneath the covers, the specific approach taken is to turn the class body into an anonymous function that executes all of the statements in the class body and then returns the resulting dictionary of local variables. This dictionary is then passed to a helper function that creates a class object. Finally, this class object is then stored in a variable in the surrounding scope, whose name is the class name. Users are often surprised to learn that any sequence of valid Python statements can appear in a class body. This capability was really just a straightforward extension of my desire to keep the syntax simple as well as not artificially limiting what might possibly be useful.
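A contrived sketch makes the point: the class body below runs a loop and an if statement while its namespace is being built, and whatever is left in that namespace becomes the class dictionary.

import math

class Circle:
    two_pi = 2 * math.pi                      # ordinary statements, executed in order
    sizes = []
    for name in ["unit", "small", "large"]:   # even a loop is allowed here
        sizes.append(name)
    if two_pi > 6:
        note = "two_pi is bigger than 6"

    def area(self, r):
        return math.pi * r * r

print Circle.sizes                            # ['unit', 'small', 'large']
print Circle.note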

A final detail is the class instantiation (instance creation) syntax. Many languages, like C++ and Java, use a special operator, “new”, to create new class instances. In C++ this may be defensible since class names have a rather special status in the parser, but in Python this is not the case. I quickly realized that, since Python’s parser doesn’t care what kind of object you call, making the class object itself callable was the right, “minimal” solution, requiring no new syntax. I may have been ahead of my time here---today, factory functions are often the preferred pattern for instance creation, and what I had done was simply to turn each class into its own factory.
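In other words, instantiation is just a call, and a class object can be handed around like any other factory; a quick sketch:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(2, 3)                           # no 'new' operator; the class object is callable

make = Point                              # the class is itself a factory: it can be renamed,
points = [make(i, i) for i in range(3)]   # passed as an argument, or used in expressions
print [(pt.x, pt.y) for pt in points]     # [(0, 0), (1, 1), (2, 2)]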

Special Methods

As briefly mentioned in the last section, one of my main goals was to keep the implementation of classes simple. In most object-oriented languages, there are a variety of special operators and methods that only apply to classes. For example, in C++, there is a special syntax for defining constructors and destructors that is different from the normal syntax used to define ordinary functions and methods.

I really didn't want to introduce additional syntax to handle special operations for objects. So instead, I handled this by simply mapping special operators to a predefined set of "special method" names such as __init__ and __del__. By defining methods with these names, users could supply code related to the construction and destruction of objects.

I also used this technique to allow user classes to redefine the behavior of Python's operators. As previously noted, Python is implemented in C and uses tables of function pointers to implement various capabilities of built-in objects (e.g., “get attribute”, “add” and “call”). To allow these capabilities to be defined in user-defined classes, I mapped the various function pointers to special method names such as __getattr__, __add__, and __call__. There is a direct correspondence between these names and the tables of function pointers one has to define when implementing new Python objects in C.
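
As a minimal sketch (the Vector class here is invented for illustration), user code fills those same conceptual slots through the special method names:

class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):      # the "add" slot: invoked for v + w
        return Vector(self.x + other.x, self.y + other.y)
    def __call__(self, scale):     # the "call" slot: invoked for v(2)
        return Vector(self.x * scale, self.y * scale)

v = Vector(1, 2) + Vector(3, 4)    # uses Vector.__add__
w = v(2)                           # uses Vector.__call__
print w.x, w.y                     # 8 12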

__________
(*) Eventually, new-style classes made it necessary to control changes to the class __dict__; you can still dynamically modify a class, but you must use attribute assignment rather than using the class __dict__ directly.

Early Language Design and Development

From ABC to Python

Python’s first and foremost influence was ABC, a language designed in the early 1980s by Lambert Meertens, Leo Geurts and others at CWI. ABC was meant to be a teaching language, a replacement for BASIC, and a language and environment for personal computing. It was designed by first doing a task analysis of the programming task and then doing several iterations that included serious user testing. My own role in the ABC group was mainly that of implementing the language and its integrated editing environment.

Python’s use of indentation comes directly from ABC, but this idea didn’t originate with ABC--it had already been promoted by Donald Knuth and was a well-known concept of programming style. (The occam programming language also used it.) However, ABC’s authors did invent the use of the colon that separates the lead-in clause from the indented block. After early user testing without the colon, it was discovered that the meaning of the indentation was unclear to beginners being taught the first steps of programming. The addition of the colon clarified it significantly: the colon somehow draws attention to what follows and ties the phrases before and after it together in just the right way.

Python’s major data types also derive from ABC, though with some modifications. ABC’s lists were really bags or multisets, which were always kept sorted using a modified B-tree implementation. Its tables were associative arrays that were similarly kept sorted by key. I found that neither data type was suitable to represent, for example, the sequence of lines read from a file, which I anticipated would be a common use case. (In ABC you'd have to use a table with the line numbers as keys, but that complicates insertions and deletions.) So I changed the list type into a flexible array with insert and delete operations, giving users complete control over the ordering of items in a list. A sort method supported the occasional need for sorted results.
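
A short sketch of the resulting list behavior--ordering stays entirely under the user's control, and sorting happens only on request:

lines = ["third\n", "first\n", "second\n"]   # e.g., lines read from a file
lines.insert(0, "header\n")                  # insert at any position
del lines[1]                                 # delete by position
lines.sort()                                 # sort only when you need it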

I also replaced the sorted tables with a hash table implementation. I chose a hash table because I believed this to be faster and easier to implement than ABC’s B-tree implementation. The latter was theoretically proven to be asymptotically optimal in space and time consumption for a wide variety of operations, but in practice it had turned out to be difficult to implement correctly due to the complexity of the B-tree algorithms. For the same reason, the performance was also sub-optimal for small table sizes.

I kept ABC’s immutable tuple data type--Python’s tuple packing and unpacking operations are taken directly from ABC. Since tuples are internally represented by arrays, I decided to add array-like indexing and slicing.
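
A quick sketch of what that combination gives you:

point = 3, 4          # tuple packing (straight from ABC)
x, y = point          # tuple unpacking
print point[0]        # array-like indexing: 3
print point[0:1]      # slicing yields another tuple: (3,)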

One consequence of adding an array-like interface to tuples is that I had to figure out some way to resolve the edge cases of tuples with length 0 or 1. One of the rules I took from ABC was that every data type, when printed or converted to a string, should be represented by an expression that was a valid input to the language’s parser. So, it followed that I needed to have notations for 0- and 1-length tuples. At the same time I didn’t want to lose the distinction between a one-tuple and a bare parenthesized expression, so I settled for an ugly but pragmatic approach where a trailing comma would turn an expression into a one-tuple and "()" would represent a zero-tuple. It's worth noting that parentheses aren’t normally required by Python’s tuple syntax, except here--I felt representing the empty tuple by “nothing” could too easily mask genuine typos.
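
So the notation shakes out like this (a small sketch):

empty = ()            # the zero-tuple
single = (42,)        # the trailing comma makes a one-tuple
also_single = 42,     # the parentheses themselves are optional
not_a_tuple = (42)    # just a parenthesized expression: the integer 42
print repr(empty), repr(single)    # () (42,) -- both parse back as written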

Python’s strings started with very similar (immutable) semantics to ABC’s strings, but with a different notation, and 0-based indexing. Since I now had three indexable types--lists, tuples, and strings--I decided to generalize these to a common concept, the sequence. This generalization made it so certain core operations such as getting the length (len(s)), indexing (s[i]), slicing (s[i:j]), and iteration (for i in s) worked the same on any sequence type.
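
A small sketch of that uniformity:

for s in [[10, 20, 30], (10, 20, 30), "abc"]:
    print len(s), s[0], s[1:3]     # same operations on any sequence type
    for item in s:                 # iteration works uniformly as well
        print item,
    print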

Numbers are one of the places where I strayed most from ABC. ABC had two types of numbers at run time: exact numbers, which were represented as arbitrary precision rational numbers, and approximate numbers, which were represented as binary floating point with an extended exponent range. The rational numbers didn’t pan out in my view. (Anecdote: I tried to compute my taxes once using ABC. The program, which seemed fairly straightforward, was taking way too long to compute a few simple numbers. Upon investigation it turned out that it was doing arithmetic on numbers with thousands of digits of precision, which were to be rounded to guilders and cents for printing.) For Python I therefore chose a more traditional model with machine integers and machine binary floating point. In Python's implementation, these numbers are simply represented by the C datatypes long and double, respectively.

Feeling that there was still an important use case for unbounded exact numbers, I added a bignum data type, which I called long. I already had a bignum implementation that was the result of an unfinished attempt at improving ABC’s implementation a few years earlier (ABC’s original implementation, one of my first contributions to ABC, used a decimal representation internally). Thus, it made sense to me to use this code in Python.

Although I added bignums to Python, it's important to emphasize that I didn’t want to use bignums for all integer operations. From profiling Python programs written by myself and colleagues at CWI, I knew that integer operations represent a significant fraction of the total program running time of most programs. By far, the most common use of integers is to index sequences that fit in memory. Thus, I envisioned machine integers being used for all of the most-common cases and the extra range of bignums only coming into play when doing "serious math" or calculating the national debt of the US in pennies.

The Problem With Numbers

The implementation of numbers, especially integers, is one area where I made several serious design mistakes, but also learned important lessons concerning Python's design.

Since Python had two different integer types, I needed to figure out some way to distinguish between them in a program. My solution was to require users to explicitly state when they wanted to use longs by writing numbers with a trailing L (e.g., 1234L). This is one area where Python violated the ABC-inspired philosophy of not requiring users to care about uninteresting implementation details.

Sadly, this was only a minor detail of a much larger problem. A more egregious error was that my implementation of integers and longs had slightly different semantics in certain cases! Since the int type was represented as a machine integer, operations that overflowed would silently clip the result to 32 bits or whatever the precision of the C long type happened to be. In addition, the int type, while normally considered signed, was treated as an unsigned number by bitwise and shift operations and by conversions to/from octal and hexadecimal representations. Longs, on the other hand, were always considered signed. Therefore, some operations would produce a different result depending on whether an argument was represented as an int or a long. For example, given 32-bit arithmetic, 1<<31 (1 shifted left by 31 bits) would produce the largest negative 32-bit integer, and 1<<32 would produce zero, whereas 1L<<31 (1 represented as long shifted left by 31 bits) would produce a long integer equal to 2**31, and 1L<<32 would produce 2**32.
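
A hypothetical interactive session on a 32-bit build of an early Python interpreter makes the divergence concrete:

>>> 1 << 31        # int arithmetic wraps to the most negative 32-bit value
-2147483648
>>> 1 << 32
0
>>> 1L << 31       # long arithmetic gives the mathematical result
2147483648L
>>> 1L << 32
4294967296L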

To resolve some of these issues I made a simple fix. Rather than having integer operations silently clip the result, I changed most arithmetic operations to raise an OverflowError exception when the result didn't fit. (The only exceptions to this checking were the "bit-wise" operations mentioned above, where I assumed that users would expect the C behavior of these operations.) Had I not added this check, Python users would undoubtedly have started writing code that relied on the semantics of signed binary arithmetic modulo 2**32 (as C users do), and fixing the mistake would have been a much more painful transition for the community.
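
A hypothetical session on such an early interpreter (the exact error text here is approximate, not taken from a real build):

>>> import sys
>>> sys.maxint
2147483647
>>> sys.maxint + 1
OverflowError: integer addition
>>> sys.maxint << 1        # bit-wise operations were exempt and wrapped like C
-2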

Although the inclusion of overflow checking might seem like a minor implementation detail, a painful debugging experience made me realize that this was a useful feature. As one of my early programming experiments in Python, I tried to implement a simple mathematical algorithm, the computation of “Meertens numbers”, a bit of recreational mathematics invented by Richard Bird for the occasion of ABC’s primary author’s 25th anniversary at CWI. The first few Meertens numbers are small, but when translating the algorithm into code I hadn’t realized that the intermediate results of the computation are much larger than 32 bits. It took a long and painful debugging session before I discovered this, and I decided there and then to address the issue by checking all integer operations for overflow, and raising an exception whenever the result could not be represented as a C long. The extra cost of the overflow check would be dwarfed by the overhead I was already incurring due to the implementation choice of allocating a new object for the result.

Sadly, I'm sorry to say that raising an overflow exception was not really the right solution either! At the time, I was stuck on C’s rule “operations on numeric type T return a result of type T”. This rule was also the reason for my other big mistake in integer semantics: truncating the result of integer division, which I will discuss in a later blog post. In hindsight, I should have made integer operations that overflow promote their result to longs. This is the way that Python works today, but it took a long time to make this transition.

Despite the problem with numbers, one very positive thing came out of this experience. I decided that there should be no undefined result values in Python – instead, exceptions should always be raised when no correct return value can be computed. Thus, Python programs would never fail due to undefined values being silently passed around behind the scenes. This is still an important principle of the language, both in the language proper and in the standard library.

Saturday, December 26, 2009

How Everything Became an Executable Statement

New users to Python are sometimes surprised to find out that every part of the language is an executable statement, including function and class definitions. That means that any statement can appear anywhere in a program. For instance, a function definition could appear inside an "if" statement if you wanted.
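
For example (the flag and the helper function below are invented for illustration):

if fast_version:                  # a hypothetical configuration flag
    def compute(x):
        return x * x
else:
    def compute(x):
        return slow_square(x)     # a hypothetical fallback helper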

In a very early version of Python’s grammar, this was not the case: grammar elements that had a “declarative flavor”, like import statements and function definitions, were allowed only at the top level in a module or script (where they were being executed in order to become effective). However, at the time I was adding support for classes, I decided that this was too restrictive.

My reasoning went roughly as follows. Rather than defining the class body as a series of function declarations only, it seemed to make sense to also allow regular variable assignments there. However, if I was going to allow that, why not go one step further and allow arbitrary executable code? Or, taking this even further, why not allow function declarations inside an “if” statement, for example? It quickly became clear that this enabled a simplification of the grammar, since now all uses of statements (whether indented or not) could share the same grammar rule, and hence the compiler could use the same byte code generation function for all of them.

Although this reasoning allowed me to simplify the grammar and allowed users to place Python statements anywhere, this feature did not necessarily enable certain styles of programming. For example, the Python grammar technically allowed users to write things such as nested functions even though the underlying semantics of Python didn't support nested scopes. Therefore, code such as that would often operate in ways that were unexpected or "broken" compared to languages that were actually designed with such features in mind. Over time, many of these "broken" features have been fixed. For example, nested function definitions only began to work more sanely in Python 2.1.
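
For example, before nested scopes existed, a definition like the following sketch would parse happily but not behave as a newcomer might expect:

def outer():
    x = 10
    def inner():
        return x      # before Python 2.1/2.2, this lookup skipped outer()'s
                      # locals and went straight to the global namespace,
                      # raising a NameError if no global x existed
    return inner()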

First-class Everything

[Folks, please don't use the comments section of this blog to ask questions. If you want to suggest a topic for a future blog entry, send me email. (Use Google to find my home page, which has my email address.) If you want to propose a change or discuss the merits of alternative designs, use the python-ideas mailing list at python.org.]

One of my goals for Python was to make it so that all objects were "first class." By this, I meant that I wanted all objects that could be named in the language (e.g., integers, strings, functions, classes, modules, methods, etc.) to have equal status. That is, they can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth.
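
A small sketch of what that equal status means in practice:

def greet(name):
    print "Hello,", name

import math
things = [42, "text", math, greet]    # ints, strings, modules, functions: all equal
table = {"hi": greet}                 # functions stored in a dictionary...
f = table["hi"]
f("world")                            # ...retrieved and called through any reference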

The internal implementation of Python made this simple to do. All of Python's objects were based on a common C data structure that was used everywhere in the interpreter. Variables, lists, functions, and everything else just used variations of this one data structure---it just didn't matter if the structure happened to represent a simple object such as an integer or something more complicated such as a class.

Although the idea of having "first-class everything" is conceptually simple, there was still one subtle aspect of classes that I needed to address---namely, the problem of making methods first-class objects.

Consider this simple Python class (copied from last week's blog post):
class A:
    def __init__(self, x):
        self.x = x
    def spam(self, y):
        print self.x, y
If methods are going to be first-class objects, then they can be assigned to other variables and used just like other objects in Python. For example, someone could write a Python statement such as "s = A.spam". In this case, the variable "s" refers to a method of a class, which is really just a function. However, a method is not quite the same as an ordinary function. Specifically, the first argument of a method is supposed to be an instance of the class in which the method was defined.

To deal with this, I created a type of callable object known as an "unbound method." An unbound method was really just a thin wrapper around the function object that implemented a method, but it enforced a restriction that the first argument had to be an instance of the class in which the method was defined. Thus, if someone wanted to call an unbound method "s" as a function, they would have to pass an instance of class "A" as the first argument. For example, "a = A(0); s(a, 1)". (*)

A related problem occurs if someone writes a Python statement that refers to a method on a specific instance of an object. For example, someone might create an instance using "a = A(0)" and then later write a statement such as "s = a.spam". Here, the variable "s" again refers to a method of a class, but the reference to that method was obtained through the instance "a". To handle this situation, a different callable object known as a "bound method" is used. This object is also a thin wrapper around the function object for the method. However, this wrapper implicitly stores the original instance that was used to obtain the method. Thus, a later statement such as "s(1)" will call the method with the instance "a" implicitly set as the first argument.

In reality, the same internal object type is used to represent bound and unbound methods. One of the attributes of this object contains a reference to an instance. If set to None, the method is unbound. Otherwise, the method is bound.

Although bound and unbound methods might seem like an unimportant detail, they are a critical part of how classes work underneath the covers. Whenever a statement such as "a.spam()" appears in a program, the execution of that statement actually occurs in two steps. First, a lookup of "a.spam" occurs. This returns a bound method--a callable object. Next, a function call operation "()" is applied to that object to invoke the method with user-supplied arguments.
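
Using the class A from above, a short sketch of both kinds of wrapper and of the two-step lookup-then-call:

a = A(10)
u = A.spam           # unbound method: the caller must supply the instance
u(a, 20)             # prints: 10 20
b = a.spam           # bound method: the instance a is stored in the wrapper
b(30)                # prints: 10 30
print u.im_self, b.im_self is a    # None True -- the same wrapper type serves both

m = a.spam           # step 1: attribute lookup returns a bound method object
m(40)                # step 2: the call operation invokes it -- same as a.spam(40)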

__________
(*) In Python 3000, the concept of unbound methods has been removed, and the expression "A.spam" returns a plain function object. It turned out that the restriction that the first argument had to be an instance of A was rarely helpful in diagnosing problems, and frequently an obstacle to advanced usages --- some have called it "duck typing self" which seems an appropriate name.

Friday, December 25, 2009

The Problem with Integer Division

Python's handling of integer division is an example of an early mistake with huge consequences. As mentioned earlier, when Python was created, I abandoned the approach to numbers that had been used in ABC. For example, in ABC, when you divided two integers, the result was an exact rational number representing the result. In Python, however, integer division truncated the result to an integer.

In my experience, rational numbers didn't pan out as ABC's designers had hoped. A typical experience would be to write a simple program for some business application (say, doing one’s taxes), and find that it was running much slower than expected. After some debugging, the cause would be that internally the program was using rational numbers with thousands of digits of precision to represent values that would be truncated to two or three digits of precision upon printing. This could be easily fixed by starting an addition with an inexact zero, but this was often non-intuitive and hard to debug for beginners.

So with Python, I relied on the other numerical model with which I was familiar, C. C has integers and floating point numbers of various sizes. So, I chose to represent Python integers by a C long (guaranteeing at least 32 bits of precision) and floating point numbers by a C double. I then added an arbitrary precision integer type which I called "long."

The major mistake was that I also borrowed a rule that makes sense in C but not so much in a very-high-level language. For the standard arithmetic operations, including division, the result would always be the same type as the operands. To make matters worse, I initially used another misguided rule that forbade mixed-mode arithmetic, with the aim of making the type implementations independent from each other. So, originally you couldn’t add an int to a float, or even an int to a long. After Python was released publicly, Tim Peters quickly convinced me that this was a really bad idea, and I introduced mixed-mode arithmetic with the usual coercion rules. For example, mixing an int and a long operand would convert the int argument to long and return a long result, and mixing either with a float would convert the int or long argument to float and return a float result.
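
A sketch of those coercion rules in a Python 2-style session:

>>> 2 + 3L         # int mixed with long: the int is converted, the result is long
5L
>>> 2 + 3.5        # int mixed with float: the result is float
5.5
>>> 2L + 3.5       # long mixed with float: the result is float
5.5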

Unfortunately, the damage was done--integer division returned an integer result. You might be wondering "Why was this so bad?" Was this really much ado about nothing? Historically, the proposal to change this met with some tough opposition from folks who believed that learning about integer division was one of the more useful "rites of passage" for all programmers. So let me explain the reasons for considering this a design bug.

When you write a function implementing a numeric algorithm (for example, calculating the phase of the moon) you typically expect the arguments to be specified as floating point numbers. However, since Python doesn’t have type declarations, nothing is there to stop a caller from providing you with integer arguments. In a statically typed language, like C, the compiler will coerce the arguments to floats, but Python does no such thing – the algorithm is run with integer values until the wonders of mixed-mode arithmetic produce intermediate results that are floats.

For everything except division, integers behave the same as the corresponding floating point numbers. For example, 1+1 equals 2 just as 1.0+1.0 equals 2.0, and so on. Therefore one can easily be misled into expecting that numeric algorithms will behave the same regardless of whether they execute with integer or floating point arguments. However, when division is involved, and the possibility exists that both operands are integers, the numeric result is silently truncated, essentially inserting a potentially large error into the computation. Although one can write defensive code that coerces all arguments to floats upon entry, this is tedious, and it doesn’t enhance the readability or maintainability of the code. Plus, it prevents the same algorithm from being used with complex arguments (although that may be a highly specialized case).
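
A tiny made-up example of how the truncation sneaks in:

def average(a, b):
    return (a + b) / 2        # fine when the arguments are floats...

print average(3.0, 4.0)       # 3.5
print average(3, 4)           # 3 -- silently truncated when both arguments are ints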

Again, all of this is an issue because Python doesn’t coerce arguments automatically to some declared type. Passing an invalid argument, for example a string, is generally caught quickly because few operations accept mixed string/numeric operands (the exception being multiplication). However, passing an integer can cause an answer that is close to the correct answer but has a much larger error – which is difficult to debug or even notice. (This recently happened to me in a program that draws an analog clock – the positions of the hands were calculated incorrectly due to truncation, but the error was barely detectable except at certain times of day.)

Fixing integer division was no easy task due to programs that relied on the behavior of integer truncation. A truncating division operator (//) was added to the language to provide the same functionality. In addition, a mechanism ("from __future__ import division") was introduced by which the new integer division semantics could be enabled. Finally, a command line flag (-Qxxx) was added both to change the behavior and to aid in program conversion. Fortunately, the correct behavior has become the default behavior in Python 3000.
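
In Python 2, after these changes, the new behavior can be requested explicitly (and in Python 3000 it is the default):

from __future__ import division
print 7 / 2        # 3.5 -- "true" division
print 7 // 2       # 3   -- explicit truncating division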
 