Dabeaz

Dave Beazley's mondo computer blog. [ homepage | archive ]

Tuesday, January 03, 2012

 

The Compiler Experiment Begins

January 13, 2012 Update: There are still a few seats left in the compilers class for January 17-20, 2012. More details here.

In the spring of 1995, I took a course on compiler design. At the time, I was just a first year Ph.D. Computer Science student making my way through various course requirements. Before that, I was a mathematician working on computational physics software--writing a lot of finely tuned C code for solving differential equations on big supercomputers. Although I already considered myself to be a pretty knowledgable programmer, I think compilers was probably the one course that had the most profound impact on my later work. In fact, this is the course that inspired me to look at the use of scripting languages for controlling scientific software. It also directly led to the Swig project, first implemented in the summer of 1995. Last, but not least, this is how I ultimately ended up in the world of Python.

I think the great thing about compilers was how it simply tied so many topics together all in one place. Everything from mathematical theory, clever algorithms, programming language semantics, computer architecture, software design, clever hacking, and even the nature of computation itself. As part of that course, we had to write our own compiler--a tangled mess of C code that turned a subset of Pascal into executable code that would actually run on a Sun Sparcstation. To be sure, the code was a horrible disaster. However, simply having written a working compiler was definitely one of the most memorable parts of graduate school.

In 2001, I had an opportunity to revisit the topic of compilers. At the time, I was an assistant professor at the University of Chicago and an opportunity to teach compilers came up. I jumped at it. I also used the opportunity to try an experiment of what it might be like to write a compiler in Python instead of C. As a bit of context, a lot of people had been asking me about the idea of rewriting Swig in Python (instead of C++). I wasn't so sure. In fact, I really didn't even know how to do it given doubts about Python's performance as well as a general lack of sufficiently powerful parsing tools at the time. Long story short--this is how the PLY project came into existence. I used it in the class and had about 25 students write a compiler for an even more powerful subset of Pascal, creating runnable code for the Sparc.

Fast forward 11 years. I've long since left the University, but I still continue to teach quite a few classes--especially various sorts of Python classes. Over the past year or so, students and I have often discussed the idea of having some kind of advanced project course. Something that would be quite a bit harder and involve much more coding. I think you might see where this is going.

Write a Compiler (in Python)

So, today is the first day of another compiler experiment. Over the course of 4 days, I'm going to attempt to take six students through a compiler writing project similar to the one at the University. It's basically a nine-stage project:

  1. Lexing and tokenizing.
  2. Parsing and parse trees.
  3. Type checking.
  4. Intermediate code generation.
  5. Simple optimization (constant folding, etc.).
  6. Relations
  7. Control flow
  8. Functions
  9. Output code in RPython (from the PyPy project)

One interesting thing about using Python for such a project is that you can use the internals of Python itself to explore important concepts. For example, if you want to see what happens when you compile a regular expression, you can just try it:

>>> import sre_parse
>>> sre_parse.parse(r"[a-zA-Z_][a-zA-Z0-9_]*")
[('in', [('range', (97, 122)), ('range', (65, 90)), 
 ('literal', 95)]), ('max_repeat', (0, 65535, [('in', 
 [('range', (97, 122)), ('range', (65, 90)), 
('range', (48, 57)), ('literal', 95)])]))]
>>> 

Or, if you want to look at how Python makes an AST:

>>> import ast 
>>> node = ast.parse("a = x + 2*y")
>>> ast.dump(node)
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=BinOp(left=Name(id='x', ctx=Load()), op=Add(), right=BinOp(left=Num(n=2), op=Mult(), right=Name(id='y', ctx=Load()))))])"
>>> 

Or, if you want to see what kind of code Python generates:

>>> def fact(n):
...     if n <= 1:
...             return 1
...     else:
...             return n*fact(n-1)
... 
>>> import dis
>>> dis.dis(fact)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               1 (<=)
              9 POP_JUMP_IF_FALSE       16

  3          12 LOAD_CONST               1 (1)
             15 RETURN_VALUE        

  5     >>   16 LOAD_FAST                0 (n)
             19 LOAD_GLOBAL              0 (fact)
             22 LOAD_FAST                0 (n)
             25 LOAD_CONST               1 (1)
             28 BINARY_SUBTRACT     
             29 CALL_FUNCTION            1
             32 BINARY_MULTIPLY     
             33 RETURN_VALUE        
             34 LOAD_CONST               0 (None)
             37 RETURN_VALUE        
>>> 

By looking at what Python does itself, I think it can be related back the work the students will be doing on their own project and might be an interesting way to explore important concepts without getting completely bogged down in a theory-heavy exposition (as one might find in a compilers textbook). I don't have any grand illusions about the students running off afterwards to do research in compilers. However, I think it will be an interesting experiment where everyone still learns a lot.

Follow the Project

Due to time constraints of the project, I won't be blogging during the week. However, you can follow me on Twitter for updates to see how it's going. I will be posting a more detailed followup describing the project and how it worked out after it's over.

If you would like to write a compiler yourself, there are still some seats left in a second running of the project, January 17-20. Click here for more information.


Comments:
Dave, what IR are you going to use?
 
For IR, I'm going to have everyone create simple three-address code sequences. Easy to represent using tuples. Later stages of the project build control-flow graphs of 3AC basic blocks.
 
First day progress report: We completed the lexing and parsing stages of the project. At this point, the language consists of simple constant and variable declarations, mathematical expressions, and printing (no control flow or functions yet). The overall syntax is modeled after Go, which turns out to be fairly easy to parse. Everyone's compiler is now building an abstract syntax tree modeled loosely after the approach found in Python's own ast module.

As to be expected, minds exploded once we hit the use of LALR(1) parser generators (PLY/yacc). I think that's usually the normal response. Besides, you can never have too many tricky shift/reduce conflicts.

Tomorrow, it's on to type checking and simple code generation. From a design point of view, we're building the compiler as independent stages as opposed to trying to do everything in one huge pass. So, most of tomorrow is going to involve various forms of AST traversal.
 
Post a Comment

Subscribe to Post Comments [Atom]





<< Home

Archives

Prior Posts by Topic

08/01/2009 - 09/01/2009   09/01/2009 - 10/01/2009   10/01/2009 - 11/01/2009   11/01/2009 - 12/01/2009   12/01/2009 - 01/01/2010   01/01/2010 - 02/01/2010   02/01/2010 - 03/01/2010   04/01/2010 - 05/01/2010   05/01/2010 - 06/01/2010   07/01/2010 - 08/01/2010   08/01/2010 - 09/01/2010   09/01/2010 - 10/01/2010   12/01/2010 - 01/01/2011   01/01/2011 - 02/01/2011   02/01/2011 - 03/01/2011   03/01/2011 - 04/01/2011   04/01/2011 - 05/01/2011   05/01/2011 - 06/01/2011   08/01/2011 - 09/01/2011   09/01/2011 - 10/01/2011   12/01/2011 - 01/01/2012   01/01/2012 - 02/01/2012   02/01/2012 - 03/01/2012   03/01/2012 - 04/01/2012   07/01/2012 - 08/01/2012   01/01/2013 - 02/01/2013   03/01/2013 - 04/01/2013   06/01/2014 - 07/01/2014   09/01/2014 - 10/01/2014  

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]