Dabeaz

Dave Beazley's mondo computer blog. [ homepage | archive ]

Tuesday, January 31, 2012

 

Drunk Tweeting in Chicago

Lately, I've been messing around with the requests and regex libraries for Python. They are awesome. So, without any further explanation, I present this short script that uses both in an attempt to identify people drunk-tweeting in Chicago. Enjoy.

# drunktweet.py
'''
Print out possible drunk tweets from the city of Chicago.
'''

import regex
import requests
import json

# Terms for being "wasted"
terms = { 'drunk','wasted','buzzed','hammered','plastered' }

# A fuzzy regex for people who can't type
pat = regex.compile(r"(?:\L<terms>){i,d,s,e<=2}$", regex.I, terms=terms)

# Connect to the Twitter streaming API
url   = "https://stream.twitter.com/1/statuses/filter.json"
parms = {
    'locations' : "-87.94,41.644,-87.523,42.023"    # Chicago
    }
auth  = ('username','password')
r = requests.post(url, data=parms, auth=auth)

# Print possible candidates
for line in r.iter_lines():
    if line:
        tweet = json.loads(line)
        status = tweet.get('text',u'')
        words = status.split()
        if any(pat.match(word) for word in words):
           print(tweet['user']['screen_name'], status)

It's left as an exercise to reader to filter out false-positives and have the script call a cab. By the way, you should check out some of my Python Classes in Chicago.


Tuesday, January 03, 2012

 

The Compiler Experiment Begins

January 13, 2012 Update: There are still a few seats left in the compilers class for January 17-20, 2012. More details here.

In the spring of 1995, I took a course on compiler design. At the time, I was just a first year Ph.D. Computer Science student making my way through various course requirements. Before that, I was a mathematician working on computational physics software--writing a lot of finely tuned C code for solving differential equations on big supercomputers. Although I already considered myself to be a pretty knowledgable programmer, I think compilers was probably the one course that had the most profound impact on my later work. In fact, this is the course that inspired me to look at the use of scripting languages for controlling scientific software. It also directly led to the Swig project, first implemented in the summer of 1995. Last, but not least, this is how I ultimately ended up in the world of Python.

I think the great thing about compilers was how it simply tied so many topics together all in one place. Everything from mathematical theory, clever algorithms, programming language semantics, computer architecture, software design, clever hacking, and even the nature of computation itself. As part of that course, we had to write our own compiler--a tangled mess of C code that turned a subset of Pascal into executable code that would actually run on a Sun Sparcstation. To be sure, the code was a horrible disaster. However, simply having written a working compiler was definitely one of the most memorable parts of graduate school.

In 2001, I had an opportunity to revisit the topic of compilers. At the time, I was an assistant professor at the University of Chicago and an opportunity to teach compilers came up. I jumped at it. I also used the opportunity to try an experiment of what it might be like to write a compiler in Python instead of C. As a bit of context, a lot of people had been asking me about the idea of rewriting Swig in Python (instead of C++). I wasn't so sure. In fact, I really didn't even know how to do it given doubts about Python's performance as well as a general lack of sufficiently powerful parsing tools at the time. Long story short--this is how the PLY project came into existence. I used it in the class and had about 25 students write a compiler for an even more powerful subset of Pascal, creating runnable code for the Sparc.

Fast forward 11 years. I've long since left the University, but I still continue to teach quite a few classes--especially various sorts of Python classes. Over the past year or so, students and I have often discussed the idea of having some kind of advanced project course. Something that would be quite a bit harder and involve much more coding. I think you might see where this is going.

Write a Compiler (in Python)

So, today is the first day of another compiler experiment. Over the course of 4 days, I'm going to attempt to take six students through a compiler writing project similar to the one at the University. It's basically a nine-stage project:

  1. Lexing and tokenizing.
  2. Parsing and parse trees.
  3. Type checking.
  4. Intermediate code generation.
  5. Simple optimization (constant folding, etc.).
  6. Relations
  7. Control flow
  8. Functions
  9. Output code in RPython (from the PyPy project)

One interesting thing about using Python for such a project is that you can use the internals of Python itself to explore important concepts. For example, if you want to see what happens when you compile a regular expression, you can just try it:

>>> import sre_parse
>>> sre_parse.parse(r"[a-zA-Z_][a-zA-Z0-9_]*")
[('in', [('range', (97, 122)), ('range', (65, 90)), 
 ('literal', 95)]), ('max_repeat', (0, 65535, [('in', 
 [('range', (97, 122)), ('range', (65, 90)), 
('range', (48, 57)), ('literal', 95)])]))]
>>> 

Or, if you want to look at how Python makes an AST:

>>> import ast 
>>> node = ast.parse("a = x + 2*y")
>>> ast.dump(node)
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=BinOp(left=Name(id='x', ctx=Load()), op=Add(), right=BinOp(left=Num(n=2), op=Mult(), right=Name(id='y', ctx=Load()))))])"
>>> 

Or, if you want to see what kind of code Python generates:

>>> def fact(n):
...     if n <= 1:
...             return 1
...     else:
...             return n*fact(n-1)
... 
>>> import dis
>>> dis.dis(fact)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               1 (<=)
              9 POP_JUMP_IF_FALSE       16

  3          12 LOAD_CONST               1 (1)
             15 RETURN_VALUE        

  5     >>   16 LOAD_FAST                0 (n)
             19 LOAD_GLOBAL              0 (fact)
             22 LOAD_FAST                0 (n)
             25 LOAD_CONST               1 (1)
             28 BINARY_SUBTRACT     
             29 CALL_FUNCTION            1
             32 BINARY_MULTIPLY     
             33 RETURN_VALUE        
             34 LOAD_CONST               0 (None)
             37 RETURN_VALUE        
>>> 

By looking at what Python does itself, I think it can be related back the work the students will be doing on their own project and might be an interesting way to explore important concepts without getting completely bogged down in a theory-heavy exposition (as one might find in a compilers textbook). I don't have any grand illusions about the students running off afterwards to do research in compilers. However, I think it will be an interesting experiment where everyone still learns a lot.

Follow the Project

Due to time constraints of the project, I won't be blogging during the week. However, you can follow me on Twitter for updates to see how it's going. I will be posting a more detailed followup describing the project and how it worked out after it's over.

If you would like to write a compiler yourself, there are still some seats left in a second running of the project, January 17-20. Click here for more information.


Archives

Prior Posts by Topic

08/01/2009 - 09/01/2009   09/01/2009 - 10/01/2009   10/01/2009 - 11/01/2009   11/01/2009 - 12/01/2009   12/01/2009 - 01/01/2010   01/01/2010 - 02/01/2010   02/01/2010 - 03/01/2010   04/01/2010 - 05/01/2010   05/01/2010 - 06/01/2010   07/01/2010 - 08/01/2010   08/01/2010 - 09/01/2010   09/01/2010 - 10/01/2010   12/01/2010 - 01/01/2011   01/01/2011 - 02/01/2011   02/01/2011 - 03/01/2011   03/01/2011 - 04/01/2011   04/01/2011 - 05/01/2011   05/01/2011 - 06/01/2011   08/01/2011 - 09/01/2011   09/01/2011 - 10/01/2011   12/01/2011 - 01/01/2012   01/01/2012 - 02/01/2012   02/01/2012 - 03/01/2012   03/01/2012 - 04/01/2012   07/01/2012 - 08/01/2012   01/01/2013 - 02/01/2013   03/01/2013 - 04/01/2013   06/01/2014 - 07/01/2014   09/01/2014 - 10/01/2014  

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]