Dabeaz

Dave Beazley's mondo computer blog.

Tuesday, March 13, 2012

 

PyCon 2012 Followup

Well, PyCon 2012 has come and gone. It was fantastic to see everyone and to learn new things. For me, the big takeaway from the conference was all of the activity surrounding scientific computing and data analysis. The IPython notebook project is really quite amazing, and I will need to spend more time checking that out. As a former stats geek, I'm also quite interested in checking out pandas. Last, but not least, I really enjoyed chairing the session on GIS tools. I have taken a recent interest in GIS and mapping, but quickly realized that I was way out of my element with it. The talks were a nice way to connect with some of the tools and techniques involved.

The 5.72 missing slides from my keynote

In my keynote talk on PyPy, I somehow managed to cover 123 presentation slides with 2 minutes to spare. Vern Ceder then noted that I now owe everyone an extra 5.72 slides. So, without further delay, here are the 5.72 extra things I wish I could have included in my PyPy keynote:

  1. Even the Mandelbrot set code that I presented doesn't do PyPy's performance justice. I have been experimenting with some other libraries including a graphics library and a program for molecular dynamics. On these programs, I've seen speedups of almost 50x going from CPython to PyPy. It's pretty amazing.
  2. Don't panic about the amount of source code! Even though there are about 1.25 million lines of Python code in the full PyPy distribution, a huge chunk of that is the Python standard library (maybe 500-600K lines). There are also a lot of unit tests, supporting tools, and other things. Thus, in terms of actual code, the size of PyPy itself is not that big. The main point, though, is that if you just dive into PyPy without knowing much about it, you will see an awful lot of Python code sitting around.
  3. Don't let the documentation and papers intimidate you. I have found that over time, with repeated readings and messing around, the material all starts to come together in a more coherent way. It's just that when you first start, it can all be a bit overwhelming.
  4. RPython gives you much more than just translation to C. You also get important things such as garbage collection and useful containers. Actually, one of the main benefits of RPython is that you don't have to worry about such low-level details as memory allocation, pointers, and other matters commonly associated with C programming. These extra features are why even translating a simple "Hello World" program includes quite a bit of supporting C code.
  5. It's also worth emphasizing that RPython is simply the implementation language for PyPy. That is, PyPy is a Python interpreter that just so happens to be written in RPython, much like CPython is a Python interpreter written in C. RPython can also be used all on its own to implement new programs and interpreters for other unrelated languages.
  6. (.72) I have been working on a simple GIL removal patch, but...

If you've been using my past GIL work to troll, please stop it already

In Guido's keynote talk, he talked about feeding trolls and at one point mentioned the GIL. Having done a lot of past work trying to understand the GIL, I'd like to say a few words about that. First of all, if you think that my primary interest in the GIL is to show how Python threads (or Python itself) suck, then you would be wrong. You will not find either claim in any of that work nor will you find me online engaging in discussion about such things.

The truth of the matter is that threads are my preferred means of concurrent programming in Python. Not only that, they work extremely well for all sorts of problems involving I/O. However, if you're going to program with threads, it's important to be aware of situations where they do not work as well. In the case of Python, this mostly pertains to programs that try to subdivide computation across threads. There are also potential issues with code that overlaps heavy computation with background I/O handling. Thus, my main interest in the GIL has been to explore some of these corner cases in some detail with an eye towards improving the GIL implementation--something that I consider to be a worthy goal. More generally, I think the GIL is interesting to study as a systems programming puzzle on its own.
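To make the I/O versus computation distinction concrete, here is a minimal sketch. It is not code from any of the GIL talks. It assumes `time.sleep` as a stand-in for a blocking I/O call and a plain countdown loop as a stand-in for heavy computation. The names `fake_io`, `count_down`, `run_in_threads`, `run_io_threads`, and `run_cpu_threads` are made up for illustration.

```python
import threading
import time


def fake_io(delay, results, index):
    # Assumption: time.sleep() stands in for a blocking I/O call.
    # Like blocking I/O in CPython, it releases the GIL while waiting.
    time.sleep(delay)
    results[index] = delay


def count_down(n):
    # Assumption: a pure-Python loop stands in for heavy computation.
    while n > 0:
        n -= 1
    return n


def run_in_threads(target, arg_list):
    # Start one thread per argument tuple, wait for all of them,
    # and return the wall-clock time for the whole batch.
    threads = [threading.Thread(target=target, args=args) for args in arg_list]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_io_threads(n_tasks=4, delay=0.2):
    # Each thread "waits on I/O" and records its result in a shared list.
    results = [None] * n_tasks
    args = [(delay, results, i) for i in range(n_tasks)]
    return run_in_threads(fake_io, args), results


def run_cpu_threads(n_threads=2, n=2_000_000):
    # Each thread runs the same countdown; returns total wall-clock time.
    return run_in_threads(count_down, [(n,)] * n_threads)


if __name__ == "__main__":
    io_elapsed, _ = run_io_threads()
    print("4 sleeping threads:    %.2f s" % io_elapsed)

    start = time.perf_counter()
    count_down(2_000_000)
    count_down(2_000_000)
    print("2 countdowns, serial:  %.2f s" % (time.perf_counter() - start))
    print("2 countdowns, threads: %.2f s" % run_cpu_threads())
```

The four sleeping threads should finish in roughly the time of a single sleep, since the waits overlap. On a standard CPython build with the GIL, the threaded countdowns will typically take about as long as the serial ones, or longer. The actual numbers depend on the interpreter, its version, and the machine, so treat any result as specific to that setup.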

In any case, the performance of threads is highly specific to the application at hand. You can't just take some benchmark from one of my GIL talks and extrapolate that out to a general statement about all Python thread programming. Personally, I find that Python threads have worked pretty well for most of the problems where I've used them. Of course, your mileage might vary.

Angry Birds

Last, but not least, I never should have gone to Jason Huggins's talk about the robot that plays Angry Birds. I have now become that robot when I should be working on my concurrency workshop. Argh!

