Dabeaz

Dave Beazley's mondo computer blog.

Tuesday, March 13, 2012

 

PyCon 2012 Followup

Well, PyCon 2012 has come and gone. It was fantastic to see everyone and to learn new things. For me, the big takeaway from the conference was all of the activity surrounding scientific computing and data analysis. The IPython notebook project is really quite amazing; I will need to spend more time checking it out. As a former stats geek, I'm also quite interested in checking out pandas. Last, but not least, I really enjoyed chairing the session on GIS tools. I have taken a recent interest in GIS and mapping, but quickly realized that I was way out of my element with it. The talks were a nice way to connect with some of the tools and techniques involved.

The 5.72 missing slides from my keynote

In my keynote talk on PyPy, I somehow managed to cover 123 presentation slides with 2 minutes to spare. Vern Ceder then noted that I now owe everyone an extra 5.72 slides. So, without further delay, here are the 5.72 extra things I wish I could have included in my PyPy Keynote:

  1. Even the Mandelbrot set code that I presented doesn't do PyPy's performance justice. I have been experimenting with some other libraries including a graphics library and a program for molecular dynamics. On these programs, I've seen speedups of almost 50x going from CPython to PyPy. It's pretty amazing.
  2. Don't panic about the amount of source code! Even though there are about 1.25 million lines of Python code in the full PyPy distribution, a huge chunk of that is the Python standard library (maybe 500-600K lines). There are also a lot of unit tests, supporting tools, and other things. Thus, in terms of actual code, PyPy itself is not that big. The catch is that if you just dive into PyPy without knowing much about it, you will see an awful lot of Python code sitting around.
  3. Don't let the documentation and papers intimidate you. I have found that over time, with repeated readings and messing around, the material all starts to come together in a more coherent way. It's just that when you first start, it can all be a bit overwhelming.
  4. RPython gives you much more than just translation to C. You also get important things such as garbage collection and useful containers. Actually, one of the main benefits of RPython is that you don't have to worry about such low-level details as memory allocation, pointers, and other matters commonly associated with C programming. These extra features are why even translating a simple "Hello World" program includes quite a bit of supporting C code.
  5. It's also worth strongly emphasizing that RPython is simply the implementation language for PyPy. That is, PyPy is a Python interpreter that just so happens to be written in RPython, much like CPython is a Python interpreter written in C. RPython can also be used on its own to implement new programs and interpreters for other, unrelated languages.
  6. (.72) I have been working on a simple GIL removal patch, but...
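To make items 4 and 5 a bit more concrete, here is a minimal sketch of what a standalone RPython program looks like. It follows the convention used in published RPython examples, as best I understand it: the translator imports the file, calls a function named `target`, and uses the `entry_point(argv)` it returns the way a C program uses `main()`. Treat those conventions as an assumption (they have shifted between PyPy releases), and note that the translator command line is deliberately not shown. Because RPython is a subset of Python, this file also runs unmodified under CPython or PyPy, which is a handy sanity check before translating.

```python
import os


def entry_point(argv):
    # Receives the command-line arguments, like main() in C.
    # os.write is used rather than print; RPython supports a subset of os.
    os.write(1, b"Hello World\n")
    return 0  # becomes the process exit status once translated


def target(*args):
    # Hook the translator is assumed to call: hand back the entry point.
    # The second element is left as None, as in the published examples.
    return entry_point, None


if __name__ == "__main__":
    import sys

    # Untranslated run under ordinary Python, for quick testing.
    entry_point(sys.argv)
```

Even for something this small, translation pulls in the garbage collector and runtime support code, which is why the generated C is much bigger than the program.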

If you've been using my past GIL work to troll, please stop it already

In Guido's keynote talk, he talked about feeding trolls and at one point mentioned the GIL. Having done a lot of past work trying to understand the GIL, I'd like to say a few words about that. First of all, if you think that my primary interest in the GIL is to show how Python threads (or Python itself) suck, then you're wrong. You will not find either claim in any of that work, nor will you find me online engaging in discussions about such things.

The truth of the matter is that threads are my preferred means of concurrent programming in Python. Not only that, they work extremely well for all sorts of problems involving I/O. However, if you're going to program with threads, it's important to be aware of situations where they do not work as well. In the case of Python, this mostly pertains to programs that try to subdivide computation across threads. There are also potential issues with code that overlaps heavy computation with background I/O handling. Thus, my main interest in the GIL has been to explore some of these corner cases in some detail with an eye towards improving the GIL implementation--something that I consider to be a worthy goal. More generally, I think the GIL is interesting to study as a systems programming puzzle on its own.
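As a rough illustration of the I/O versus computation distinction, the sketch below runs two kinds of work on a thread pool: simulated I/O (`time.sleep`, which releases the GIL while it waits) and a pure-Python countdown loop (which holds the GIL while it runs). On a standard CPython build, the four sleeps should overlap and finish in roughly the time of one, while splitting the countdown across four threads typically shows little or no speedup over a single thread. The names `fake_io`, `count_down`, and `timed` and the work sizes are arbitrary choices for this sketch, and `sleep` is only a stand-in for real socket or file I/O. Timings vary by machine, so treat it as a toy, not a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Stand-in for a blocking read; time.sleep releases the GIL while waiting.
    time.sleep(delay)
    return delay


def count_down(n):
    # Pure-Python CPU work: the loop holds the GIL while it runs.
    while n > 0:
        n -= 1
    return n


def timed(fn, args, workers):
    # Run fn over args on a pool of `workers` threads; return results and wall time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, io_elapsed = timed(fake_io, [0.2] * 4, workers=4)
    print(f"4 x 0.2s I/O waits on 4 threads: {io_elapsed:.2f}s")

    _, cpu_one = timed(count_down, [1_000_000] * 4, workers=1)
    _, cpu_four = timed(count_down, [1_000_000] * 4, workers=4)
    print(f"CPU work, 1 thread: {cpu_one:.2f}s; 4 threads: {cpu_four:.2f}s")
```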

In any case, the performance of threads is highly specific to the application at hand. You can't just take some benchmark from one of my GIL talks and extrapolate that out to a general statement about all Python thread programming. Personally, I find that Python threads have worked pretty well for most of the problems where I've used them. Of course your mileage might vary.

Angry Birds

Last, but not least, I never should have gone to Jason Huggins's talk about the robot that plays Angry Birds. I have now become that robot when I should be working on my concurrency workshop. Argh!


Comments:
Dear Dabeaz:

Thank you for your keynote. It was both critical and constructive. Being able to tinker with one's tools is valuable, both for the tinkerer and for everyone. It's too bad PyPy is so hard to tinker with.

As I was cogitating on this, I realized you had made a mistake. You mentioned something to the effect that the listener was familiar with C, Make, autoconf, and so on. Not true!

I'm familiar with those, but a large and growing number of the people you were speaking to are not. They would be even more confused and intimidated by encountering C code while investigating CPython or CRuby than they would by encountering RPython while investigating PyPy.

Of course, there are other reasons why it is harder to get one's head around PyPy than CPython, even if one understands RPython more easily than C.

In short, you were right that PyPy is intimidating and difficult to plumb, but for some of the people you spoke to, you named the wrong reason. Hopefully someday PyPy's use of RPython instead of C will metamorphose into a help rather than a hindrance to the learner.

Regards,

Zooko
 
Zooko,

That's an interesting comment on knowing about C, make and other tools. Although it's true that many people might not know those, they are still very widely used throughout open source development for all sorts of things. Thus, the number of programmers familiar with them would be far higher than the number familiar with something like RPython.

That said, I have sometimes wondered whether my knowledge of C/make was actually a hindrance to understanding PyPy. For instance, maybe it gave me a preconceived notion about what an interpreter is and how it is traditionally implemented.

I do think that tinkering with something as advanced as PyPy probably requires a wide range of skills, from Python to assembly language. Obviously, the more you know, the better.
 
David, thanks for the kind comments you made about IPython. We've put a ton of work into it, and will continue to do so over the next few months, so by all means let us know about any limitations or problems you find.

We want this to work well as a document for interactive exploration, collaborative research and presentation/education. Feedback welcome!
 
Great talk about PyPy. Have you thought about bringing the problems you perceive to the pypy-dev mailing list?

I ask because I had (and still have) the same problems regarding documentation and PyPy's organization. One thing I have talked about is giving separate names to PyPy the Python interpreter and PyPy the translation toolchain. Although there is some overlap between them now, with the Python 3 port underway it might be easier to try to separate them, making it easier for people to understand PyPy.

Another point: IIRC, PyPy's GIL was copied from Python 2.5 and isn't the new one from Python 3.x, so maybe just porting the new GIL over would make PyPy better for threads.
 