<h1>Dabeaz: Dave Beazley's mondo computer blog</h1>
<p><em>by Dave Beazley</em></p>

<h2>Python Courses for Fall 2014 with David Beazley</h2>
<p><em>September 5, 2014</em></p>
I'm pleased to announce that I'm offering two Python courses in Chicago this fall. If you're looking to escape work and have 5 days of intense Python immersion with a Python book author, this might be a good option.<br />
<br />
<h3><a href="http://www.dabeaz.com/chicago/index.html">Practical Python Programming, Sep. 29 - Oct. 3, 2014</a></h3><br />
If you've picked up a bit of Python from online tutorials or a workshop, this is a course that will take your skills to the next level. In this class, you'll learn more about using Python to perform various tasks related to data analysis and scripting of common system tasks. Core topics include the essential parts of the Python language (data structures, functions, modules, objects), useful modules in the standard library, and some major third-party packages such as NumPy, matplotlib, and Pandas. During the course, you'll apply what you have learned to some real-world projects involving the analysis of open government data.<br />
<br />
<h3><a href="http://www.dabeaz.com/chicago/index.html">Advanced Python Mastery, Oct. 20 - 24, 2014</a></h3><br />
The main focus of this course is to bridge the gap between using Python to write simple programs and using it to write larger applications, frameworks, and libraries. Virtually all major features of the Python language are covered in detail, but notable topics include advanced data structures, object oriented programming, functional programming, and metaprogramming. You will learn a lot about the software development techniques used by advanced Python libraries and how you might apply them to your own code. <br />
<br />
<h3>More Information</h3><br />
Both of these courses are taught in a round-table format that is strictly limited to <b>six</b> participants. Not only will you learn from someone who knows Python inside-out, you'll have meaningful interactions with the other participants who are just as enthusiastic about Python as you. More information is available at <a href="http://www.dabeaz.com/chicago/index.html">www.dabeaz.com/chicago/index.html</a>. Hopefully I'll see you in a course.<br />
<br />
-- Dave Beazley<br />
<br />
<h2>In Praise of Monument Valley (The Game)</h2>
<p><em>June 9, 2014</em></p>
As a programmer and father of young boys, I have something to get off my chest--namely, most of the gaming industry, and especially the part aimed at young children, leaves me in a furious rage. Maybe it's the time that a $35 in-app charge for a million popping bubbles showed up on my credit card (thanks <a href="http://tech.ca.msn.com/apple-slapped-with-class-action-lawsuit-over-in-app-purchases">Apple</a>). Or maybe it's that whenever I look at what the kids are "playing", they're usually just sitting around watching a timer to see how long they have to wait before a tanker hatches out of an egg, boards a boat, and travels across the ocean to the racetrack to put fuel in their racecar (or pay and race now!). Or the incessant ads, or requests for a Facebook login, or any number of other annoyances that pop up constantly. Who makes this crap?<br />
<br />
Well, I can tell you who makes it in the eyes of my kids--dad. Yes, I'm the one who "makes" the games through some kind of magic incantation. Frankly, I'm getting tired of hearing "dad, this game sucks." At this point, I'm pretty reluctant to install any game at all because I know that odds are it will be terrible and I'll be annoyed. However, enough of that.<br />
<br />
Over the weekend, I got tipped off to the game <a href="http://www.monumentvalleygame.com/">Monument Valley</a>. I'm so blown away that I'm motivated to write this brief post. The game is visually stunning, mysterious, and engaging in every way that a game should be. In short: I love it and so do the kids.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbL6Dx9pES9HAPmDnoyDPU0ny5o8fQ1ev8E_RDmxHG-IWhvTaKII8nP61EKqLujOE2jFI2_tF995Oq0aOli_ef1QNbbdybRHo1-ZTouq5kLE5i6IiNzK77XeHdotOHdkhC0skl/s1600/photo-5.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbL6Dx9pES9HAPmDnoyDPU0ny5o8fQ1ev8E_RDmxHG-IWhvTaKII8nP61EKqLujOE2jFI2_tF995Oq0aOli_ef1QNbbdybRHo1-ZTouq5kLE5i6IiNzK77XeHdotOHdkhC0skl/s320/photo-5.jpg" /></a></div><br />
However, it's more than just that. This is a game that is devoid of ads, in-app purchases, waiting around, powerups, gambling, violence, or any other mainstay of modern "gaming." For that, I'd like to heap some praise on its maker <a href="http://ustwo.com/play/">Ustwo</a>. Thanks for making Monument Valley. More games like this please! I'll gladly pay. <br />
<h2>From the future, import recipes</h2>
<p><em>March 19, 2013</em></p>
<p>As many of you know, Brian Jones and I have been hard at work on the Python Cookbook, 3rd edition. If you haven't been following us, you might not know that the book is actually finished and in final production. In fact, O'Reilly brought some bound galley copies that we signed and gave away at PyCon.<br />
</p><br />
<center><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqJhfC3YLzB7lZUTWa23ZMwyO9MNqXP_-J9aReceLvdZccherkcNI4blBmOY4J5LY2BYmpW52qBaUtYVYmwW5CxojDM5HbY2LMVEqzg_CQGzjk41m1oN1oZcGzUUmmuVzeqUgN/s1600/photo-5.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqJhfC3YLzB7lZUTWa23ZMwyO9MNqXP_-J9aReceLvdZccherkcNI4blBmOY4J5LY2BYmpW52qBaUtYVYmwW5CxojDM5HbY2LMVEqzg_CQGzjk41m1oN1oZcGzUUmmuVzeqUgN/s320/photo-5.jpg" /></a><br/><br />
<em>Galley Copy of the Cookbook</em><br />
</center><br />
<center><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ1PjjopzC08nCt-O168f8ryej3w0Zea64Vi95vLi2EFmS7umJy1fDavF3lDos1eNjYMX78L_QazZR1p11KEXMgJWsQ0iqWRAKOjm5Otk4rM-aCrLc4tjhmyQgDiFs_Avvw2_z/s1600/photo-4.jpg" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ1PjjopzC08nCt-O168f8ryej3w0Zea64Vi95vLi2EFmS7umJy1fDavF3lDos1eNjYMX78L_QazZR1p11KEXMgJWsQ0iqWRAKOjm5Otk4rM-aCrLc4tjhmyQgDiFs_Avvw2_z/s320/photo-4.jpg" /></a><br/><br />
<em>Book signing at PyCon</em><br />
</center><br />
<p>Readers familiar with past editions of the Cookbook might be inclined to think that the 3rd edition is simply an updated version of that material. However, the upcoming edition is a completely new book, written from the ground up to target Python 3.3. Rather than focusing on past techniques and working within the restrictions of backwards compatibility, this edition aims to solve various problems in the most modern manner possible. Thus, if you're thinking about moving to Python 3 or simply learning more about how it's different, this is the book you'll want. We think you'll like it.<br />
</p><br />
<p>Although the official release date for the book is in May, you can get the book in progress as an e-book in O'Reilly's <a href="http://shop.oreilly.com/product/0636920027072.do">Early Release</a> program. Also, if you keep a watchful eye, O'Reilly has been offering a 50% discount on the Cookbook in various promotions. For example, today (March 19), the cookbook is discounted in <a href="http://shop.oreilly.com/category/deals/best-of-oreilly-dotd.do?code=DEAL">this promotion</a>. An added benefit of the early release edition is that you get to submit errata for inclusion in the final book. <br />
</p><br />
<p>Last, but not least, if you're waiting for a print edition, look for it in the bookstore in late May. You can <a href="http://twitter.com/dabeaz">follow me on Twitter</a> for the latest updates.<br />
</p><br />
<h2>Build a Robot Army!</h2>
<p><em>January 30, 2013</em></p>
So, I was recently at an event where some students from Chicago's Northside College Prep High School showed up to demo some of their robots and to talk about their upcoming participation in the <a href="http://www.usfirst.org/roboticsprograms/frc">FIRST Robotics Challenge</a>. For example, here's one of their past robots: <br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTKCdilWAoRDqF1aAND8RiMcYcl94C1iXGh43HhOQGoLs5ASr4xfoXbU60cjno80X28Mq2ilLclu3N08NZKaqbzDrKStouY58M7_JzeDlBTQKYde5kPTKhcBR1KnTw2RxFypv_/s1600/robotphoto.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="317" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTKCdilWAoRDqF1aAND8RiMcYcl94C1iXGh43HhOQGoLs5ASr4xfoXbU60cjno80X28Mq2ilLclu3N08NZKaqbzDrKStouY58M7_JzeDlBTQKYde5kPTKhcBR1KnTw2RxFypv_/s320/robotphoto.jpg" width="320" /></a></div><br />
As a kid, I did more than my fair share of things related to computer programming, but I definitely never built a robot. Now that I'm an adult though (sic), I can definitely see the advantages that a robot might offer. For instance, programming it to chase my 3- and 4-year-old boys around, keeping a menacing eye on them when they're in "time out" (think Cylons), and cleaning up after their messes. However, where would I even start with a diabolical project like that? I don't know anything about robots.<br />
<br />
As I've learned, sending a team to a robotics competition is no cheap affair. This is especially so if you're at a public school and you want to equip your basic entry-level robot with all sorts of cool accessories such as laser beams, plasma torches, x-ray vision and stuff. And don't even talk about travel. No, seriously, these students are probably all going to be huddled in a van using the robot's plasma torch just to stay warm. On a serious note, this is actually the very first year that Northside has participated in the FIRST Robotics Challenge. As a rookie team, time and resources aren't always easy to come by. <br />
<br />
Sensing an opportunity, I've decided to help solve both problems by sponsoring a <a href="http://robotarmy.eventbrite.com/">Build a Robot Army</a> event on February 9, 2013 at my office in Chicago. The Northside students are going to stop by with some robot kits and teach everyone the basics of building a robot. It's limited to only 8 people. As such, it will be hands-on and in-depth. All of the proceeds will go to help the Northside team. In short, it will be an awesomely fun way to spend a Saturday afternoon.<br />
<br />
More information is available at <a href="http://robotarmy.eventbrite.com/">http://robotarmy.eventbrite.com</a>. I hope to see you there! -Dave<br />
<br />
<h2>If you remove the GIL, will it leave a GIL-shaped hole?</h2>
<p><em>July 4, 2012</em></p>
<p>So, I recently assembled a <a href="http://www.shapeoko.com">Shapeoko</a> CNC machine and was deciding what to do with it for a first test run. Naturally, writing a Python 3 script to literally remove the "GIL" from a board came to mind.<br />
</p><br />
<center><br />
<img src="http://www.dabeaz.com/images/GILBoard.jpg"><br />
</center><br />
<br />
<p>Here is the video of it being milled:<br />
</p><br />
<center><br />
<iframe width="560" height="315" src="http://www.youtube.com/embed/SNBKWuM-Lu8" frameborder="0" allowfullscreen></iframe><br />
</center><br />
<br />
<p>Yep, removing GILs with Python 3 and power tools---all in a day's work. That's all for now.<br />
</p><br />
<br />

<h2>PyCon 2012 Followup</h2>
<p><em>March 13, 2012</em></p>
<p>Well, PyCon 2012 has come and gone. It was fantastic to see everyone and to learn new things. For me, the big takeaway from the conference was all of the activity surrounding scientific computing and data analysis. The <a href="http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html">ipython notebook</a> project is really quite amazing--I will need to spend more time checking that out. As a former stats geek, I'm also quite interested in checking out <a href="http://pandas.pydata.org/">pandas</a>. Last, but not least, I really enjoyed chairing the session on GIS tools. I have taken a recent interest in GIS and mapping, but quickly realized that I was way out of my element with it. The talks were a nice way to connect with some of the tools and techniques involved.</p>
<h3>The 5.72 missing slides from my keynote</h3>
<p>
In my <a href="http://www.youtube.com/watch?v=l_HBRhcgeuQ">keynote talk on PyPy</a>, I somehow managed to cover 123 presentation slides with 2 minutes to spare. Vern Ceder then noted that I now owe everyone an extra 5.72 slides. So, without further delay, here are the 5.72 extra things I wish I could have included in my PyPy Keynote:
</p>
<ol>
<li>Even the Mandelbrot set code that I presented doesn't do PyPy's performance justice. I have been experimenting with some other libraries including a graphics library and a program for molecular dynamics. On these programs, I've seen speedups of almost 50x going from CPython to PyPy. It's pretty amazing.</li>
<li>Don't panic about the amount of source code! Even though there are about 1.25 million lines of Python code in the full PyPy distribution, a huge chunk of that is the Python standard library (maybe 500-600K lines). There are also a lot of unit tests, supporting tools, and other things. Thus, in terms of actual code, the size of PyPy itself is not that big. The main point is that if you just dive into PyPy without knowing much about it, though, you will see an awful lot of Python code sitting around.</li>
<li>Don't let the documentation and papers intimidate you. I have found that over time, with repeated readings and messing around, the material all starts to come together in a more coherent way. It's just that when you first start, it can all be a bit overwhelming.</li>
<li>RPython gives you much more than just translation to C. You also get important things such as garbage collection and useful containers. Actually, one of the main benefits of RPython is that you don't have to worry about such low-level details as memory allocation, pointers, and other matters commonly associated with C programming. These extra features are why even translating a simple "Hello World" program includes quite a bit of supporting C code.</li>
<li>It's also important to strongly emphasize that RPython is simply the implementation language for PyPy. That is, PyPy is a Python interpreter that just so happens to be written in RPython, much like CPython is a Python interpreter written in C. Note that RPython can be used all on its own to implement new programs and interpreters for other, unrelated languages.</li>
<li value="5"> (.72) I have been working on a simple GIL removal patch, but...</li>
</ol>
<h3>If you've been using my past GIL work to troll, please stop it already</h3>
<p>
In Guido's keynote talk, he talked about feeding trolls and at one point mentioned the GIL. Having done a lot of past work trying to understand the GIL, I'd like to say a few words about that. First of all, if you think that my primary interest in the GIL is to show how Python threads (or Python itself) suck, then you would be wrong. You will not find either claim in any of that work nor will you find me online engaging in discussion about such things.</p>
<p>
The truth of the matter is that threads are my preferred means of concurrent programming in Python. Not only that, they work extremely well for all sorts of problems involving I/O. However, if you're going to program with threads, it's important to be aware of situations where they do not work as well. In the case of Python, this mostly pertains to programs that try to subdivide computation across threads. There are also potential issues with code that overlaps heavy computation with background I/O handling. Thus, my main interest in the GIL has been to explore some of these corner cases in some detail with an eye towards improving the GIL implementation--something that I consider to be a worthy goal. More generally, I think the GIL is interesting to study as a systems programming puzzle on its own. </p>
<p>
In any case, the performance of threads is highly specific to the application at hand. You can't just take some benchmark from one of my GIL talks and extrapolate that out to a general statement about all Python thread programming. Personally, I find that Python threads have worked pretty well for most of the problems where I've used them. Of course your mileage might vary.
</p>
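To make the CPU-bound corner case concrete, here's a minimal sketch (my own illustration, not a benchmark from the GIL talks): splitting a counting loop across two threads still produces the correct answer, but on CPython it typically gives no wall-clock speedup, because the GIL lets only one thread execute Python bytecode at a time.

```python
import threading

def count(n):
    # Pure CPU-bound work. On CPython, the GIL serializes bytecode
    # execution, so running this in multiple threads doesn't parallelize it.
    total = 0
    for _ in range(n):
        total += 1
    return total

results = []

def worker(n):
    results.append(count(n))   # list.append is atomic in CPython

# Two threads, each doing half of the work. The result is correct,
# but the elapsed time is typically no better than one thread doing
# all 1,000,000 iterations (and is often a bit worse).
threads = [threading.Thread(target=worker, args=(500000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))   # 1000000
```

For I/O-bound work (sockets, files), the GIL is released while waiting, which is why threads still work extremely well there.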
<h3>Angry Birds</h3>
<p>Last, but not least, I never should have gone to Jason Huggins's talk about the <a href="http://pyvideo.org/video/683/building-a-robot-that-can-play-angry-birds-on-a-s">robot that plays Angry Birds</a>. I have now become that robot when I should be working on my concurrency workshop. Argh!
</p>
<h2>Python Concurrency Workshop (March 19-22, 2012)</h2>
<p><em>February 21, 2012</em></p>
<p>So, you couldn't get into PyCon? Or PyCon isn't enough and you need <em>even more</em> Python than that? Or you simply want to escape all of your coworkers for a fun week of intense coding? Then you might want to know that my <a href="http://www.dabeaz.com/chicago/concurrent.html">Python Concurrency Workshop</a> is making a return run March 19-22, 2012, in Chicago.</p>
<p>
The concurrency workshop is an intense course that focuses on core topics in concurrent programming and distributed systems. You'll learn how to effectively use things like threads, processes, asynchronous I/O, message passing, coroutines and more. Plus, you'll gain a much deeper awareness of important ideas that will help you understand the features and tradeoffs associated with different concurrency approaches, libraries, and programming languages. If you must know, this is also the workshop that spawned much of my work on the <a href="http://www.dabeaz.com/GIL">Python GIL</a>.</p>
<p>
More information can be found at <a href="http://www.dabeaz.com/chicago/concurrent.html">http://www.dabeaz.com/chicago/concurrent.html</a>. Hopefully, I'll see you in Chicago!</p>
-- Dave

<h2>Understanding RPython</h2>
<p><em>February 2, 2012</em></p>
<p>Lately, I've been trying to wrap my brain around how the PyPy translation toolchain works--in preparation for my PyCon plenary talk. I'd planned to do some blogging about it, but have become suddenly inundated with work. So, in its place, I present a screencast of the January 12, 2012 <a href="http://chipy.org">Chipy</a> talk I gave about it. If you're like me, and have wondered what PyPy is doing under the covers, you might find it interesting. Enjoy!</p>
<iframe width="420" height="315" src="http://www.youtube.com/embed/GjnRLG8ATn4" frameborder="0" allowfullscreen></iframe>
<p>
I hope to say even more at PyCon. See you in a month!
</p>

<h2>Drunk Tweeting in Chicago</h2>
<p><em>January 31, 2012</em></p>
<p>
Lately, I've been messing around with the <a href="http://pypi.python.org/pypi/requests">requests</a> and <a href="http://pypi.python.org/pypi/regex">regex</a> libraries for Python. They are awesome. So, without any further explanation, I present this short script that uses both in an attempt to identify people drunk-tweeting in Chicago. Enjoy.</p>
<blockquote>
<pre>
# drunktweet.py
'''
Print out possible drunk tweets from the city of Chicago.
'''
import regex
import requests
import json

# Terms for being "wasted"
terms = { 'drunk', 'wasted', 'buzzed', 'hammered', 'plastered' }

# A fuzzy regex for people who can't type
pat = regex.compile(r"(?:\L<terms>){i,d,s,e<=2}$", regex.I, terms=terms)

# Connect to the Twitter streaming API
url = "https://stream.twitter.com/1/statuses/filter.json"
parms = {
    'locations' : "-87.94,41.644,-87.523,42.023"    # Chicago
}
auth = ('username', 'password')
r = requests.post(url, data=parms, auth=auth)

# Print possible candidates
for line in r.iter_lines():
    if line:
        tweet = json.loads(line)
        status = tweet.get('text', u'')
        words = status.split()
        if any(pat.match(word) for word in words):
            print(tweet['user']['screen_name'], status)
</pre>
</blockquote>
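If you just want to play with the fuzzy-matching idea without installing the third-party regex module, the standard library's difflib gives a rough approximation. This is my own sketch, not part of the script above, and it matches by similarity ratio rather than by a bounded number of edits like the <code>{i,d,s,e&lt;=2}</code> pattern:

```python
import difflib

terms = ['drunk', 'wasted', 'buzzed', 'hammered', 'plastered']

def looks_wasted(word, cutoff=0.75):
    # get_close_matches() does approximate string matching, loosely
    # analogous to the fuzzy regex above (but scored by a similarity
    # ratio, not by counting insertions/deletions/substitutions).
    return bool(difflib.get_close_matches(word.lower(), terms, n=1, cutoff=cutoff))

print(looks_wasted('hammmered'))   # True  (one extra letter)
print(looks_wasted('sober'))       # False
```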
<p>
It's left as an exercise to the reader to filter out false positives and have the script call a cab. By the way, you should check out some of my <a href="http://www.dabeaz.com/chicago/index.html">Python Classes in Chicago</a>.</p>

<h2>The Compiler Experiment Begins</h2>
<p><em>January 3, 2012</em></p>
<p><b>January 13, 2012 Update:</b> There are still a few seats left in the compilers class for January 17-20, 2012. More details <a href="http://www.dabeaz.com/chicago/compiler.html">here</a>.</p>
<p>
In the spring of 1995, I took a course on compiler design. At the time, I was just a first-year Ph.D. Computer Science student making my way through various course requirements. Before that, I was a mathematician working on computational physics software--writing a lot of finely tuned C code for solving differential equations on big supercomputers. Although I already considered myself to be a pretty knowledgeable programmer, I think compilers was probably the one course that had the most profound impact on my later work. In fact, this is the course that inspired me to look at the use of scripting languages for controlling scientific software. It also directly led to the <a href="http://www.swig.org">Swig</a> project, first implemented in the summer of 1995. Last, but not least, this is how I ultimately ended up in the world of Python.</p>
<p>
I think the great thing about compilers was how it simply tied so many topics together all in one place. Everything from mathematical theory, clever algorithms, programming language semantics, computer architecture, software design, clever hacking, and even the nature of computation itself. As part of that course, we had to write our own compiler--a tangled mess of C code that turned a subset of Pascal into executable code that would actually run on a Sun Sparcstation. To be sure, the code was a horrible disaster. However, simply having written a working compiler was definitely one of the most memorable parts of graduate school.</p>
<p>
In 2001, I had an opportunity to revisit the topic of compilers. At the time, I was an assistant professor at the University of Chicago and an opportunity to teach compilers came up. I jumped at it. I also used the opportunity to try an experiment of what it might be like to write a compiler in Python instead of C. As a bit of context, a lot of people had been asking me about the idea of rewriting Swig in Python (instead of C++). I wasn't so sure. In fact, I really didn't even know how to do it given doubts about Python's performance as well as a general lack of sufficiently powerful parsing tools at the time. Long story short--this is how the <a href="http://www.dabeaz.com/ply/index.html">PLY</a> project came into existence. I used it in the class and had about 25 students write a compiler for an even more powerful subset of Pascal, creating runnable code for the Sparc.</p>
<p>
Fast forward 11 years. I've long since left the University, but I still continue to teach quite a few classes--especially various sorts of Python classes. Over the past year or so, students and I have often discussed the idea of having some kind of advanced project course. Something that would be quite a bit harder and involve much more coding. I think you might see where this is going.</p>
<h3>Write a Compiler (in Python)</h3>
<p>
So, today is the first day of another compiler experiment. Over the course of 4 days, I'm going to attempt to take six students through a compiler writing project similar to the one at the University. It's basically a nine-stage project:
</p>
<ol>
<li>Lexing and tokenizing.</li>
<li>Parsing and parse trees.</li>
<li>Type checking.</li>
<li>Intermediate code generation.</li>
<li>Simple optimization (constant folding, etc.).</li>
<li>Relations.</li>
<li>Control flow.</li>
<li>Functions.</li>
<li>Output code in RPython (from the PyPy project).</li>
</ol>
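To give a taste of the first stage, here's a toy tokenizer for a tiny expression language. This is purely illustrative (the actual course project uses PLY and a much richer token set), but it shows the essential idea: a master regular expression with named groups, and a generator that yields (type, text) pairs:

```python
import re

# Token specification for a toy language (illustrative only).
TOKENS = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[a-zA-Z_][a-zA-Z0-9_]*'),
    ('PLUS',   r'\+'),
    ('TIMES',  r'\*'),
    ('ASSIGN', r'='),
    ('WS',     r'\s+'),
]
master = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKENS))

def tokenize(text):
    # Scan the input left to right; m.lastgroup is the name of the
    # token pattern that matched. Whitespace is discarded.
    for m in master.finditer(text):
        if m.lastgroup != 'WS':
            yield (m.lastgroup, m.group())

print(list(tokenize('a = x + 2*y')))
```

Everything after this stage (parsing, type checking, code generation) consumes the token stream this step produces.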
<p>
One interesting thing about using Python for such a project is that you can use the internals of Python itself to explore important concepts. For example, if you want to see what happens when you compile a regular expression, you can just try it:
</p>
<blockquote>
<pre>
>>> <b>import sre_parse</b>
>>> <b>sre_parse.parse(r"[a-zA-Z_][a-zA-Z0-9_]*")</b>
[('in', [('range', (97, 122)), ('range', (65, 90)),
('literal', 95)]), ('max_repeat', (0, 65535, [('in',
[('range', (97, 122)), ('range', (65, 90)),
('range', (48, 57)), ('literal', 95)])]))]
>>>
</pre>
</blockquote>
<p>
Or, if you want to look at how Python makes an AST:
</p>
<blockquote>
<pre>
>>> <b>import ast </b>
>>> <b>node = ast.parse("a = x + 2*y")</b>
>>> <b>ast.dump(node)</b>
"Module(body=[Assign(targets=[Name(id='a', ctx=Store())], value=BinOp(left=Name(id='x', ctx=Load()), op=Add(), right=BinOp(left=Num(n=2), op=Mult(), right=Name(id='y', ctx=Load()))))])"
>>>
</pre>
</blockquote>
<p>
Or, if you want to see what kind of code Python generates:
</p>
<blockquote>
<pre>
>>> <b>def fact(n):
... if n <= 1:
... return 1
... else:
... return n*fact(n-1)</b>
...
>>> <b>import dis</b>
>>> <b>dis.dis(fact)</b>
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 1 (<=)
9 POP_JUMP_IF_FALSE 16
3 12 LOAD_CONST 1 (1)
15 RETURN_VALUE
5 >> 16 LOAD_FAST 0 (n)
19 LOAD_GLOBAL 0 (fact)
22 LOAD_FAST 0 (n)
25 LOAD_CONST 1 (1)
28 BINARY_SUBTRACT
29 CALL_FUNCTION 1
32 BINARY_MULTIPLY
33 RETURN_VALUE
34 LOAD_CONST 0 (None)
37 RETURN_VALUE
>>>
</pre>
</blockquote>
<p>
By looking at what Python does itself, I think it can be related back to the work the students will be doing on their own project, and it might be an interesting way to explore important concepts without getting completely bogged down in a theory-heavy exposition (as one might find in a compilers textbook). I don't have any grand illusions about the students running off afterwards to do research in compilers. However, I think it will be an interesting experiment where everyone still learns a lot.</p>
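Along the same lines, even the optimization stage (constant folding) can be explored with Python's own ast module. This is a toy sketch of my own, not the course code: a NodeTransformer that collapses binary operations whose operands are numeric literals.

```python
import ast
import operator

# Map AST operator node types to the functions that evaluate them.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

class FoldConstants(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)           # fold the children first
        if (isinstance(node.left, ast.Constant) and
                isinstance(node.right, ast.Constant) and
                type(node.op) in OPS):
            value = OPS[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node

tree = ast.parse("a = 2*3 + 4")
tree = ast.fix_missing_locations(FoldConstants().visit(tree))
print(ast.unparse(tree))   # a = 10
```

The same bottom-up rewrite over an expression tree is exactly what the students implement against their own parse trees in stage 5.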
<h3>Follow the Project</h3>
<p>
Due to time constraints of the project, I won't be blogging during the week. However, you can <a href="http://www.twitter.com/dabeaz">follow me on Twitter</a> for updates to see how it's going. I will be posting a more detailed followup describing the project and how it worked out after it's over.
</p>
<p>
If you would like to write a compiler yourself, there are still some seats left in a second running of the project, January 17-20. Click <a href="http://www.dabeaz.com/chicago/compiler.html">here</a> for more information.
</p>

<h2>Python Courses for 2012</h2>
<p><em>December 21, 2011</em></p>
<p>
I'm excited to announce my new <a href="http://www.dabeaz.com/chicago/index.html">Python training courses</a> for the first part of 2012. These are intense hands-on classes that are strictly limited to 6 attendees. Unlike an online course, you'll get to escape work, your family, and friends for several days while you become completely immersed in the topic at hand. Needless to say, you won't be disappointed.
</p>
<center>
<img src="http://www.dabeaz.com/chicago/class_small.jpg"><br>
<em>A Python course in action</em>
</center>
<p>
<b><a href="http://www.dabeaz.com/chicago/compiler.html">Write a Compiler (In Python) : January 17-20, 2012</a></b>
<blockquote>
So, you never got to take a compilers course in college, or you're simply a masochist looking to take your programming skills up a few levels? Then this is the course for you. Come to Chicago in January and spend 4 days writing a compiler for your own programming language and have it run on top of RPython, the implementation language used by PyPy. In this course, you'll learn about all of the major parts of what makes a compiler work, see a bunch of advanced Python programming techniques, and dive into all sorts of low-level black magic. Why? Because it's fun. Update: seats are still available!
</blockquote>
</p>
<p>
<b><a href="http://www.dabeaz.com/chicago/science.html">Python for Scientists and Engineers : February 27-March 2, 2012</a></b>
<blockquote>
Join special guest Mike Müller, founder of <a href="http://www.python-academy.com/">Python Academy</a> for an exclusive 5-day in-depth course on using Python for Science and Engineering. Topics include numerical computing with numpy/scipy, plotting, working with large data files, Python extensions, testing, version control, and more. I'm pleased to have Mike join me in Chicago for this special course before he heads to PyCon'2012.
</blockquote>
</p>
<p>
<b><a href="http://www.dabeaz.com/chicago/concurrent.html">Python Concurrency and Distributed Computing Workshop : March 19-22, 2012</a></b>
<blockquote>
A 4-day in-depth exploration of everything you could possibly want to know about concurrency and distributed computing in Python. Major topics include threads, multiprocessing, event-driven I/O (async), message passing, and coroutines. If you must know, this is the same workshop that spawned my <a href="http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-understanding-the-python-gil-82-3273690">infamous talks on the Python GIL</a>. Past participants have described this workshop as combining the contents of about three different university systems courses crammed into the span of four days.
</blockquote>
</p>
<p>
<b><a href="http://www.dabeaz.com/chicago/mastery.html">Advanced Python Mastery : April 2-6, 2012</a></b>
<blockquote>
Go far beyond the basic tutorial and learn the secret Python programming techniques used by the authors of frameworks and libraries. Topics include some of Python's most advanced features including object implementation, descriptors, decorators, metaclasses, packaging, optimization, and more. Simply stated, the aim of this course is to cover the entire Python programming language, leaving no stone unturned.
</blockquote>
</p>
<p>
<b><a href="http://www.dabeaz.com/chicago/practical.html">Practical Python Programming : May 14-18, 2012</a></b>
<blockquote>
A thorough introduction to Python for software developers, scientists, and engineers who already know how to program in another programming language. A major focus of this course is on using Python to process various kinds of datasets, especially those associated with systems scripting and open data sources. In addition to Python, this course includes a basic introduction to some of the scientific tools, including numpy and matplotlib. If you're looking to improve your Python programming skills after completing a more basic tutorial, this is a great course to take.
</blockquote>
<p>
2012 marks the start of my fourth year of offering Python courses to small groups in Chicago. One of the best parts of these classes is the interaction that results from putting programmers with different backgrounds together in the same room. Everyone who attends wants to be there and conversations are likely to cover just about any topic imaginable (not just computers). I learn new things with every class and think it's a lot of fun. Hopefully I'll see you in a future class!
</p>
<p>
Cheers,
Dave
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-43367923604976313972011-09-18T20:27:00.000-07:002011-09-18T20:27:28.657-07:00Three Python Courses for Fall<p>
As the leaves start to turn, I'm finally pleased to announce the dates for my fall Python courses in Chicago. </p>
<p>
<ul>
<li>
<b><a href="http://www.dabeaz.com/chicago/concurrent.html">Python Concurrency and Distributed Computing Workshop (Nov 1-4).</a></b>
<p>The concurrency workshop is back, but is now expanded to four full days. Since its start in 2009, the concurrency workshop has been my favorite place to try out new material and explore some really cutting edge Python topics. This is the same workshop that spawned the infamous <a href="http://blip.tv/carlfk/mindblowing-python-gil-2243379">Mindblowing GIL</a> talk and subsequent <a href="http://www.dabeaz.com/GIL">GIL presentation</a> at PyCon. More recent editions of the workshop have expanded to include Python 3, messaging frameworks such as ZeroMQ, and the use of NoSQL databases. Past participants have described the workshop as covering about the same amount of material as three college courses. If you like geeking out with other programmers and learning new things, you'll have a great time.</p>
</li>
<li>
<b><a href="http://www.dabeaz.com/chicago/mastery.html">Advanced Python Mastery (Nov 14-18).</a></b>
<p>
If you just want to learn Python basics, you can find about a million free on-line tutorials and videos to get you going. However, if you want to understand all of the deep Python magic used by various application frameworks, then this is the class for you. This is one of the few truly advanced training courses around that deeply explores the internals of Python's built-in objects, underlying object model, functions, and metaprogramming features. Topics include most of Python's advanced features, including cooperative inheritance (super, mixins, etc.), descriptors, decorators, context managers, metaclasses, generators, coroutines, packages, and more. Even seasoned Python programmers will learn new tricks.
</p>
</li>
<p>
<li><b><a href="http://www.dabeaz.com/chicago/practical.html">Practical Python Programming (Dec. 12-16).</a></b>
<p>
If you're new to Python and want to learn more in the company of other enthusiastic programmers, then this is the class for you. A major theme of this course is using Python to analyze data. Over the course of the week, you'll learn how to use Python to analyze data files, extract information from public web services (REST APIs), and use popular extensions such as numpy and matplotlib. Most of the exercises in this course involve open data published by various government and city sources. So, not only will you get to learn Python, you'll get to do all sorts of neat stuff such as analyze crime data, locate huge rats, make maps, and more. It should be great fun.
</p>
</li>
</ul>
<p>
More information about these courses can be found <a href="http://www.dabeaz.com/chicago/index.html">here</a>. In the meantime, be sure to catch my talks at <a href="http://py.codeconf.com">PyCodeConf</a>, Oct 6-7, in Miami and at <a href="http://rupy.eu">RuPy 2011</a>, Oct 14-16, in Poznan, Poland.</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-47315445152099264062011-08-12T15:19:00.000-07:002011-08-16T20:09:00.432-07:00An Inside Look at the GIL Removal Patch of Lore<p>
As most Python programmers know, people love to hate
the Global Interpreter Lock (GIL). Why can't it
simply be removed? What's the problem? However, if you've been around the
Python community long enough, you might also know that the
GIL was already removed once before--specifically, by Greg Stein, who created
a patch against Python 1.4 in 1996. In fact, here's a link:</p>
<p>
<ul>
<li><a
href="http://www.python.org/ftp/python/contrib-09-Dec-1999/System/threading.tar.gz">http://www.python.org/ftp/python/contrib-09-Dec-1999/System/threading.tar.gz</a>
</li>
</ul>
<p>
This patch often gets mentioned in discussions regarding the
GIL--especially as justification for keeping the GIL. For
example, see the <a
href="http://docs.python.org/faq/library#can-t-we-get-rid-of-the-global-interpreter-lock">Python
FAQ</a>, the forum post <a
href="http://www.artima.com/forums/flat.jsp?forum=106&thread=214235&start=0&msRange=15">It
Isn't Easy to Remove the GIL</a> or this mailing discussion on
<a
href="http://mail.python.org/pipermail/python-dev/2001-August/017099.html">Free
Threading</a>. These discussions usually point out that the patch
made the performance of single-threaded applications much worse--so
much so that the patch couldn't be adopted. However, beyond that,
technical details about the patch are somewhat sketchy.</p>
<p>
Despite using Python since early 1996, I will
freely admit that I never really knew the details of Greg's GIL
removal patch. For the most part, I just vaguely knew that someone
had attempted to remove the GIL, that it apparently killed the performance
of single-threaded apps, and that it subsequently faded into
oblivion. However, given my recent interest in making the GIL better, I
thought it might be a fun challenge to turn on the time machine,
see if I could actually find the patch, and compile a version of GIL-less Python in order to take a peek
under the covers.</p>
<p>
So, in this post, I'll do just that and try to offer some commentary on some of
the more interesting and subtle aspects of the patch--in particular,
aspects of it that seem especially tricky or problematic.
Given the increased interest in concurrency, the GIL, and other matters, I hope
that this information might be of some use to others, or at the very
least, help explain why Python still has a GIL. Plus, as the saying
goes, those who don't study the past are doomed to repeat it. So,
let's jump in.</p>
<h3>Python 1.4</h3>
<p>
In order to play with the patch, you must first download and build
Python-1.4. You can find it on <a href="http://www.python.org/download/releases/src/">python.org</a> if
you look long enough. After a bit of Makefile twiddling, I was
able to build it and try a few things out on a Linux system.</p>
<p>
Using Python-1.4 is a big reminder of how much Python has
changed. It has none of the nice features you're used to using now (list
comprehensions, sets, string methods, <tt>sum()</tt>,
<tt>enumerate()</tt>, etc.). In playing with it, I realized that about
half of everything I typed resulted in some kind of error. But I
digress.</p>
<p> Thread support in Python-1.4 is equally minimal. The
<a href="http://docs.python.org/library/threading.html"><tt>threading.py</tt></a> module that most people know doesn't yet
exist. Instead, there is just the lower-level <a href="http://docs.python.org/library/thread.html"><tt>thread</tt></a> module
which simply lets you launch a thread, allocate a mutex lock, and not much
else. Here's a small sample:</p>
<blockquote>
<pre>
import thread
import time

def countdown(n):
    while n > 0:
        print "T-minus", n
        n = n - 1
        time.sleep(1)

thread.start_new_thread(countdown,(10,))
</pre>
</blockquote>
<p>
Under the covers though, the implementation is
remarkably similar to modern Python. Each thread consists of a C
function that runs the specified Python callable and there is a GIL that
guards access to some critical global interpreter state. </p>
<h3>A Reentrant Interpreter?</h3>
<p>If you read the <tt>threading.README</tt> file included in the patch,
you will find this description:</p>
<p>
<blockquote>
<em>These patches enable Python to be "free threaded" or, in other words,
fully reentrant across multiple threads. This is particularly important
when Python is embedded within a C program.</em>
</blockquote>
</p>
<p>
The stated goal of making Python "fully reentrant" is important so
let's discuss.</p>
<p>
All Python code gets compiled down to an intermediate "machine
language" before it executes. For example, consider a simple function
like this:
</p>
<blockquote>
<pre>
def countdown(n):
    while n > 0:
        print "T-minus", n
        n -= 1
        time.sleep(1)
    print "Blastoff!"
</pre>
</blockquote>
<p>
You can view the underlying low-level instructions using the <a
href="">dis</a> module.</p>
<blockquote>
<pre>
>>> <b>import dis</b>
>>> <b>dis.dis(countdown)</b>
  2           0 SETUP_LOOP              48 (to 51)
        >>    3 LOAD_FAST                0 (n)
              6 LOAD_CONST               1 (0)
              9 COMPARE_OP               4 (>)
             12 POP_JUMP_IF_FALSE       50

  3          15 LOAD_CONST               2 ('T-minus')
             18 PRINT_ITEM
             19 LOAD_FAST                0 (n)
             22 PRINT_ITEM
             23 PRINT_NEWLINE

  4          24 LOAD_FAST                0 (n)
             27 LOAD_CONST               3 (1)
             30 INPLACE_SUBTRACT
             31 STORE_FAST               0 (n)

  5          34 LOAD_GLOBAL              0 (time)
             37 LOAD_ATTR                1 (sleep)
             40 LOAD_CONST               3 (1)
             43 CALL_FUNCTION            1
             46 POP_TOP
             47 JUMP_ABSOLUTE            3
        >>   50 POP_BLOCK

  6     >>   51 LOAD_CONST               4 ('Blastoff!')
             54 PRINT_ITEM
             55 PRINT_NEWLINE
             56 LOAD_CONST               0 (None)
             59 RETURN_VALUE
>>>
</pre>
</blockquote>
<p>
Now, here's the critical bit--as a general rule, most low-level
interpreter instructions tend to execute atomically. That is, while
executing an instruction, the GIL is held, and no preemption is
possible until completion.</p>
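<p>
The instruction boundary is easy to see for yourself. Here's a quick sketch (using a modern Python and a hypothetical function, purely for illustration) that uses <tt>dis</tt> to show that even a single small statement expands to several bytecode instructions--and it is only on the boundaries between those instructions that a thread switch can possibly occur:
</p>

```python
import dis

def decrement(n):
    n = n - 1     # one statement, but several bytecode instructions
    return n

# Each instruction runs atomically with the GIL held; preemption can
# only happen between instructions, never in the middle of one.
ops = [ins.opname for ins in dis.get_instructions(decrement)]
print(ops)
```

The exact opcode names vary between Python versions, but there are always several of them per statement.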
<p> For a large majority of interpreter instructions, the lack of
preemption is rarely a concern because they execute almost
immediately. However, every now and then, a program may execute a
long-running operation. Typically this happens if an operation is
performed on a huge amount of data or if Python calls out to
long-running C code that doesn't release the GIL.</p>
<p>
Here is a simple example you can try to see it. Launch the
above countdown function in its own thread like this (using the legacy
thread module).</p>
<blockquote>
<pre>
>>> <b>import thread</b>
>>> <b>thread.start_new_thread(countdown,(20,))</b>
T-minus 20
T-minus 19
...
>>>
</pre>
</blockquote>
<p>
Now, while that is running, do this:
</p>
<blockquote>
<pre>
>>> <b>max(xrange(1000000000))</b>
</pre>
</blockquote>
<p>
If you do this, everything should just grind to a halt. You will see
no output from the countdown thread at all and Ctrl-C will be frozen.
In a nutshell, this is the issue with preemption (or lack thereof).
Due to the GIL, it's possible for one thread to temporarily block the progress of
all other threads on the system.</p>
<p>
The lack of preemption together with the GIL presents certain challenges when Python is
embedded into a multithreaded C program. In such applications, the
Python interpreter might be used for high-level scripting, but also for
things such as event handling and callbacks. For example, maybe you have some C
thread that's part of a game or visualization package that's calling
into the Python interpreter to trigger event-handler methods in real
time. In such code, control flow might pass from Python, to C, back
to Python, and so forth.</p>
<p> Needless to say, it's possible that in such an environment, you might
have a collection of C or C++ threads that compete for use of the
Python interpreter and are forced to synchronize on the GIL. This means that the interpreter might
become a bottleneck of the whole system. If, somehow, you could get
rid of the GIL, then any thread would be allowed to use the
interpreter without worrying about other threads. For example, a C++
program triggering a Python event callback wouldn't have to concern
itself with other Python threads---the callback would simply run
without being blocked. This is what you get by making the
interpreter fully reentrant.
</p>
<p>
It is in this context of embedding that the GIL removal patch should
probably be viewed. At the time it was created,
significantly more Python users were involved with integrating Python
with C/C++ applications. In my own area of
scientific computing, people were using Python to build interactive
data visualization tools which often involved heavy amounts
of CPU computation. I knew others who were trying to use
Python for internal scripting of commercial PC video games. For all of
these groups, removal of the GIL (if possible) was viewed as desirable
because doing so would simplify programming and improve the user experience (better
responsiveness of the GUI, fewer stalls, etc.). If you've ever been
sitting there staring at the spinning beachball on your Mac wishing it
would just go away, well, then you get
the general idea.</p>
<h3>Exploring the Patch</h3>
<p>
If you download the free-threading patch, you will find that it is a
relatively small set of files that replace and add functionality to Python-1.4. Here is
a complete list of the modified files included in the patch:</p>
<p>
<blockquote>
<pre>
./Include:
listobject.h pymutex.h sysmodule.h
object.h pypooledlock.h threadstate.h
./Modules:
signalmodule.c threadmodule.c
./Objects:
complexobject.c intobject.c longobject.c stringobject.c
frameobject.c listobject.c mappingobject.c tupleobject.c
./Python:
Makefile.in importdl.c pythonrun.c traceback.c
ceval.c pymutex.c sysmodule.c
errors.c pypooledlock.c threadstate.c
</pre>
</blockquote>
<p>
As you can see, it's certainly not a rewrite of the entire
interpreter. In fact, if you run a diff across the Python-1.4 source
and the patch, you'll find that the changes amount to about 1000 lines
of code (in contrast, the complete source code to Python-1.4 is about
82000 lines as measured by 'wc').</p>
<p>I won't go into the details of applying or compiling the patch
except to say that detailed instructions are included in the README should you want
to build it yourself.</p>
<h3>Initial Impressions</h3>
<p>
With the patch applied, I tried to do a few rough performance tests (note: I
ran these under Ubuntu 8.10 on a dual-core VMware Fusion instance
running on my Mac). First, let's just write a simple spin-loop and see what happens:</p>
<blockquote>
<pre>
import time

def countdown(n):
    while n > 0:
        n = n - 1

start = time.time()
countdown(10000000)
end = time.time()
print end-start
</pre>
</blockquote>
<p>
Using the original version of Python-1.4 (with the GIL), this code runs
in about 1.9 seconds. Using the patched GIL-less version, it runs in
about 12.7 seconds. That's about 6.7 times slower. Yow!</p>
<p>
Just to further confirm that finding, I ran the included
<tt>Tools/scripts/pystone.py</tt> benchmark (modified to run slightly
longer in order to get more accurate timings). First, with the GIL:</p>
<blockquote>
<pre>
$ python1.4g Tools/scripts/pystone.py
Pystone(1.0) time for 100000 passes = 3.09
This machine benchmarks at 32362.5 pystones/second
</pre>
</blockquote>
<p>Now, without the GIL:</p>
<blockquote>
<pre>
$ python1.4ng Tools/scripts/pystone.py
Pystone(1.0) time for 100000 passes = 12.73
This machine benchmarks at 7855.46 pystones/second
</pre>
</blockquote>
<p>Here, the GIL-less Python is only about 4 times slower. Now, I'm just
slightly more impressed.</p>
<p>
To test threads, I wrote a small sample that subdivided the work
across two worker threads in an embarrassingly parallel manner (note: this
code is a little wonky due to the fact that Python-1.4 doesn't
implement thread joining--meaning that you have to do it yourself with
the included binary-semaphore lock).
</P>
<blockquote>
<pre>
import thread
import time
import sys

sys.setcheckinterval(1000)

def countdown(n,lck):
    while n > 0:
        n = n - 1
    lck.release()             # Signal termination

lck_1 = thread.allocate_lock()
lck_2 = thread.allocate_lock()
lck_1.acquire()
lck_2.acquire()

start = time.time()
thread.start_new_thread(countdown,(5000000,lck_1))
thread.start_new_thread(countdown,(5000000,lck_2))
lck_1.acquire()               # Wait for termination
lck_2.acquire()
end = time.time()
print end-start
</pre>
</blockquote>
<p>If you run this code with the GIL, the execution time is about
2.5 seconds or approximately 1.3 times slower than the single-threaded
version (1.9 seconds). Using the GIL-less Python, the execution time is
18.5 seconds or approximately 1.45 times slower than the
single-threaded version (12.7 seconds). Just to emphasize, the GIL-less Python
running with two-threads is running more than 7 times slower than the
version with a GIL.</p>
<p>
Ah, but what about preemption you ask? If you return to the example
above in the section about reentrancy, you will find that removing the
GIL does, indeed, allow free threading and long-running calculations
to be preempted. Success!</p>
<p>
Needless to say, there might be a few reasons why the patch quietly
disappeared.</p>
<h3>Under the Covers</h3>
<p>Okay, the performance is terrible, but what is actually going on
inside? Are there any lessons to be learned? A look at the source
code and related documentation reveals all.</p>
<h3>Capturing Thread State</h3>
<p>
For free-threading to work, each thread has to isolate its
interpreter state and not rely on C global variables. The threading patch does this by defining a new
data structure such as the following:</p>
<blockquote>
<pre>
/* Include/threadstate.h */
typedef struct PyThreadState_s
{
    PyFrameObject * current_frame;      /* ceval.c */
    int             recursion_depth;    /* ceval.c */
    int             interp_ticker;      /* ceval.c */
    int             tracing;            /* ceval.c */
    PyObject *      sys_profilefunc;    /* sysmodule.c */
    PyObject *      sys_tracefunc;      /* sysmodule.c */
    int             sys_checkinterval;  /* sysmodule.c */
    PyObject *      last_exception;     /* errors.c */
    PyObject *      last_exc_val;       /* errors.c */
    PyObject *      last_traceback;     /* traceback.c */
    PyObject *      sort_comparefunc;   /* listobject.c */
    char            work_buf[120];      /* <anywhere> */
    int             c_error;            /* complexobject.c */
} PyThreadState;
</pre>
</blockquote>
<p>
Essentially, all global variables in the interpreter have been picked
up and moved into a per-thread data structure. Some of these values are obvious
candidates such as exception information, tracing, and profiling
hooks. Other parts are semi-random. For example, there is storage
for the compare callback function used by list sorting and a global
error handling variable (c_error) used to propagate errors across
internal functions in the implementation of complex numbers.</p>
<p>
To manage multiple threads, the interpreter builds a linked-list of
all active threads. This linked list contains the thread-identifier
along with the corresponding <tt>PyThreadState</tt>
structure. For example:</p>
<blockquote>
<pre>
/* Python/threadstate.c */
...
typedef struct PyThreadStateLL_s
{
    long                       thread_id;
    struct PyThreadStateLL_s * next;
    PyThreadState              state;
} PyThreadStateLL;
</pre>
</blockquote>
<p>
Whenever a thread wants to get its per-thread state information, it
simply calls a function <tt>PyThreadState_Get()</tt>. This function
scans the linked-list searching for the caller's thread-identifier.
When found, the matching thread state structure is moved to the front
of the linked list and the value returned. Here is a short example
illustrating its use, with the relevant bits highlighted:</p>
<blockquote>
<pre>
/* Objects/listobject.c */
...
static int
cmp(v, w)
    const ANY *v, *w;
{
    object *t, *res;
    long i;
    <font color="#0000ff">PyThreadState *pts = PyThreadState_Get();</font>
    if (err_occurred())
        return 0;
    if (<font color="#0000ff">pts->sort_comparefunc</font> == NULL)
        return cmpobject(* (object **) v, * (object **) w);

    /* Call the user-supplied comparison function */
    t = mkvalue("(OO)", * (object **) v, * (object **) w);
    if (t == NULL)
        return 0;
    res = call_object(<font color="#0000ff">pts->sort_comparefunc</font>, t);
    DECREF(t);
    ...
}
</pre>
</blockquote>
<p>
Bits and pieces of the thread state code live on in Python3.2 today. In
particular, per-thread state is captured in a similar data structure and
there are functions for obtaining the state. In fact, there is even a
function called <tt>PyThreadState_Get()</tt>.</p>
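<p>
The move-to-front lookup itself is easy to sketch in pure Python. The following is a hypothetical reconstruction for illustration, not code from the patch--and, unlike the patch, it doesn't protect the list with a lock of its own:
</p>

```python
import threading

class _Node:
    def __init__(self, thread_id, state, nxt=None):
        self.thread_id = thread_id
        self.state = state
        self.next = nxt

class ThreadStateList:
    """Sketch of the patch's PyThreadState_Get(): scan a linked list
    for the calling thread's id and move the hit to the front, so
    repeated lookups by the same thread stay cheap."""
    def __init__(self):
        self.head = None

    def get(self):
        tid = threading.get_ident()
        prev, node = None, self.head
        while node is not None and node.thread_id != tid:
            prev, node = node, node.next
        if node is None:
            node = _Node(tid, {})        # first use by this thread
        elif prev is not None:
            prev.next = node.next        # unlink from its old spot
        if node is not self.head:
            node.next = self.head        # move to the front
            self.head = node
        return node.state
```

Each thread that calls <tt>get()</tt> receives its own private state dictionary, while the thread that called most recently sits at the head of the list.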
<h3>Fine-Grained Locking of Reference Counting</h3>
<p>
Memory management of Python objects relies on reference counting. In
the C API, there are macros for manipulating reference counts:</p>
<blockquote>
<pre>
/* Include/object.h */
...
#define Py_INCREF(op) (_Py_RefTotal++, (op)->ob_refcnt++)
#define Py_DECREF(op) \
        if (--_Py_RefTotal, --(op)->ob_refcnt != 0) \
                ; \
        else \
                _Py_Dealloc(op)
...
</pre>
</blockquote>
<p>
These macros, along with their NULL-pointer safe variants <tt>Py_XINCREF</tt> and
<tt>Py_XDECREF</tt> are used throughout the Python source. A quick
search of the Python-1.4 source reveals about 250 uses. </p>
<p>
With free-threading, reference counting operations lose their
thread-safety. Thus, the patch introduces a global reference-counting mutex
lock along with atomic operations for updating the count. On Unix,
locking is implemented using a standard
<tt>pthread_mutex_t</tt> lock (wrapped inside a <tt>PyMutex</tt> structure) and the following functions:</p>
<blockquote>
<pre>
/* Python/pymutex.c */
...
PyMutex * _Py_RefMutex;
...
int _Py_SafeIncr(pint)
    int *pint;
{
    int result;
    PyMutex_Lock(_Py_RefMutex);
    result = ++*pint;
    PyMutex_Unlock(_Py_RefMutex);
    return result;
}

int _Py_SafeDecr(pint)
    int *pint;
{
    int result;
    PyMutex_Lock(_Py_RefMutex);
    result = --*pint;
    PyMutex_Unlock(_Py_RefMutex);
    return result;
}
</pre>
</blockquote>
<p>The <tt>Py_INCREF</tt> and <tt>Py_DECREF</tt> macros are then
redefined to use these thread-safe functions.</p>
<p>
On Windows, fine-grained locking is achieved by redefining
<tt>Py_INCREF</tt> and <tt>Py_DECREF</tt> to use
<tt>InterlockedIncrement</tt> and
<tt>InterlockedDecrement</tt> calls (see <a
href="http://msdn.microsoft.com/en-us/library/ms684122(v=vs.85).aspx">Interlocked
Variable Access</a> [MSDN]).</p>
<p>
On Unix, it must be emphasized that simple reference count
manipulation has been replaced by no fewer than three function calls,
plus the overhead of the actual locking. It's far more expensive.</p>
<p>
As a performance experiment, I decided to comment out the
<tt>PyMutex_Lock</tt> and <tt>PyMutex_Unlock</tt> calls and run the
interpreter in an unsafe mode. With this change, the performance of my
single-threaded 'spin-loop' dropped from 12.7 seconds to about 3.9 seconds. The threaded version dropped from 18.5 seconds to about 4 seconds. [ Note: corrected due to an unnoticed build-error when trying this experiment initially ].
</p>
<p> Clearly fine-grained locking of reference counts is the major
culprit behind the poor performance, but even if you take away the locking, the reference counting performance is still very sensitive to any kind of extra overhead (e.g., function call, etc.). In this case, the performance is still about twice as slow as Python with the GIL. </p>
<h3>Locking of Mutable Builtins</h3>
<p>
Mutable builtins such as lists and dicts need their own locking to
synchronize modifications. Thus, these objects grow an extra lock
attribute per instance. For example:</p>
<blockquote>
<pre>
/* Include/listobject.h */
...
typedef struct {
    PyObject_VAR_HEAD
    <font color="#0000ff">Py_DECLARE_POOLED_LOCK</font>
    PyObject **ob_item;
} PyListObject;
...
</pre>
</blockquote>
<p>
Virtually all underlying methods (append, insert, setitem, getitem,
repr, etc.) then use the per-instance lock under the covers.</p>
<p>
An interesting aspect of the implementation is the way that locks are
allocated. Instead of allocating a dedicated mutex lock for each
list or dictionary, the interpreter simply keeps a small pool of
available locks. When a list or dict first needs to be locked, a lock
is taken from the pool and used as long as it is needed (until the instance
is no longer being manipulated by any threads). At this point, the
lock is simply released back to the pool.</p>
<p>
This scheme greatly reduces the number of needed locks. Generally
speaking, the number of locks is about the same as the number of
active threads. Deeply nested data structures (e.g., lists
of lists of lists) may also increase the number of locks needed if certain
recursive operations are invoked. For example, printing a deeply
nested data structure might cause a lock to be allocated for each level
of nesting.</p>
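<p>
To make the pooling idea concrete, here is a simplified sketch in Python (hypothetical names, and note that the real patch's pooled locks also let threads contending on the same object share one lock, which this version omits):
</p>

```python
import threading

class LockPool:
    """A pool of reusable mutexes: a container borrows a lock only
    while it is actually being operated on, then returns it, so the
    number of live locks stays near the number of active threads."""
    def __init__(self):
        self._free = []
        self._guard = threading.Lock()   # protects the pool itself

    def acquire(self):
        with self._guard:
            lock = self._free.pop() if self._free else threading.Lock()
        lock.acquire()
        return lock

    def release(self, lock):
        lock.release()
        with self._guard:
            self._free.append(lock)      # back to the pool for reuse
```

In this scheme, every list or dict operation would bracket its critical section with <tt>acquire()</tt>/<tt>release()</tt>, and only the data structures actually being touched at any moment hold locks.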
<p>
Locking of mutable containers does not involve anything more
sophisticated than a mutex lock. For example, no attempt has been
made to utilize reader-writer locks.
</p>
<h3>Locking of Various Internal Operations</h3>
<p>
In various places throughout the interpreter, there are low-level
book-keeping operations, often related to memory management and optimization.
Their implementation often relies upon the use of unsafe C static variables.
</p>
<p>
For this, the patch defines a mutex lock dedicated generally to
executing critical sections along with a pair of C macros for
locking.</p>
<blockquote>
<pre>
/* Include/pymutex.h */
...
#define Py_CRIT_LOCK() PyMutex_Lock(_Py_CritMutex)
#define Py_CRIT_UNLOCK() PyMutex_Unlock(_Py_CritMutex)
...
</pre>
</blockquote>
<p>
As an example of use, consider the <tt>int</tt> object. Due to the fact that
integers are used so frequently, the underlying C code uses a custom
memory allocator and other tricks to avoid excessive calls to C's
<tt>malloc</tt> and <tt>free</tt> functions. Here is an example of
the kind of thing you see in the patch (changes highlighted):</p>
<blockquote>
<pre>
/* Objects/intobject.c */
...
static intobject *<font color="#0000ff">volatile</font> free_list = NULL;
...
object *
newintobject(ival)
    long ival;
{
    ...
    <font color="#0000ff">Py_CRIT_LOCK();</font>
    if (free_list == NULL) {
        if ((free_list = fill_free_list()) == NULL) {
            <font color="#0000ff">Py_CRIT_UNLOCK();</font>
            return err_nomem();
        }
    }
    v = free_list;
    free_list = *(intobject **)free_list;
    <font color="#0000ff">Py_CRIT_UNLOCK();</font>
    ...
}
</pre>
</blockquote>
<p>
A quick analysis of the patch shows that there are about 20 such
critical sections that have to be protected. This includes low-level
code in the implementation of ints, tuples, dicts as well as code
generally related to the runtime of the interpreter (e.g., module
imports, signal handling, sys module, etc.).</p>
<p>
Although all of these sections share the same lock, the associated
overhead appears to be negligible compared to that of reference
counting.
</p>
<h3>Other Tricky Bits</h3>
<p>
Although the patch addresses some fundamental issues needed to make
free-threading work, it is only a small start. In particular, no
effort has been made to verify the thread-safety of any standard
library modules. This includes a large body of C code that would have to
be audited in detail to identify and fix potential race
conditions.</p>
<p>
Certain parts of the Python implementation also remain problematic.
For example, certain low-level C API functions such as
<tt>PyList_GetItem()</tt> and <tt>PyDict_GetItem()</tt> return
borrowed references (e.g., objects without an increased reference
count). Although it seems remote, there is a possibility that such
functions could return a reference to an object that then gets destroyed by
another thread.
</p>
<h3>Final Words</h3>
<p>Looking at the patch has been an interesting trip through history,
but is there anything to learn from it? This is by no means an
exhaustive list, but a few thoughts come to mind:</p>
<ul>
<p>
<li>Reference counting is a really lousy memory-management technique
for free-threading. This was already widely known, but the performance
numbers put a more concrete figure on it. This will definitely be
the most challenging issue for anyone attempting a GIL removal patch.</li>
</p>
<P>
<li>For mutable types, you need per-instance locking. However,
through clever lock management, you can probably do it with a
relatively small number of locks (proportional to the number
of threads) as opposed to actually putting a dedicated lock on every
instance. The performance impact of such locking warrants further
study--especially given the heavy use of dictionaries throughout the
interpreter.
</li>
</p>
<p>
<li>Various internal parts of the interpreter will need locking, but
such locking doesn't appear to have as much of an impact as one might
expect (at least I wasn't able to measure a huge performance
hit due to it).</li>
</p>
<p>
<li>Python 3 already includes a few critical pieces needed to make
free-threading work. In particular, there are data structures that
capture per-thread state and functions for obtaining that state.
Because of that, if you were to eliminate the GIL, most of the
effort would tend to focus on locking as opposed to isolating
state.</li>
</p>
<p>
<li>Even though you might be able to patch the interpreter with a
small amount of code, verifying the thread safety of all standard library
modules (both Python and C code) will probably be a daunting
(and possibly never-ending) endeavor.</li>
</p>
<p> <li>Despite removing the GIL, I was unable to produce any
performance experiment that showed a noticeable improvement on
multiple cores. Really, the only benefit seen in pure Python code
(ignoring the horrible performance) was having preemptible
instructions.</li> </p>
</ul>
<p> That's about it for now. I hope you found this trip through time
interesting. In a future installment, I'll explore the problem of
pushing locked reference counting as far as it can possibly go. As a
preview, a simple patch involving less than a dozen lines of code
makes the whole GIL-less Python run more than twice as fast. However,
can it go even faster? Stay tuned. </p>
Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-38745377590432649152011-05-27T14:44:00.000-07:002011-05-27T14:44:29.113-07:00Class decorators might also be super!<p>
Recently Raymond Hettinger posted an amazing article, <a href="http://rhettinger.wordpress.com/2011/05/26/super-considered-super/">Python's super() considered super!</a>. Even if you think you know what <tt>super()</tt> does, you should go read it.</p>
<p>
A commonly cited application of <tt>super()</tt> is using it to implement a kind of cooperative inheritance, as is sometimes found with mixin classes. Consider this code, which is a slight variation of Raymond's example:</p>
<blockquote>
<pre>
class LoggedSetItemMixin:
    def __setitem__(self, index, value):
        logging.info('Setting %r to %r', index, value)
        super().__setitem__(index, value)
</pre>
</blockquote>
<p>
Using this class, you could add logging to any class that implements <tt>__setitem__()</tt> by combining classes via multiple inheritance. For example:</p>
<blockquote>
<pre>
class LoggingDict(LoggedSetItemMixin, dict):
    pass

class LoggingList(LoggedSetItemMixin, list):
    pass
</pre>
</blockquote>
<p>
Here's some sample output:
</p>
<blockquote>
<pre>
>>> <b>d = LoggingDict()</b>
>>> <b>d['a'] = 1</b>
INFO:root:Setting 'a' to 1
>>> <b>e = LoggingList([0,1,2])</b>
>>> <b>e[0] = 99</b>
INFO:root:Setting 0 to 99
>>>
</pre>
</blockquote>
<p>
The whole reason that this works is that <tt>super()</tt> delegates to the next class on the MRO. Thus, the <tt>__setitem__()</tt> call in <tt>LoggedSetItemMixin</tt> actually steps over to the next class in MRO of whatever kind of instance is being used. If you find this amazing, consider the fact that <tt>LoggedSetItemMixin</tt> is using <tt>super()</tt> even though it doesn't even specify a base class! It's pretty cool--maybe even a slight bit diabolical.</p>
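<p>
If the mechanics seem mysterious, printing the MRO makes them concrete: <tt>super()</tt> in the mixin simply steps one class to the right in the <em>instance's</em> MRO, whatever that happens to be. A self-contained restatement of the example above:
</p>

```python
import logging

class LoggedSetItemMixin:
    def __setitem__(self, index, value):
        logging.info('Setting %r to %r', index, value)
        super().__setitem__(index, value)

class LoggingDict(LoggedSetItemMixin, dict):
    pass

# For a LoggingDict instance, the class after the mixin in the MRO is
# dict, so super().__setitem__ resolves to dict.__setitem__.
print([c.__name__ for c in LoggingDict.__mro__])
# → ['LoggingDict', 'LoggedSetItemMixin', 'dict', 'object']
```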
<p>
As amazing as this is, I've recently been thinking about a completely different approach to these kinds of problems based on class decorators. Consider this function:
</p>
<blockquote>
<pre>
def LoggedSetItem(cls):
    orig_setitem = cls.__setitem__
    def __setitem__(self, index, value):
        logging.info('Setting %r to %r', index, value)
        return orig_setitem(self, index, value)
    cls.__setitem__ = __setitem__
    return cls
</pre>
</blockquote>
<p>This function is meant to be used as a decorator to class definitions. For example:
</p>
<blockquote>
<pre>
@LoggedSetItem
class LoggingDict(dict):
    pass

@LoggedSetItem
class LoggingList(list):
    pass
</pre>
</blockquote>
<p>
Carefully study the implementation of <tt>LoggedSetItem</tt>. As input, it receives a class object. It then looks up the unbound <tt>__setitem__</tt> method and stores it in a variable. This lookup, as it turns out, is doing exactly the same work as <tt>super()</tt>. That is, it simply finds the implementation of the method being used by the class regardless of where it is actually located. After that, the function simply defines a replacement for <tt>__setitem__</tt> with added logging and attaches it back to the class object. References to the original implementation of <tt>__setitem__</tt> are held inside a closure so it all works out.</p>
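<p>
The class-level lookup can be verified in isolation; a small sketch:
</p>

```python
class Demo(dict):
    pass

# Looking up __setitem__ on the class finds the inherited implementation,
# no matter where it actually lives in the hierarchy:
orig_setitem = Demo.__setitem__
assert orig_setitem is dict.__setitem__

# The captured method can then be called with an explicit self:
d = Demo()
orig_setitem(d, 'a', 1)
assert d['a'] == 1
```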
<p>
The class decorator approach has several notable features. First, it doesn't even involve the use of <tt>super()</tt> (or multiple inheritance, for that matter). Second, as with <tt>super()</tt>, you don't have to hard-code any class names--the class is simply passed in as an argument. Third, it has very good runtime performance. This is because the work normally performed by <tt>super()</tt> is only performed once, at the time of class decoration. Finally, there is a kind of built-in error checking. For example, if you try to apply the decorator to a class that doesn't support the required method, you will immediately get an error:</p>
<blockquote>
<pre>
>>> <b>@LoggedSetItem
class loggedint(int): pass</b>
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "logsetitem.py", line 9, in LoggedSetItem
    orig_setitem = cls.__setitem__
AttributeError: type object 'loggedint' has no attribute '__setitem__'
>>>
</pre>
</blockquote>
<p>
As interesting as this is, I have no idea if using class decorators in this manner would be considered good practice or not. One potential problem is that by putting the code in a decorator, a lot of the work is performed just once, at the time of class definition. If a program were playing sneaky tricks like dynamically changing method definitions at runtime, it clearly wouldn't work. There's also a certain risk that this approach is just too clever for its own good.</p>
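<p>
To make that caveat concrete, here is a sketch (recording calls in a list rather than logging, so it's checkable) of how a method swapped in after decoration silently bypasses the wrapper:
</p>

```python
calls = []

def LoggedSetItem(cls):
    orig_setitem = cls.__setitem__
    def __setitem__(self, index, value):
        calls.append((index, value))
        return orig_setitem(self, index, value)
    cls.__setitem__ = __setitem__
    return cls

@LoggedSetItem
class LoggingDict(dict):
    pass

d = LoggingDict()
d['a'] = 1
assert calls == [('a', 1)]       # the wrapper saw the call

# Now the sneaky runtime trick: replace the method after decoration
LoggingDict.__setitem__ = dict.__setitem__
d['b'] = 2
assert calls == [('a', 1)]       # the wrapper never ran for 'b'
assert d == {'a': 1, 'b': 2}     # but the item was still stored
```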
<p>
Do you see any other downsides? I'd love to get your feedback.
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-56229240486852422262011-04-27T12:45:00.000-07:002011-04-28T05:05:01.188-07:00Practical Python with Raymond Hettinger<p>Raymond Hettinger is coming to Chicago May 16-20 to put his unique spin on my <a href="http://www.dabeaz.com/chicago/practical.html">Practical Python Programming</a> course. Although that is coming up soon, there is still time to register and a few slots are still available. Needless to say, if you've been looking for a class where you can learn more about Python and improve your skills, you won't find a better class anywhere!</p>
<p>Raymond Hettinger is the same core developer whose name can be found on no fewer than 13 <a href="http://www.python.org/dev/peps/">PEPs</a> covering a variety of very useful features of modern Python programming. For example, there's the <a href="http://www.python.org/dev/peps/pep-0279/"><tt>enumerate()</tt></a> function, which lets you keep track of where you are in an iteration, as in this example that gives you a line number when reading a file:</p>
<blockquote>
<pre>
>>> f = open("data.dat")
>>> for lineno, line in enumerate(f,1):
...
</pre>
</blockquote>
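<p>
A complete, runnable variant of the same idiom (using an in-memory file so no data file is needed):
</p>

```python
import io

f = io.StringIO('alpha\nbeta\ngamma\n')
numbered = [(lineno, line.rstrip()) for lineno, line in enumerate(f, 1)]
print(numbered)
# [(1, 'alpha'), (2, 'beta'), (3, 'gamma')]
```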
<p>
Or maybe you like reversing things with the <a href="http://www.python.org/dev/peps/pep-0322/"><tt>reversed()</tt></a> function:</p>
<blockquote>
<pre>
>>> for x in reversed(seq):
...
>>>
</pre>
</blockquote>
<p>
Or what about putting a <a href="http://www.python.org/dev/peps/pep-0378/">thousands separator</a> on numbers?</p>
<blockquote>
<pre>
>>> x = 123456789
>>> format(x,",")
'123,456,789'
>>>
</pre>
</blockquote>
<p>
Or <a href="http://www.python.org/dev/peps/pep-0218/">sets</a>?</p>
<blockquote>
<pre>
>>> a = set(['a','b','c'])
>>> b = set(['c','d','e'])
>>> a & b
set(['c'])
>>> a | b
set(['a', 'b', 'c', 'd', 'e'])
>>>
</pre>
</blockquote>
<p>
All of these features contain some of Raymond's handiwork. However, that's really only scratching the surface. Maybe you've used various features in the <tt>collections</tt> or <a href="http://svn.python.org/view/python/trunk/Modules/itertoolsmodule.c?view=markup"><tt>itertools</tt></a> modules. Or maybe you've used <a href="http://www.python.org/dev/peps/pep-0289/">generator expressions</a>, one of my favorite Python features. Again, Raymond's work.</p>
<p>
Last, but not least, Raymond is a well-known speaker and presenter. I distinctly remember seeing him give one of the most amazing presentations at PyCon UK in 2008 about the inner secrets of Python containers--a talk that left me thinking "I had no idea Python worked like that." At PyCon'2011 Raymond gave a well-received talk about <a href="http://blip.tv/file/4883290">API Design</a>. <b>Update:</b> Raymond is giving no fewer than 6 talks at <a href="http://ep2011.europython.eu/conference/speakers/raymond-hettinger">EuroPython</a> including an <a href="http://ep2011.europython.eu/conference/talks/what-makes-python-so-awesome">invited keynote talk</a>. </p>
<p>
So, if you're thinking about learning more about Python, you could certainly read an online tutorial, watch a video, or take a class where an instructor shows up. Or, you can join five other developers for an in-depth class created by the author who wrote one of the most <a href="http://www.amazon.com/Python-Essential-Reference-David-Beazley/dp/0672329786">in-depth Python books</a> and presented by a core developer who knows Python inside-out. Needless to say, you won't be disappointed.</p>
<p>
As a bonus, if you stick around for Friday afternoon, you can have your head completely exploded by signing up for my <a href="http://www.dabeaz.com/chicago/hardpython.html">Learn Hard Python</a> seminar--a 3 hour tour through some of Python's most advanced features including descriptors, super(), function objects, closures, decorators, context managers, and metaclasses.</p>
<p>
Hopefully you'll join Raymond and myself for a great week of Python. More information is available at <a href="http://www.dabeaz.com/chicago/index.html">http://www.dabeaz.com/chicago/index.html</a>.</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-30034612309426272312011-04-04T07:23:00.001-07:002011-04-24T13:18:05.973-07:00Learn Python from Raymond Hettinger in Chicago<p>
In one of the hallway tracks at Pycon, <a href="http://rhettinger.wordpress.com/">Raymond Hettinger</a> came up to me and said "I've been thinking about teaching a Python class." Needless to say, I couldn't pass on that kind of opportunity. So, I'm pleased to announce that Raymond is coming to Chicago, May 16-20 to put his unique spin on my <a href="http://www.dabeaz.com/chicago/practical.html">Practical Python Programming</a> course along with an assortment of his own material. The course is being held in my Python lair so I'll stop by to say "hi" before leaving you in the hands of one of Python's foremost experts. In short, this might be one of the most fantastic Python courses ever offered--and as with past courses, space is limited to just six students.</p>
<p>
In case you're not so familiar with Raymond's work, let's just say that it's hard to escape it if you've done any kind of Python programming at all. Not only is Raymond a Python core developer responsible for numerous features such as collections, itertools, sets, generator expressions, and the peephole optimizer, he is a well-known Pycon speaker and board member of the Python Software Foundation. In short, if you take this class, you'll not only learn about features of the Python language, you'll be learning from the person who contributed many of them in the first place.</p>
<p>
I should emphasize that this class is really designed for new Python programmers who want to get off to a great start. As long as you know about general programming concepts, no prior Python experience is required. Of course, even if you know some Python, you are still going to learn a wide variety of new and interesting things.</p>
<p>
More information about this and other courses is available <a href="http://www.dabeaz.com/chicago/index.html">here</a>. Hopefully you'll join Raymond in May!</p>
<p><b>Update (April 24, 2011)</b> There is just one slot left! What are you waiting for?</p>
<p>
-- Dave
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-28635077891802181032011-03-15T13:59:00.001-07:002011-03-15T13:59:04.165-07:00The Superboard Takes Pycon!<p>
Well, the Superboard and I are back in Chicago after surviving PyCon. What a great conference--it's always exciting to see 1400 enthusiastic Python programmers in one place!</p>
<center>
<img src="http://www.dabeaz.com/images/osi_small.jpg">
</center>
<p>In case you missed it, you can now watch the video of my <a href="http://pycon.blip.tv/file/4878868/">Building a Cloud Computing Service for my Superboard II</a> presentation. In this post, I just briefly wanted to fill in more details about the talk, including some links to prior blog posts, ported libraries, code, etc.</p>
<p>
First, as background, you might check out some of my earlier blog posts that describe audio encoding/decoding as well as the problem of building an emulated version of the Superboard using Py65. Here are some links:</p>
<p>
<ul>
<li>19 Jan 2011. <a href="http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html">Porting Py65 (and my Superboard) to Python 3</a></li>
<li>04 Sep 2010. <a href="http://dabeaz.blogspot.com/2010/09/using-telnet-to-access-my-superboard-ii.html">Using telnet to access my Superboard II (via Python and cassette ports)</a></li>
<li>29 Aug 2010. <a href="http://dabeaz.blogspot.com/2010/08/decoding-superboard-ii-cassette-audio.html">Decoding Superboard II Cassette Audio Using Python 3, Two Generators, and a Deque</a></li>
<li>22 Aug 2010. <a href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">Using Python to Encode Cassette Recordings for my Superboard II</a></li>
</ul>
<p>
An <a href="http://blip.tv/file/4639616/">earlier talk</a> about the Superboard was given at the January, 2011 Chipy meeting. This talk was quite a bit different than the Pycon presentation and focused more on the problem of building an emulated Superboard. It also includes some live demos and more general history about the Superboard.</p>
<p>
In the Pycon talk, I described how I built a 6502 assembler from scratch. At one point, I was planning on writing a separate blog post about that, but for now, you can just look at the raw code <a href="http://www.dabeaz.com/superboard/asm6502.py">here</a>. Related to that, you can also see the assembly code for the Superboard <a href="http://www.dabeaz.com/superboard/msgdrv.asm">messaging driver</a>.</p>
<p>
<a href="http://zeromq.org">ZeroMQ</a> played a big role in the project--specifically, I used it to build all sorts of client applications on the Macintosh. The starting point for that code was a program <a href="http://www.dabeaz.com/superboard/aciamsg.py">aciamsg.py</a> that implemented the binary messaging link to the Superboard and bridged it to clients via a set of ZeroMQ sockets. Client services were supported by a class defined in <a href="http://www.dabeaz.com/superboard/msgservice.py">msgservice.py</a>. For example, <a href="http://www.dabeaz.com/superboard/divmod.py">divmod.py</a> computes the divmod of two variables and <a href="http://www.dabeaz.com/superboard/fibo.py">fibo.py</a> computes fibonacci numbers.</p>
<p>
An emulated Superboard was created using Py65. An earlier <a href="http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html">blog post</a> describes that project, but the version I used for my Pycon talk is in a file <a href="http://www.dabeaz.com/superboard/superboard2.py">superboard2.py</a>. Essentially, it emulates a superboard in a VT100 compatible terminal window. Operations on the video ram are translated into VT100 compatible terminal commands. You might be shocked at the size of the emulator--it's only around 220 lines.</p>
<p>
For the cloud service, a special <a href="http://www.dabeaz.com/superboard/supercloud.py">supercloud.py</a> service is used to listen for USR(0) requests. This service feeds work into a queue that is processed by a <a href="http://www.dabeaz.com/superboard/superun.py">superrun.py</a> program which runs the emulated Superboards in the background. The code is actually written in a way that allows for different implementations of the job queue and program store. In the talk, I described the use of <a href="http://redis.io">Redis</a>, but that's not the only option.</p>
<p>
There are a few other bits of code not shown, but the above fragments should give you enough of a general idea how things were put together. I have to admit that some of the code was rather hastily written so don't expect too much from it.</p>
<p>
<b>When did you find time to do this?</b>
</p>
<p>
This entire Superboard project was nothing more than an interesting hobby project. Most of the really hard work including the audio encoding/decoding, 6502 assembler, and messaging device driver were coded over a couple of weekends in late August 2010. For a few months after that, I messed around with different possible designs of a "cloud service", but since I also had other work to do, progress was spread out and sporadic. Initially, I thought the service was going to implement a kind of remote "gosub" service (i.e., BASIC programs on the Superboard could remotely GOSUB to code living elsewhere and that remote code would be able to see the BASIC workspace via shared memory), but that never really panned out. In January, 2011 I was fooling around with Py65 and created the first emulated Superboard. That work resulted in the final design of the system presented at Pycon (namely, having a cloud of emulated Superboard instances). I have to admit that I liked this design much better than my original GOSUB idea.</p>
<p>
<b>Ported Python3 libraries</b>
</p>
<p>
You can find some of the libraries I ported to Python 3 on <a href="http://github.com/dabeaz">Github</a>. Some of the other libraries are still just sitting on my machine. Eventually I'm hoping to have everything published online on my Github account as time allows.
</p>
<p>
<b>The Superboard's Favorite Pycon Talks</b>
</p>
<p>
I wanted to mention a few really outstanding Pycon talks that I attended. First, check out Richard Saunders' <a href="http://pycon.blip.tv/file/4882867/">Everything You Wanted to Know About Pickling, But Were Afraid To Ask</a>. I have to admit that to me, pickling is almost more mysterious than the GIL. Richard did a great job peeling back the covers. I also really enjoyed Van Lindberg's <a href="http://pycon.blip.tv/file/4879824/">How to Kill a Patent with Python</a>. Van Lindberg is one diabolical lawyer indeed.</p>
<p>
As always, I enjoyed meeting everyone at Pycon. If you ever want to meet the Superboard in person, you should come to one of my <a href="http://www.dabeaz.com/chicago/index.html">Python courses</a> in Chicago.</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-72492395569486126952011-02-04T11:12:00.000-08:002011-02-09T00:46:05.542-08:00Does Anyone In Australia Want a Free Python3 PyCon Tutorial?<p><b>Update : Feb 9, 2011:</b> The tutorial is a go in Melbourne for Saturday, February 12, 2011 at 2pm! Contact Steven.cyphers@gmail.com to RSVP.</p>
<p>
Well, the title of this post just about says it all. I'm heading down under to do some Python training in Canberra, but I have a free weekend February 12-13, 2011. So, I'm wondering if anyone might have an interest in attending a free preview of my PyCon'2011 tutorial on <a href="http://us.pycon.org/2011/schedule/sessions/122/">Mastering Python 3 I/O</a>. Here are the ground rules:</p>
<ul>
<li>You provide the space, supply a video projector, and deal with any logistics concerning the location.</li>
<li>You tell me where it is.</li>
<li>I show up.</li>
<li>We have a great time talking about Python 3 for half a day.</li>
<li>Beers to follow.</li>
</ul>
<p>Although I'm staying in Canberra, I can travel anywhere nearby that is easy to get to by plane including Sydney and Melbourne (in fact, travel is preferable since I also want to play tourist). Send me an <a href="mailto:dave@dabeaz.com">email</a> if you're interested.</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-55236992691870395202011-01-19T15:38:00.000-08:002011-01-19T15:38:36.797-08:00Porting Py65 (and my Superboard) to Python 3<p>
One of my resolutions for 2011 is to write all of my software in
Python 3. As a hardened Python 2 programmer, I think my initial reaction
to Python 3 was lukewarm at best--it felt foreign and it made life
painful in ways that I found irritating (looking at you Unicode). However, as I have used it
more (and it has improved), I've really grown to like it. Most
recently, I used Python 3 as the base language for my <a
href="http://www.dabeaz.com/chicago/concurrent.html">Concurrency
Workshop</a>. I have also been using it as the language for my
various diabolical <a
href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">Superboard
II</a> projects. Last, but not least, I find myself as one of the
editors working to update the O'Reilly Python Cookbook--which is going
to be <a href="http://dabeaz.blogspot.com/2010/12/oreilly-python-cookbook-python-3-all.html">Python 3 only</a>.
</p>
<p>
If you're going to use Python 3, the first thing to know is that not
all libraries are going to work--not everyone has gotten around to
porting their code. This means that you have to adopt a more
"pioneering" mindset. In my case, I've simply decided to port the
libraries that I wanted to use as I go. From a purely academic
viewpoint, taking someone else's code and porting it to Python 3 is an
interesting exercise. Not only will you learn a lot simply by reading
someone else's code, you'll learn about all sorts of sneaky little gotchas
that aren't necessarily discussed in the Python 3 porting guides.
</p>
<p>
Over the next few months, I intend to make a series of blog posts
about my experiences porting different libraries. In this
installment, I port Py65, a Python emulation of the 6502.
</p>
<p>
<b>Py65 - A 6502 Emulator in Python</b>
</p>
<p> <a href="https://github.com/mnaberez/py65">Py65</a> is a pure
Python emulation of the 6502 microprocessor created by Mike Naberezny.
I don't really know what motivated Mike to create an emulated 6502 in
Python, but I became interested in Py65 because I suddenly had the
idea that I might be able to use it to create an emulated version of my
old <a
href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">Superboard
II</a> entirely as a Python 3 program. Why, you ask? Because it
would be fun. Now, stop asking silly questions--the Superboard is
getting annoyed.</p>
<p>
<b>Py65 - A Quick Overview</b>
</p>
<p>
One of the main features of Py65 is a 6502 machine monitor
where you can load/save memory, step through programs, and try things
out. For example, if you had an old 6502 ROM image sitting around,
you can load it, disassemble it, and step through parts of it like this:
</p>
<blockquote>
<pre>
bash % <b>py65mon</b>
Py65 Monitor
PC AC XR YR SP NV-BDIZC
6502: 0000 00 00 00 ff 00110000
<b>.load rom.bin f800</b>
Wrote +2048 bytes from $f800 to $ffff
PC AC XR YR SP NV-BDIZC
6502: 0000 00 00 00 ff 00110000
<b>.disassemble ff00:ff20</b>
$ff00 d8 CLD
$ff01 a2 28 LDX #$28
$ff03 9a TXS
$ff04 a0 0a LDY #$0a
$ff06 b9 ef fe LDA $feef,Y
$ff09 99 17 02 STA $0217,Y
$ff0c 88 DEY
$ff0d d0 f7 BNE $ff06
$ff0f 20 a6 fc JSR $fca6
$ff12 8c 12 02 STY $0212
$ff15 8c 03 02 STY $0203
$ff18 8c 05 02 STY $0205
$ff1b 8c 06 02 STY $0206
$ff1e ad e0 ff LDA $ffe0
PC AC XR YR SP NV-BDIZC
6502: 0000 00 00 00 ff 00110000
<b>.registers pc=ff00</b>
PC AC XR YR SP NV-BDIZC
6502: ff00 00 00 00 ff 00110000
<b>.step</b>
$ff01 a2 28 LDX #$28
PC AC XR YR SP NV-BDIZC
6502: ff01 00 00 00 ff 00110000
<b>.step</b>
$ff03 9a TXS
PC AC XR YR SP NV-BDIZC
6502: ff03 00 28 00 ff 00110000
...
</pre>
</blockquote>
<p>
Of course, there are many other features described in the
<a href="http://6502.org/users/mike/projects/py65/index.html">Py65 Documentation</a>.
<p>
<b>Porting Py65 to Python 3</b>
</p>
<p>
Py65 consists of 27 <tt>.py</tt> files and about 12000 lines of code.
More than half of the code consists of unit tests.</p>
<p>
To start porting, I decided that I would just run all of the files
through <tt>2to3</tt> to get a basic sense for what I might have to
change at a syntactic level. Here is the complete output of doing that. In a nutshell,
36 lines were identified. Most of the changes were due to well-known
Python 3 changes such as changed exception handling syntax,
<tt>xrange()</tt> and so forth.</p>
<blockquote>
<pre>
bash % <b>2to3 src</b>
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- src/py65/monitor.py (original)
+++ src/py65/monitor.py (refactored)
@@ -32,7 +32,7 @@
result = cmd.Cmd.onecmd(self, line)
except KeyboardInterrupt:
self._output("Interrupt")
- except Exception,e:
+ except Exception as e:
(file, fun, line), t, v, tbinfo = compact_traceback()
error = 'Error: %s, %s: file: %s line: %s' % (t, v, file, line)
self._output(error)
@@ -85,7 +85,7 @@
line = self._shortcuts['~'] + ' ' + line[1:]
# command shortcuts
- for shortcut, command in self._shortcuts.iteritems():
+ for shortcut, command in self._shortcuts.items():
if line == shortcut:
line = command
break
@@ -150,7 +150,7 @@
mpus = {'6502': NMOS6502, '65C02': CMOS65C02}
def available_mpus():
- mpu_list = ', '.join(mpus.keys())
+ mpu_list = ', '.join(list(mpus.keys()))
self._output("Available MPUs: %s" % mpu_list)
if args == '':
@@ -315,14 +315,14 @@
if args != '':
new = args[0].lower()
changed = False
- for name, radix in radixes.iteritems():
+ for name, radix in radixes.items():
if name[0].lower() == new:
self._address_parser.radix = radix
changed = True
if not changed:
self._output("Illegal radix: %s" % args)
- for name, radix in radixes.iteritems():
+ for name, radix in radixes.items():
if self._address_parser.radix == radix:
self._output("Default radix is %s" % name)
@@ -364,7 +364,7 @@
if len(register) == 1:
intval &= 0xFF
setattr(self._mpu, register, intval)
- except KeyError, why:
+ except KeyError as why:
self._output(why[0])
def help_cd(self, args):
@@ -374,7 +374,7 @@
def do_cd(self, args):
try:
os.chdir(args)
- except OSError, why:
+ except OSError as why:
msg = "Cannot change directory: [%d] %s" % (why[0], why[1])
self._output(msg)
self.do_pwd()
@@ -407,12 +407,12 @@
f = open(filename, 'rb')
bytes = f.read()
f.close()
- except (OSError, IOError), why:
+ except (OSError, IOError) as why:
msg = "Cannot load file: [%d] %s" % (why[0], why[1])
self._output(msg)
return
- self._fill(start, start, map(ord, bytes))
+ self._fill(start, start, list(map(ord, bytes)))
def do_save(self, args):
split = shlex.split(args)
@@ -430,7 +430,7 @@
for byte in bytes:
f.write(chr(byte))
f.close()
- except (OSError, IOError), why:
+ except (OSError, IOError) as why:
msg = "Cannot save file: [%d] %s" % (why[0], why[1])
self._output(msg)
return
@@ -455,7 +455,7 @@
return
start, end = self._address_parser.range(split[0])
- filler = map(self._address_parser.number, split[1:])
+ filler = list(map(self._address_parser.number, split[1:]))
self._fill(start, end, filler)
@@ -518,10 +518,10 @@
self._output("Display current label mappings.")
def do_show_labels(self, args):
- values = self._address_parser.labels.values()
- keys = self._address_parser.labels.keys()
+ values = list(self._address_parser.labels.values())
+ keys = list(self._address_parser.labels.keys())
- byaddress = zip(values, keys)
+ byaddress = list(zip(values, keys))
byaddress.sort()
for address, label in byaddress:
self._output("%04x: %s" % (address, label))
--- src/py65/tests/test_memory.py (original)
+++ src/py65/tests/test_memory.py (refactored)
@@ -56,7 +56,7 @@
def read_subscriber(address, value):
return 0xAB
- mem.subscribe_to_read(xrange(0xC000, 0xC001+1), read_subscriber)
+ mem.subscribe_to_read(range(0xC000, 0xC001+1), read_subscriber)
mem[0xC000] = 0xAB
mem[0xC001] = 0xAB
@@ -141,7 +141,7 @@
return 0xFF
mem.subscribe_to_write([0xC000,0xC001], write_subscriber)
- mem.write(0xC000, [0x01, 002])
+ mem.write(0xC000, [0x01, 0o02])
self.assertEqual(0x01, subject[0xC000])
self.assertEqual(0x02, subject[0xC001])
--- src/py65/tests/test_monitor.py (original)
+++ src/py65/tests/test_monitor.py (refactored)
@@ -4,7 +4,7 @@
import os
import tempfile
from py65.monitor import Monitor
-from StringIO import StringIO
+from io import StringIO
class MonitorTests(unittest.TestCase):
@@ -168,7 +168,7 @@
mon = Monitor(stdout=stdout)
mon._address_parser.labels['foo'] = 0xc000
mon.do_delete_label('foo')
- self.assertFalse(mon._address_parser.labels.has_key('foo'))
+ self.assertFalse('foo' in mon._address_parser.labels)
out = stdout.getvalue()
self.assertEqual('', out)
--- src/py65/tests/devices/test_mpu6502.py (original)
+++ src/py65/tests/devices/test_mpu6502.py (refactored)
@@ -4979,8 +4979,7 @@
self.assertEquals(0x0001, mpu.pc)
def test_decorated_addressing_modes_are_valid(self):
- valid_modes = map(lambda x: x[0],
- py65.assembler.Assembler.Addressing)
+ valid_modes = [x[0] for x in py65.assembler.Assembler.Addressing]
mpu = self._make_mpu()
for name, mode in mpu.disassemble:
self.assert_(mode in valid_modes)
@@ -5024,12 +5023,12 @@
def _make_mpu(self, *args, **kargs):
klass = self._get_target_class()
mpu = klass(*args, **kargs)
- if not kargs.has_key('memory'):
+ if 'memory' not in kargs:
mpu.memory = 0x10000 * [0xAA]
return mpu
def _get_target_class(self):
- raise NotImplementedError, "Target class not specified"
+ raise NotImplementedError("Target class not specified")
class MPUTests(unittest.TestCase, Common6502Tests):
--- src/py65/tests/utils/test_addressing.py (original)
+++ src/py65/tests/utils/test_addressing.py (refactored)
@@ -48,7 +48,7 @@
try:
parser.number('bad_label')
self.fail()
- except KeyError, why:
+ except KeyError as why:
self.assertEqual('Label not found: bad_label', why[0])
def test_number_label_hex_offset(self):
@@ -94,7 +94,7 @@
try:
parser.number('bad_label+3')
self.fail()
- except KeyError, why:
+ except KeyError as why:
self.assertEqual('Label not found: bad_label', why[0])
def test_number_truncates_address_at_maxwidth_16(self):
--- src/py65/tests/utils/test_hexdump.py (original)
+++ src/py65/tests/utils/test_hexdump.py (refactored)
@@ -27,7 +27,7 @@
try:
Loader(text)
self.fail()
- except ValueError, why:
+ except ValueError as why:
msg = 'Start address was not found in data'
self.assert_(why[0].startswith('Start address'))
@@ -36,7 +36,7 @@
try:
Loader(text)
self.fail()
- except ValueError, why:
+ except ValueError as why:
msg = 'Could not parse address: oops'
self.assertEqual(msg, why[0])
@@ -45,7 +45,7 @@
try:
Loader(text)
self.fail()
- except ValueError, why:
+ except ValueError as why:
msg = 'Expected address to be 2 bytes, got 1'
self.assertEqual(msg, why[0])
@@ -54,7 +54,7 @@
try:
Loader(text)
self.fail()
- except ValueError, why:
+ except ValueError as why:
msg = 'Expected address to be 2 bytes, got 3'
self.assertEqual(msg, why[0])
@@ -63,7 +63,7 @@
try:
Loader(text)
self.fail()
- except ValueError, why:
+ except ValueError as why:
msg = 'Non-contigous block detected. Expected next ' \
'address to be $c001, label was $c002'
self.assertEqual(msg, why[0])
@@ -73,7 +73,7 @@
try:
Loader(text)
self.fail()
- except ValueError, why:
+ except ValueError as why:
msg = 'Could not parse data: foo'
self.assertEqual(msg, why[0])
--- src/py65/utils/addressing.py (original)
+++ src/py65/utils/addressing.py (refactored)
@@ -26,7 +26,7 @@
def label_for(self, address, default=None):
"""Given an address, return the corresponding label or a default.
"""
- for label, label_address in self.labels.iteritems():
+ for label, label_address in self.labels.items():
msg = "Expected address to be 2 bytes, got %d" % (
len(addr_bytes))
- raise ValueError, msg
+ raise ValueError(msg)
address = (addr_bytes[0] << 8) + addr_bytes[1]
@@ -62,19 +62,19 @@
msg = "Non-contigous block detected. Expected next address " \
"to be $%04x, label was $%04x" % (self.current_address,
address)
- raise ValueError, msg
+ raise ValueError(msg)
def _parse_bytes(self, piece):
if self.start_address is None:
msg = "Start address was not found in data"
- raise ValueError, msg
+ raise ValueError(msg)
else:
try:
bytes = [ ord(c) for c in a2b_hex(piece) ]
except (TypeError, ValueError):
msg = "Could not parse data: %s" % piece
- raise ValueError, msg
+ raise ValueError(msg)
self.current_address += len(bytes)
self.data.extend(bytes)
RefactoringTool: Files that need to be modified:
RefactoringTool: src/py65/monitor.py
RefactoringTool: src/py65/tests/test_memory.py
RefactoringTool: src/py65/tests/test_monitor.py
RefactoringTool: src/py65/tests/devices/test_mpu6502.py
RefactoringTool: src/py65/tests/utils/test_addressing.py
RefactoringTool: src/py65/tests/utils/test_hexdump.py
RefactoringTool: src/py65/utils/addressing.py
RefactoringTool: src/py65/utils/hexdump.py
</pre>
</blockquote>
<p>Not seeing anything too critical, I decided to invoke <tt>2to3 -w</tt> to
simply patch all of the code. However, I must emphasize--using
<tt>2to3</tt> is almost never enough to make a Python 3 port. In
the next few parts, I discuss a few tricky porting problems
encountered in making the new library work. This is by no means an
exhaustive list.
</p>
<p>
<b>Python 3 Porting Issue : Exception Indexing</b>
</p>
<p>
In several places, Py65 performs an indexed lookup on exception values.
For example, consider this fragment:</p>
<blockquote>
<pre>
try:
    f = open("somebadfile")
except IOError as why:
    msg = "Cannot open file: [%d] %s" % (why[0], why[1])
    print(msg)
</pre>
</blockquote>
<p>
If you try this code in Python 2, it works. However, if you try it in
Python 3, you will get a <tt>TypeError</tt> crash like this:
</p>
<blockquote>
<pre>
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError: [Errno 2] No such file or directory: 'somebadfile'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
TypeError: 'IOError' object is not subscriptable
</pre>
</blockquote>
<p>Under the covers, exceptions hold their value in an
<tt>args</tt> tuple. In Python 2, operations such as <tt>why[0]</tt>
and <tt>why[1]</tt> would simply return <tt>why.args[0]</tt> and
<tt>why.args[1]</tt>. This no longer works in Python 3 so you can't
rely on it. A better fix is to either refer to <tt>args</tt> directly
or to use the documented exception attributes. For example: </p>
<blockquote>
<pre>
try:
    f = open("somebadfile")
except IOError as why:
    msg = "Cannot open file: [%d] %s" % (why.errno, why.strerror)
    print(msg)
</pre>
</blockquote>
<p>
In Py65, I identified 12 lines where exceptions are indexed in this
manner. Most of those changes were in unit tests that checked for specific
exception messages and error codes.
</p>
<p> While we're on the subject of exceptions, it's also worth noting
that the scope of the <tt>why</tt> variable in the above example is
different in Python 3. Specifically, exception variables are only
defined for code inside the <tt>except</tt> block. In Python 2, such
variables persist after the <tt>try-except</tt> statement.</p>
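<p>
A quick demonstration of the Python 3 behavior (my example, not from Py65):
</p>

```python
try:
    int("not a number")
except ValueError as exc:
    message = str(exc)

# In Python 3, the name 'exc' is deleted when the except block ends
try:
    exc
except NameError:
    print("exc is not defined after the except block")
```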
<p>
<b>Python 3 Porting Issue : Overloaded Slicing</b>
</p>
<p>
One of the objects defined by Py65 is an observable memory buffer.
The precise implementation is not so important, but it's programmed to
be a list-like object that supports both indexing and slicing, but
with the ability to invoke registered observer functions on
user-specified indices (see the project at the end of the post for an example).
</p>
<p>
In Python 2, you could use different methods for indexing and slicing by
implementing <tt>__getitem__()</tt> and <tt>__getslice__()</tt> like this:
</p>
<blockquote>
<pre>
class ListLike:
    def __getitem__(self, n):
        print("getitem", n)
    def __getslice__(self, start, stop):
        print("getslice", start, stop)
</pre>
</blockquote>
<p> The only problem is that in Python 3, <tt>__getslice__()</tt> no
longer exists as a special method (in fact, it's deprecated in Python 2
as well, but is still supported for backwards compatibility). So, if
you try the following example, you'll see <tt>__getitem__()</tt> being
called for both indexing and slicing. Here is what happens: </p>
<blockquote>
<pre>
>>> <b>s = ListLike()</b>
>>> <b>s[2]</b>
getitem 2
>>> <b>s[2:4]</b>
getitem slice(2, 4, None)
>>>
</pre>
</blockquote>
<p>
Unless you've programmed <tt>__getitem__()</tt> specifically to look
for <tt>slice</tt> objects, you will run into trouble. For example,
when trying Py65, I started getting all sorts of errors about
incorrect use of <tt>slice</tt> objects. Fortunately,
here's a little bit of code that solves the problem:
</p>
<blockquote>
<pre>
class ListLike:
    def __getitem__(self, n):
        if isinstance(n, slice):
            return [self[i] for i in range(*n.indices(len(self)))]
        # Return item n
        ...
</pre>
</blockquote>
<p>
Or, if you're a little more sneaky, you might use <tt>itertools</tt>:
</p>
<blockquote>
<pre>
import itertools

class ListLike:
    def __getitem__(self, n):
        if isinstance(n, slice):
            return list(itertools.islice(self, *n.indices(len(self))))
        # Return item n
        ...
</pre>
</blockquote>
<p>
For slices, the value passed to <tt>__getitem__()</tt> will be a
<tt>slice</tt> object. You can create these yourself.</p>
<blockquote>
<pre>
>>> <b>n = slice(2,4)</b>
>>> <b>n</b>
slice(2, 4, None)
>>>
</pre>
</blockquote>
<p>
The <tt>indices(size)</tt> method of a slice returns a tuple <tt>(start,
stop, step)</tt> that you can use should you decide to iterate over
the slice using <tt>range()</tt> or some other function. For example:</p>
<blockquote>
<pre>
>>> <b>n.indices(100)</b>
(2, 4, 1)
>>>
</pre>
</blockquote>
<p>
You can use this result as input to <tt>range()</tt> to generate the
needed sequence of indices associated with the slice.</p>
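<p>
Putting the pieces together, here is a minimal self-contained sketch (my own illustration, not Py65's actual memory object) of a list-like class whose single <tt>__getitem__()</tt> handles both indexing and slicing in Python 3:
</p>

```python
class ListLike:
    def __init__(self, items):
        self._items = list(items)
    def __len__(self):
        return len(self._items)
    def __getitem__(self, n):
        if isinstance(n, slice):
            # Expand the slice into indices and index ourselves repeatedly
            return [self[i] for i in range(*n.indices(len(self)))]
        return self._items[n]

s = ListLike([10, 20, 30, 40, 50])
print(s[1])     # 20
print(s[1:4])   # [20, 30, 40]
print(s[::2])   # [10, 30, 50]
```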
<p>
<b>Python 3 Porting Issue: Treating bytes as character arrays</b>
</p>
<p>
If you perform any kind of binary I/O in Python 3, be aware that data
will be read as <tt>bytes</tt> objects and that those objects do not
have the same behavior as strings.</p>
<p>
Consider this code fragment from Py65, in particular, the parts
highlighted in <font color="#ff0000">red</font>.</p>
<blockquote>
<pre>
try:
    f = open(filename, 'rb')
    <font color="#ff0000">bytes = f.read()</font>
    f.close()
except (OSError, IOError) as why:
    msg = "Cannot load file: [%d] %s" % (why[0], why[1])
    self._output(msg)
    return
self._fill(start, start, <font color="#ff0000">list(map(ord, bytes))</font>)
</pre>
</blockquote>
<p>
First complaint--don't use <tt>bytes</tt> as the name of a variable.
<tt>bytes</tt> is now the name of a built-in type. However, that's
not the problem here. Instead, the problem is with the <tt>map()</tt>
operation at the end. Here is what happens in Python 2:</p>
<blockquote>
<pre>
>>> <b>s = "Hello"</b>
>>> <b>list(map(ord,s))</b>
[72, 101, 108, 108, 111]
>>>
</pre>
</blockquote>
<p>
If you try it in Python 3, you get an error:
</p>
<blockquote>
<pre>
>>> <b>s = b"Hello"</b>
>>> <b>list(map(ord,s))</b>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ord() expected string of length 1, but int found
>>>
</pre>
</blockquote>
<p>
What's happening here? Well, the answer is simple--<tt>bytes</tt>
objects in Python 3 are already treated as arrays of integers, so the
extra conversion using <tt>ord()</tt> isn't needed. For
example:
</p>
<blockquote>
<pre>
>>> <b>s = b"Hello"</b>
>>> <b>s[0]</b>
72
>>> <b>s[1]</b>
101
>>> <b>s[2]</b>
108
>>>
</pre>
</blockquote>
<p> In the case of the above example, you can replace
<tt>list(map(ord,bytes))</tt> with <tt>list(bytes)</tt> or maybe even
just <tt>bytes</tt> as it is already considered to be an array of
integer values.</p>
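<p>
A quick sanity check of that replacement (my example):
</p>

```python
data = b"Hello"
# bytes objects already behave like arrays of small integers in Python 3,
# so no ord() conversion is needed
print(list(data))           # [72, 101, 108, 108, 111]
print(data[0] == ord("H"))  # True
```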
<p>
<b>Porting Summary</b>
</p>
<p>
All told, I don't think I spent more than about an hour porting Py65
so that I could use it with Python 3. As part of this work, I must
emphasize that I ported all of the supplied unit tests and also ran them
under Python 3 until all reported test failures were resolved.
Although I can't claim that it is bug-free, it was good enough to do
the project described next.
</p>
<p>
<b>Py65 Project: Creating an Emulated Superboard II</b>
</p>
<p>
In previous blog posts, I've described a couple of projects involving
my old Superboard II system--my first computer. Here is a picture of
it.
</p>
<p>
<center>
<img src="http://www.dabeaz.com/images/osi_small.jpg"/>
</center>
</p>
<p>
To make an emulator, you need to know details about the underlying
hardware including memory map, ROMs, and hardware devices. For this,
I referred to the Superboard II memory map taken straight from its user
manual. Here it is:
</p>
<p>
<center>
<img src="http://www.dabeaz.com/images/sb_mmap.png"/>
</center>
</p>
<p>
To capture the ROM images, I wrote two simple BASIC programs to dump the
ROM data out of the cassette port. For example, like this:</p>
<blockquote>
<pre>
5 REM DUMP THE BASIC ROM TO CASSETTE
10 FOR X = 40960 TO 49151
20 WAIT 61440, 2
30 B = PEEK(X)
40 POKE 61441, B
50 NEXT
</pre>
</blockquote>
<p>
By recording the audio stream using Audacity on my Mac and decoding
the resulting WAV files using the Python scripts described in a
<a href="http://dabeaz.blogspot.com/2010/08/decoding-superboard-ii-cassette-audio.html">previous
post</a>, I was able to capture both the 8K BASIC ROM and 2K system
ROM. I put these in files <tt><a
href="http://www.dabeaz.com/basic.bin">basic.bin</a></tt>
and <tt><a href="http://www.dabeaz.com/rom.bin">rom.bin</a></tt>.
</p>
<p>
Next up, you need to understand how the hardware devices work, such as
the video RAM, polled keyboard, and 6850 ACIA serial port. For
example, you need to wrap your brain around everything that is going on in this figure:</p>
<p>
<center>
<img src="http://www.dabeaz.com/images/sb_poll.png"/>
</center>
</p>
<p>
Once you understand that, you're ready to make an emulation. To do
it, you need to address two basic problems. First, you need to load the captured ROM
images. That's the easy part. Next, you need to install observer functions on the memory
addresses mapped to different hardware devices and make those
functions imitate the actual hardware. That's the tricky bit.
</p>
<p>Here is an example of doing just that. The most notable part of
this code is found in the <tt>map_hardware()</tt> function that maps
functions to certain memory addresses. If you look at these functions,
you can see how they capture memory access and use that to emulate
hardware devices. Of course, figuring out all of the subtle details
of the Superboard II hardware is left as an exercise to the reader:
</p>
<blockquote>
<pre>
#!/usr/bin/env python3 -u
import py65.monitor
import sys
import select

# Write to a specific video address (using VT100 cursor control)
def video_output(address, value):
    row = (address - 0xd000) // 32
    column = address % 32
    sys.stdout.write(('\x1b[7m\x1b[%d;%dH' % (row, column)) + chr(value) + '\x1b[0m')
    sys.stdout.flush()

# Keyboard mapping table (for polled keyboard)
keymap = {
    b'\x00' : {254:254, 253:255, 251:255, 247:255, 239:255, 223:255, 191:255, 127:255},
    b'\r' : {254:254, 223:247},
    b'\n' : {254:254, 223:247},
    b' ' : {254:254, 253:239},
    b'/' : {254:254, 253:247},
    b';' : {254:254, 253:251},
    b':' : {254:254, 191:239},
    b'-' : {254:254, 191:247},
    b'.' : {254:254, 223:127},
    b',' : {254:254, 251:253},
    b'A' : {254:254, 253:191},
    b'B' : {254:254, 251:239},
    b'C' : {254:254, 251:191},
    b'D' : {254:254, 247:191},
    b'E' : {254:254, 239:191},
    b'F' : {254:254, 247:223},
    b'G' : {254:254, 247:239},
    b'H' : {254:254, 247:247},
    b'I' : {254:254, 239:253},
    b'J' : {254:254, 247:251},
    b'K' : {254:254, 247:253},
    b'L' : {254:254, 223:191},
    b'M' : {254:254, 251:251},
    b'N' : {254:254, 251:247},
    b'O' : {254:254, 223:223},
    b'P' : {254:254, 253:253},
    b'Q' : {254:254, 253:127},
    b'R' : {254:254, 239:223},
    b'S' : {254:254, 247:127},
    b'T' : {254:254, 239:239},
    b'U' : {254:254, 239:251},
    b'V' : {254:254, 251:223},
    b'W' : {254:254, 239:127},
    b'X' : {254:254, 251:127},
    b'Y' : {254:254, 237:247},
    b'Z' : {254:254, 253:223},
    b'1' : {254:254, 127:127},
    b'2' : {254:254, 127:191},
    b'3' : {254:254, 127:223},
    b'4' : {254:254, 127:239},
    b'5' : {254:254, 127:247},
    b'6' : {254:254, 127:251},
    b'7' : {254:254, 127:253},
    b'8' : {254:254, 191:127},
    b'9' : {254:254, 191:191},
    b'0' : {254:254, 191:223},
    b'!' : {254:252, 127:127},
    b'"' : {254:252, 127:191},
    b'#' : {254:252, 127:223},
    b'$' : {254:252, 127:239},
    b'%' : {254:252, 127:247},
    b'&' : {254:252, 127:251},
    b"'" : {254:252, 127:254},
    b'(' : {254:252, 191:127},
    b')' : {254:252, 191:191},
    b'*' : {254:252, 191:239},
    b'=' : {254:252, 191:247},
    b'>' : {254:252, 223:127},
    b'<' : {254:252, 251:253},
    b'?' : {254:252, 253:247},
    b'+' : {254:252, 253:251},
}

# Raw file underlying stdin
raw_stdin = sys.stdin.buffer.raw

# State about what's being polled
kb_row = 0
kb_current = keymap[b'\x00']
kb_count = 0

# Read the row values for the polled row
def keyboard_read(address):
    global kb_count, kb_current
    if kb_count > 0:
        kb_count -= 1
        if kb_count < 5:
            # Simulate key-release
            kb_current = keymap[b'\x00']
    else:
        kb_current = keymap[b'\x00']
        if kb_row == 254:
            # Poll stdin to see any input
            r, w, e = select.select([raw_stdin], [], [], 0)
            if r:
                keyboard_press(raw_stdin.read(1))
    return kb_current.get(kb_row, 255)

# Set the current keyboard poll row
def keyboard_write(address, val):
    global kb_row
    kb_row = val

# Initiate a keypress
def keyboard_press(ch):
    global kb_count, kb_current
    if ch in keymap:
        kb_current = keymap[ch]
        kb_count = 30

def map_hardware(m):
    # Video RAM at 0xd000-0xd400
    m.subscribe_to_write(range(0xd000, 0xd400), video_output)
    # Monitor the polled keyboard port
    m.subscribe_to_read([0xdf00], keyboard_read)
    m.subscribe_to_write([0xdf00], keyboard_write)
    # Bad memory address to force end to memory check
    m.subscribe_to_read([0x8000], lambda x: 0)

def main(args=None):
    c = py65.monitor.Monitor()
    map_hardware(c._mpu.memory)
    try:
        import readline
    except ImportError:
        pass
    # Load the ROMs and boot
    c.onecmd("load rom.bin f800")
    c.onecmd("load basic.bin a000")
    c.onecmd("goto ff00")
    try:
        c.onecmd('version')
        c.cmdloop()
    except KeyboardInterrupt:
        c._output('')

if __name__ == "__main__":
    main()
</pre>
</blockquote>
<p>
<b>Running the Emulation</b>
</p>
<p>
Running the emulation in a VT100 compatible terminal window, you'll
get output that looks like this. Yep, that's my Superboard II running
in the upper-left corner of the terminal window (click on the image
to see a video):
</p>
<center>
<a href="http://www.youtube.com/watch?v=unAKUE0fUnA"><img src="http://www.dabeaz.com/images/sb_emul.png"/></a>
</center>
<p>
Admittedly, it's kind of a hack, but then again, that's the whole point.
</p>
<p>
<b>Final Words</b>
</p>
<p>
I've put my modified Py65 code online at <a
href="http://github.com/dabeaz/py65">http://github.com/dabeaz/py65</a>.
The distribution also includes a slightly different emulation example
that allows you to telnet to an emulated Superboard.</p>
<p>
I gave a talk about this at the January 13, 2011 meeting of <a
href="http://chipy.org">Chipy</a>. Check out the <a href="http://carlfk.blip.tv/file/4639616">video</a>.Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-361900701146656042010-12-16T15:04:00.000-08:002010-12-16T15:04:32.363-08:00O'Reilly Python Cookbook: Python 3 All The Way<p>
I'm pleased to announce that Brian Jones and I have just signed on to be the editors/curators of the upcoming O'Reilly Python Cookbook, 3rd Edition--to appear sometime in late 2011. Brian has posted some <a href="http://www.protocolostomy.com/2010/12/16/good-things-come-in-threes-python-cookbook-third-edition/">details</a> on his blog, but let's just say that I'm really excited to be working on this project. I think it's going to be great!</p>
<p>
I've had both prior editions of the Cookbook in my library for some time--in fact, I wrote the section introduction for the chapter on "Extending and Embedding." One thing that I didn't remember until now was that my biographical sketch from the preface of the past edition included the following description:
</p>
<blockquote>
<em>"David Beazley is a fairly sick man (in a good way)"</em>
</blockquote>
<p>
I'm not sure who I have to thank for that, but I can say that Brian and I hope to put together the sickest, baddest, most useful Cookbook yet.</p>
<h3>Python 3 - All The Way</h3>
<p>
Yep. It's true. A major feature of the new edition will be an exclusive focus on Python 3. In fact, we simply won't include coverage of anything that doesn't work with it.</p>
<p>
Now, I know what you're thinking, this is going to result in the smallest Cookbook ever--coming in just slightly more than 25 pages. Wrong!</p>
<p>
There are all sorts of new and exciting things about Python 3 worth writing about. For example, did you know that quite a few past Cookbook recipes are now simply built-in features or one-line Python 3 statements? Moreover, Python 3 has all sorts of interesting new programming idioms--especially related to I/O handling, concurrency, metaprogramming, and more.</p>
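<p>
For instance (my illustration, not one of the actual recipes), the classic "tally items" recipe collapses to a standard-library one-liner:
</p>

```python
from collections import Counter

words = ['spam', 'eggs', 'spam', 'spam']
counts = Counter(words)        # replaces the old hand-rolled tally recipe
print(counts.most_common(1))   # [('spam', 3)]
```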
<p>
Thus, one of our main goals is to present useful recipes that take full advantage of new idioms and which do things the "Python 3" way. In part, this will be welcome information for anyone who has decided to make Python 3 their primary programming environment. However, we also hope that a solid set of idiomatic recipes will help anyone who is thinking about porting code from Python 2.</p>
<p>
Of course, we obviously want to include useful recipes for modules that have already made the transition.</p>
<h3>We Want Your Help and Feedback</h3>
<p>
Past editions of the Cookbook have always been a community effort. The recipes themselves are drawn from submissions to the <a href="http://code.activestate.com/recipes/langs/python/">ActiveState Python Recipes</a> site and are fully attributed. In fact, the folks at ActiveState are an active participant in this project.</p>
<p>
As editors, Brian and I play a number of roles. First and foremost, we're simply going to work to put together a great set of recipes along with tests to make sure they work as advertised. However, we also have the job of soliciting feedback and guiding the overall project. As part of that, we'd really like to know more about what kinds of recipes to include. Specific programming techniques? More coverage of certain built-in libraries? Information on important third-party extensions? Everything is fair game.</p>
<p>
Throughout the project, you can contact us by sending email to 'PythonCookbook' at 'oreilly.com' or writing comments on our blog posts.
</p>
<h3>Stay Tuned</h3>
<p>
Throughout the project, Brian and I hope to blog about our progress.
You can also follow <a href="http://twitter.com/#!/bkjones">@bkjones</a> and <a href="http://twitter.com/#!/dabeaz">@dabeaz</a> on Twitter for updates. </p>
<p>
-- Dave
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-57657362666357430722010-12-04T14:46:00.001-08:002010-12-04T14:46:53.953-08:00Python Concurrency Workshop - 2011<p>Well, January in Chicago can only mean one thing--that my <a href="http://www.dabeaz.com/chicago/concurrent.html">Python Concurrency and Distributed Computing Workshop</a> is back! If you've wanted to learn more about concurrency, threads, messaging, and other related topics, then this is the workshop for you. There also promises to be a certain amount of insanity--after all, past editions of the workshop were responsible for my whole exploration into the <a href="http://www.dabeaz.com/GIL">GIL</a>.</p>
<p>
Unlike a normal Python course, the concurrency workshop is more experimental in nature--tending to focus on cutting-edge topics and exploration of lesser-known areas of Python programming. However, no topic is off-limits as discussions might dive into facets of C programming, operating systems, and other programming languages. Needless to say, a good time will be had by all.
</p>
<p>
The <a href="http://www.dabeaz.com/chicago/concurrent.html">course page</a> has detailed information on the previous workshop. This year, we'll cover much of that material, but here are some exciting new highlights for 2011:</p>
<ul>
<li><b>Python 3.</b> Want to know what Python 3 is all about? You'll find out in a big way as this year's workshop is entirely based on Python 3, preferably Python 3.2.</li>
<li><b>Messaging.</b> There will be significantly more material on messaging architectures. As part of that, we'll look in some depth at 0MQ, distributed key-value stores, actors, and more.</li>
<li><b>Mondo Threads.</b> A completely revised thread-programming section that will present Python thread programming as you've never seen it before. Prepare to be amazed.</li>
<li><b>Reliability.</b> There's a great deal of added information on software design and debugging techniques for reliable concurrent programming.</li>
</ul>
<p>
As usual, the course is strictly limited to 6 students and being held in Chicago's Andersonville neighborhood. Worried about the cold? Well, in this course, there are far more scary things to be worried about than that. Besides, the classroom is completely surrounded by coffee shops and places to get strong Belgian ales. The cold is going to be the least of your problems.</p>
<p>
Hopefully I'll see you in Chicago. It's going to be great!
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-37629174101099952462010-09-25T18:25:00.000-07:002010-09-25T18:25:55.576-07:00Putting all of my Past PyCon/IPC Presentations on Slideshare<p>For the past few years, I've been making my PyCon tutorials and presentations available online. For example, <a href="http://www.dabeaz.com/generators">Generator Tricks for Systems Programmers</a> from PyCon'2008, <a href="http://www.dabeaz.com/coroutines">A Curious Course on Coroutines and Concurrency</a> from PyCon'2009, and <a href="http://www.dabeaz.com/python3io">Mastering Python 3 I/O</a> from PyCon'2010. Although there have been many downloads, I've occasionally received requests to post material in a format more suitable for sharing online.</p>
<p>
Thus, I'm pleased to announce that I've set up a <a href="http://www.slideshare.net/dabeaz">Slideshare channel</a> that has the slides from almost all my past presentations and tutorials from PyCon, the International Python Conference, USENIX, and a few other conferences, going all the way back to 1996. All told, there are more than 1700 slides on Python programming, Swig, PLY, and other topics.</p>
<p>
I hope someone finds this material useful, so enjoy! I'm still going through my presentation archive and will probably add even more to Slideshare as I find time.</p>
<p>
-- Dave
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-50396676720146139812010-09-15T09:05:00.000-07:002010-09-23T07:52:56.650-07:00A few good reasons to take one of my Fall 2010 Python courses<p>
This fall, I am offering three intense Python courses in Chicago:</p>
<p>
<ul>
<li><a href="http://www.dabeaz.com/chicago/practical.html">Practical Python Programming</a>, October 25-28, 2010.</li>
<li><a href="http://www.dabeaz.com/chicago/mastery.html">Advanced Python Mastery</a>,
November 8-11, 2010. (<b>Only two slots left!</b>)</li>
<li><a href="http://www.dabeaz.com/chicago/django.html">Practical Python Programing plus Django</a>, November 15-19, 2010.</li>
</ul>
<p>
Here are some reasons you might want to attend:</p>
<ol>
<li><b>Courses are held in a certifiably "evil" Python programming lair.</b> Aside from some occasional C and assembly hacking, this is where I do all of my Python programming. Want to take a class to go get "certified" in some kind of "enterprise" software or Microsoft Office? Bah. Better look elsewhere. Python is my only focus here.
<p>
<center>
<img src="http://www.dabeaz.com/chicago/class_small.jpg">
</center>
</p>
</li>
<li><b>Be like a rocket scientist.</b> These are the same Python classes I regularly teach on-site to scientists, engineers, and yes, rocket scientists--who think Python is pretty useful by the way. However, do you have to be an expert to attend? Nope. These courses are for anyone who wants to learn more--including programmers new to Python.</li>
<li><b>You'll learn some new tricks for making your code better.</b> Even if you've been programming in Python for awhile, you will learn some new techniques. This is because I spend most of my free time exploring different ways to effectively use Python's various features--often in preparation for future writing projects, PyCon tutorials, or for use in my own coding projects. And after you've mastered everything there is to know about Python, you can move on to mastering the <a href="http://www.dabeaz.com/chicago/curta.html">Curta</a>.</li>
<li><b>You'll be well fed.</b> These courses aren't held in some sterile hotel or corporate training center. The lair is surrounded by great restaurants, cafes, and bakeries. For instance, you probably don't want to know how many calories are in this picture (from the bakery located immediately below the lair):
<p>
<center>
<img src="http://www.dabeaz.com/chicago/pastry.jpg">
</center>
</p>
</li>
<li><b>All Python, All Day</b>. You're going to spend several days doing nothing but hacking and talking about Python with people who like Python as much as you do. What's not to like about that?</li>
</ol>
<p>
That is all for now. Hopefully you'll join me for a future course!</p>
<p>
--Dave
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-66493017255746661512010-09-04T09:26:00.000-07:002010-09-04T10:55:27.175-07:00Using telnet to access my Superboard II (via Python and cassette ports)<P>
Welcome to part 3 of my "Superboard II" trilogy. For the first two parts, see these posts:</p>
<ul>
<li><a href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">Using Python to Encode Cassette Recordings for my Superboard II</a>
<li><a href="http://dabeaz.blogspot.com/2010/08/decoding-superboard-ii-cassette-audio.html">Decoding Superboard II Cassette Audio Using Python 3, Two Generators, and a Deque</a>
</ul>
<p>
<center>
<img src="http://www.dabeaz.com/images/osi_small.jpg"><br>
<em>Dave's Superboard II</em>
</center>
<p>
First, a brief digression.
</p>
<p>
<b>Why Bother?</b>
</p>
<p>Aside from the obvious nostalgia (the Superboard II being my first computer), why bother messing around with something like this? After all, we're talking about a long-since-dead 1970s technology. Any sort of practical application certainly seems far-fetched.</p>
<p>
The simple answer is that doing this sort of thing is fun--fun for the same reasons I got into programming in the first place. When my family first got the Superboard, it was this magical device--a device where you could command it to do anything you wanted. You could write programs to make it play games. Or, more importantly, you could command it to do your math homework. Not only that, everything about the machine was open. It came with electrical schematics and memory maps. You could directly input hex 6502 opcodes. There were no rules at all. Although writing a game or doing your homework might be an end goal, the real fun was the process of figuring out how to do those things (to be honest, I think I learned much more about math by writing programs to do my math homework than I ever did by actually doing the homework, but that's a different story). </p>
<p>
Flash forward about 30 years and I'm now doing most of my coding in Python. However, Python (and most other dynamic languages) embody everything that was great about my old Superboard II. For instance, the instant gratification of using the interactive interpreter to try things out. Or, the complete freedom to do almost anything you want in a program (first-class functions, duck-typing, metaprogramming, etc.). Or, the ability to dig deep into the bowels of your system (ctypes, Swig, etc.). Frankly, it's all great fun. It's what programming should be about. Clearly the designers of more "serious" languages (especially those designed for the "enterprise") never had anything like a Superboard.</p>
<P>
Anyways, getting back to my motivations, I don't really have any urgent need to access my Superboard from my Mac. I'm mostly just interested in the problem of <em>how</em> I would do it. The fun is all in the process of figuring it out.</p>
<p>
<b>Back to the Superboard Cassette Ports</b></p>
<p>
Getting back to topic, you will recall that in my prior posts, I was interested in the problem of <a href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">encoding</a> and <a href="http://dabeaz.blogspot.com/2010/08/decoding-superboard-ii-cassette-audio.html">decoding</a> the audio stream transmitted from the cassette input and output ports on my Superboard II. In part, this was due to the fact that those are the only available I/O ports--forget about USB, Firewire, Ethernet, RS-232, or a parallel port. Nope, cassette audio is all there is.</p>
<p>
In those two parts, I wrote some Python scripts that <a href="http://www.dabeaz.com/kcs_encode.py">encode</a> and <a href="http://www.dabeaz.com/kcs_decode.py">decode</a> the cassette audio data to and from WAV files. Although that is somewhat interesting, working with WAV files was never my real goal. Instead, what I <em>really</em> wanted to do was to set up a real-time bidirectional data communication channel between my Mac and the Superboard II. Simply stated, I wanted to create the equivalent of a network connection using the cassette ports. Would it even be possible? Who knows?</p>
<p>
So far as I know, the cassette ports on the Superboard were never intended for this purpose. Although there are commands to save a program and to load a program, driving both the cassette input and output simultaneously isn't something you would do. It didn't even make any sense. There certainly weren't any Superboard commands to do that.
</p>
<p>
<b>Building a Soft-Modem Using PyAudio</b>
</p>
<p>
To perform real-time communications, the Superboard needs to be connected to both the audio line-out and line-in ports of my Mac. Using those connections, I would then need to write a program that operates as a soft-modem. This program would simultaneously read and transmit audio data by encoding or decoding it as appropriate (see my past posts).</p>
<p>
I've never written a program for manipulating audio on my Mac, but after some searching, I found the <a href="http://people.csail.mit.edu/hubert/pyaudio/">PyAudio</a> extension that seemed to provide the exact set of features I needed.
</p>
<p>
To create a soft-modem, I defined reader and writer threads as follows:
</p>
<blockquote>
<pre>
# Note: This is Python 2 due to the PyAudio dependency.
from __future__ import print_function

import pyaudio
import kcs_decode    # See prior posts
import kcs_encode    # See prior posts
from Queue import Queue

FORMAT = pyaudio.paInt8
CHANNELS = 1
RATE = 9600
CHUNKSIZE = 1024

# Buffered data received and waiting to transmit
audio_write_buffer = Queue()
audio_read_buffer = Queue()

# Generate a sequence representing sign change bits on the real-time
# audio stream (needed as input for decoding)
def generate_sign_change_bits(stream):
    previous = 0
    while True:
        frames = stream.read(CHUNKSIZE)
        if not frames:
            break
        msbytes = bytearray(frames)
        # Emit a stream of sign-change bits
        for byte in msbytes:
            signbit = byte & 0x80
            yield 1 if (signbit ^ previous) else 0
            previous = signbit

# Thread that reads and decodes KCS audio input
def audio_reader():
    print("Reader starting")
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNKSIZE)
    bits = generate_sign_change_bits(stream)
    byte_stream = kcs_decode.generate_bytes(bits, RATE)
    for b in byte_stream:
        audio_read_buffer.put(chr(b))

# Thread that writes KCS audio data
def audio_writer():
    print("Writer starting")
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    output=True)
    while True:
        if not audio_write_buffer.empty():
            msg = kcs_encode.kcs_encode_byte(ord(audio_write_buffer.get()))
            stream.write(buffer(msg))
        else:
            stream.write(buffer(kcs_encode.one_pulse))

if __name__ == '__main__':
    import threading
    # Launch the reader/writer threads
    reader_thr = threading.Thread(target=audio_reader)
    reader_thr.daemon = True
    reader_thr.name = "Reader"
    reader_thr.start()
    writer_thr = threading.Thread(target=audio_writer)
    writer_thr.daemon = True
    writer_thr.name = "Writer"
    writer_thr.start()
</blockquote>
<p>
The operation of this code is relatively straightforward. There is a reader thread that constantly samples audio on the line-in port and decodes it into bytes which are stored in a
queue for later consumption. There is a writer thread that encodes and transmits outgoing data (if any). If there is no data, the writer transmits a constant carrier tone on the line out (a 2400 Hz wave).</p>
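<p>
To illustrate what the carrier tone might look like in code, here's a rough sketch of synthesizing a pulse of 8-bit audio samples. This is entirely my own illustration--<tt>make_tone</tt> and its parameters are hypothetical stand-ins for whatever <tt>kcs_encode.one_pulse</tt> actually contains:
</p>

```python
import math

RATE = 9600   # sample rate used elsewhere in the post

def make_tone(freq, ncycles, rate=RATE):
    # Build one buffer of unsigned 8-bit samples approximating a sine tone.
    # A hypothetical stand-in for the real kcs_encode.one_pulse data.
    nsamples = int(rate * ncycles / freq)
    return bytes(128 + int(100 * math.sin(2 * math.pi * freq * i / rate))
                 for i in range(nsamples))

one_pulse = make_tone(2400, 8)   # 8 cycles of the 2400 Hz carrier
```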
<p>
These threads operate entirely in the background. To read data from the Superboard, you simply check the contents of the audio read buffer. To send data to the Superboard, you simply append outgoing data to the audio write buffer.</p>
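<p>
As a sketch of how client code might use those buffers (the helper names here are my own, not part of the original script):
</p>

```python
try:
    from queue import Queue     # Python 3
except ImportError:
    from Queue import Queue     # Python 2

# The same two queues the reader/writer threads would share
audio_write_buffer = Queue()
audio_read_buffer = Queue()

def send_line(text):
    # Queue one character at a time for the writer thread to encode
    for ch in text + '\r':
        audio_write_buffer.put(ch)

def read_available():
    # Drain whatever the reader thread has decoded so far
    chars = []
    while not audio_read_buffer.empty():
        chars.append(audio_read_buffer.get())
    return ''.join(chars)
```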
<p>
<b>Creating a Network Server</b>
</p>
<p>
To tie all of this together, you can now write a network server that connects the real-time audio streams to a network socket. This can be done by defining a third thread like this:
</p>
<blockquote>
<pre>
import socket
import time

def server(addr):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(addr)
    s.listen(1)
    print("Server running on", addr)
    # Wait for the client to connect
    while True:
        c, a = s.accept()
        print("Got connection", a)
        c.setblocking(False)
        try:
            # Enter a loop where we try to transmit data back and forth
            # between the client and the audio stream
            while True:
                # Check for incoming data
                try:
                    indata = c.recv(8192)
                    if not indata:
                        raise EOFError()
                    indata = indata.replace(b'\r', b'\r' + b'\x00'*20)
                    for b in indata:
                        audio_write_buffer.put(b)
                except socket.error:
                    pass
                # Check if there is any outgoing data to transmit
                # (try to send it all)
                if not audio_read_buffer.empty():
                    while not audio_read_buffer.empty():
                        b = audio_read_buffer.get()
                        c.send(b)
                else:
                    # Sleep briefly if nothing is going on. This is fine, the max
                    # data transfer rate of the Superboard is 300 baud
                    time.sleep(0.01)
        except EOFError:
            print("Connection closed")
            c.close()

if __name__ == '__main__':
    import threading
    # Launch the reader/writer threads
    ... see above code ..
    # Launch the network server
    server_thr = threading.Thread(target=server, args=(("", 15000),))
    server_thr.daemon = True
    server_thr.name = "Server"
    server_thr.start()
    # Have the main thread do something (so Ctrl-C works)
    while True:
        time.sleep(1)
</blockquote>
<p>
This server operates as a simple polling loop over a client socket and the incoming audio data stream. Data received on the socket is placed in the write buffer used by the audio writer thread. Data received by the audio reader is sent back to the client. This code could probably be cleaned up through the use of the <tt>select()</tt> call, but I frankly don't know if <tt>select()</tt> works with PyAudio and didn't investigate it. Given that the maximum data rate of the Superboard is 300 baud, a "good enough" solution seemed to be just that.</p>
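<p>
For what it's worth, here is one way the socket side of that loop could use <tt>select()</tt> with a timeout instead of sleeping blindly. This is only a sketch (<tt>poll_once</tt> is my own naming), and the audio queues still have to be polled since they aren't file descriptors:</p>

```python
import select

def poll_once(c, audio_read_buffer, audio_write_buffer, timeout=0.01):
    # Wait up to `timeout` seconds for socket data instead of sleeping
    rlist, _, _ = select.select([c], [], [], timeout)
    if rlist:
        indata = c.recv(8192)
        if not indata:
            raise EOFError()
        # Same carriage-return padding trick as the main loop
        for b in indata.replace(b'\r', b'\r' + b'\x00'*20):
            audio_write_buffer.put(b)
    # Forward anything the audio reader has produced
    while not audio_read_buffer.empty():
        c.send(audio_read_buffer.get())
```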
<p>
<b>Putting it to the Test</b>
</p>
<p>
Now, the ultimate test--does it actually work? To try it out, you first have to launch the above audio server process. For example:</p>
<blockquote>
<pre>
bash % <b>python audioserv.py</b>
Reader starting
Writer starting
Server running on ('', 15000)
</pre>
</blockquote>
<p>
Next, make sure the Superboard II is plugged into the line-in and line-out ports on the Mac. On the Superboard, I had to manually type two <tt>POKE</tt> statements to make it send all output to the cassette output and to read all keyboard input from the cassette input.</p>
<blockquote>
<pre>
POKE 517, 128
POKE 515, 128
</pre>
</blockquote>
<p>
Finally, use the <tt>telnet</tt> command to connect to the audio server.</p>
<blockquote>
<pre>
bash $ <b>telnet localhost 15000</b>
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> <b>mode character</b>
<b>LIST</b>
OK
<b>PRINT "HELLO WORLD"</b>
HELLO WORLD
OK
</pre>
</blockquote>
<p>
Excellent! It seems to be working. It's a little hard to appreciate from just a screenshot, so check out the following <a href="http://www.youtube.com/watch?v=FMGG33IHg_4">movie</a> that shows it all in action:
</p>
<center>
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/FMGG33IHg_4?hl=en&fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/FMGG33IHg_4?hl=en&fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object>
</center>
<p>
Again, it's important to emphasize that there is no connection between the two machines other than a pair of audio cables.</p>
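<p>
If you'd rather script the session than type at a telnet prompt, a few lines of Python can talk to the same server. This is only a sketch--the port matches the code above, but <tt>send_line</tt> is just an illustrative helper of my own:</p>

```python
import socket

def send_line(sock, line):
    # The Superboard expects carriage returns, not newlines
    sock.sendall(line.encode('ascii') + b'\r')

# Usage (with the audio server above running):
# s = socket.create_connection(('localhost', 15000))
# send_line(s, 'PRINT "HELLO WORLD"')
```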
<p>
<b>That is all (for now)</b>
</p>
<p>
Well, there you have it--using Python to implement a soft-modem that encodes/decodes cassette audio data in real-time, allowing me to remotely access my old Superboard using telnet. At last, I can write old Microsoft Basic 1.0 programs from the comfort of my Aeron chair and a 23-inch LCD display--and there's nothing old-school about that!</p>
<p>
Hope you enjoyed this series of posts. Sadly, it's now time to get back to some "real work." Of course, if you'd like to see all of this in person, you should sign up for one of my <a href="http://www.dabeaz.com/chicago/index.html">Python courses</a>.
</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.comtag:blogger.com,1999:blog-36456651.post-22124205454710361462010-08-29T20:39:00.000-07:002010-08-30T07:33:27.945-07:00Decoding Superboard II Cassette Audio Using Python 3, Two Generators, and a Deque<p>Welcome to the second installment of using Python to encode/decode cassette audio data for use with my resurrected Superboard II system. Last time, I talked about the problem of <a
href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">encoding text files into WAV audio files</a> for uploading via the Superboard cassette input. In this post, I explore the opposite problem--namely using Python to decode WAV audio files recorded from the cassette output port back into the transmitted byte stream--in essence, writing a Python script that performs the same function as a modem.</p><center><br />
<img src="http://www.dabeaz.com/images/osi_back.jpg"><br />
<br />
<em>The cassette ports of my Superboard II</em><br />
</center><br />
<p>Although decoding audio data from the cassette output sounds like it might be a tricky exercise involving sophisticated signal processing (e.g., FFTs), it turns out that you can easily solve this problem using nothing more than a few built-in objects (bytearrays, deques, etc.) and a couple of simple generator functions. In fact, it's a neat exercise involving some of the lesser known, but quite useful data processing features of Python. Plus, it seems like a good excuse to further bang on the new <a href="http://www.dabeaz.com/python3io">Python 3 I/O system</a>. So, let's get started.</p><p><b>Audio Format</b><br />
</p><p>In my <a href="http://dabeaz.blogspot.com/2010/08/using-python-to-encode-cassette.html">earlier post</a>, I described how the format used for cassette recordings is the <a href="http://en.wikipedia.org/wiki/Kansas_City_standard">Kansas City Standard</a> (KCS). The encoding is really simple--8 cycles at 2400 Hz represent a 1-bit and 4 cycles at 1200 Hz represent a 0-bit. Individual bytes are encoded with 1 start bit (0) and two stop bits (1s). Here is a plot that shows some waveforms from a fragment of <a href="http://www.dabeaz.com/images/osi_sample.wav">recorded audio</a>.</p><center><br />
<img src="http://www.dabeaz.com/images/osi_wave_small.png"><br />
</center><br />
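<p>
The timing of the encoding just described works out with a little back-of-the-envelope arithmetic, which can be checked directly in Python:</p>

```python
BASE_FREQ = 2400        # frequency of the "1" tone (Hz)
CYCLES_PER_BIT = 8      # 8 cycles at 2400 Hz encode one bit

bit_time = CYCLES_PER_BIT / BASE_FREQ    # seconds per encoded bit
baud = 1 / bit_time                      # 300 bits per second
bits_per_byte = 1 + 8 + 2                # start bit + 8 data bits + two stop bits
bytes_per_sec = baud / bits_per_byte     # about 27 bytes per second
```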
<p>It is important to stress that this encoding is intentionally simple--designed to operate on systems of its era (1970s) and to be resistant to all sorts of problems associated with cassette tapes. For example, noise, low-fidelity, variations in tape playback speed, etc. Needless to say, it's not especially fast. Encoding a single byte of data requires 11 bits or 88 cycles of a 2400 HZ wave. If you do the math, that works out to about 27 bytes per second or 300 baud.</p><p><b>A Decoding Strategy (Big Picture)</b><br />
</p><p>KCS decoding is almost entirely based on counting cycles of two different wave frequencies. That is, to decode the data we simply sample the audio data and count the number of zero-crossings. At a high level, decoding a single bit works as follows:</p><ul><li>Read a sample of N audio frames where N represents the number of frames required to represent an entire bit (8 cycles at 2400 Hz).</li>
<li>Count the number of zero crossings found in the sample.</li>
<li>If the number of crossings is near 16, then it represents a 1.</li>
<li>If the number of crossings is near 8, then it represents a 0.</li>
</ul><p>From bits, it's relatively simple to make the transition to bytes. You simply have to recognize the start bit and sample the next 8 bits as data bits to form a byte.</p><p><b>Deconstructing a WAV File to Sign Bits</b> </p><p>Python has a module <a href="http://docs.python.org/library/wave.html"><tt>wave</tt></a> that can be used to read WAV files. Here is an example of opening a WAV file and obtaining some useful metadata about the recorded audio.</p><blockquote><pre>>>> <b>import wave</b>
>>> <b>wf = wave.open("osi_sample.wav")</b>
>>> <b>wf.getnchannels()</b>
2
>>> <b>wf.getsampwidth()</b>
2
>>> <b>wf.getframerate()</b>
44100
>>>
</pre></blockquote><p>In the above example, the WAV file is a 44100Hz stereo recording using 16-bit (2 byte) samples.</p><p>For our decoding, we are only interested in counting the number of zero-crossings in the audio data. For a 16-bit WAV file, the "zero" is represented by a sample value of 2**15 (32768). A "positive" wave sample has a value greater than 2**15 whereas a "negative" wave sample has a value less than that. Conveniently, this determination can be made by simply stripping all sample data away except for the most significant bit.</p><p>Here is a generator function that takes a sequence of WAV audio data and reduces it to a sequence of sign bits.</p><blockquote><pre># Generate a sequence representing sign bits
def generate_wav_sign_bits(wavefile):
    samplewidth = wavefile.getsampwidth()
    nchannels = wavefile.getnchannels()
    while True:
        frames = wavefile.readframes(8192)
        if not frames:
            break
        # Extract most significant bytes from left-most audio channel
        msbytes = bytearray(frames[samplewidth-1::samplewidth*nchannels])
        # Emit a stream of sign bits
        for byte in msbytes:
            yield 1 if (byte & 0x80) else 0
</pre></blockquote><p>This generator works by reading a chunk of raw audio frames and using an extended slice <tt>frames[samplewidth-1::samplewidth*nchannels]</tt> to extract the most significant byte from each sample of the left-most audio channel. The result is placed into a <tt>bytearray</tt> object. A <tt>bytearray</tt> stores a sequence of bytes (like a string), but has the nice property that the stored data is presented as integers instead of 1-character strings. This makes it easy to perform numeric calculations on the data. The <tt>yield 1 if (byte & 0x80) else 0</tt> simply yields the most significant bit of each byte.</p><p>The resulting output from this generator is simply a sequence of sign bits. For example, the output will look similar to this:</p><blockquote><pre>>>> <b>import wave</b>
>>> <b>wf = wave.open("sample.wav")</b>
>>> <b>for bit in generate_wav_sign_bits(wf):</b>
... <b>print(bit,end="")</b>
...
11111111000000000111111111000000000011111111100000000011111111110000000001111111
11000000000011111111100000000011111111110000000001111111110000000000111111111000
00000011111111110000000001111111110000000000111111111000000000111111111100000000
01111111110000000000111111111000000000111111111100000000011111111100000000001111
...
</pre></blockquote><p><b>From Sign Bits to Sign Changes</b> </p><p>Although a sequence of wave sign bits is interesting, it's not really that useful. Instead, we're really more interested in zero-crossings or samples where the sign changes. Getting this information is actually pretty easy--simply compute the exclusive-or (XOR) of successive sign bits. If you do this, you will always get 0 when the sign stays the same or a value 0x80 when the sign flips. Here is a modified version of our generator function.</p><blockquote><pre># Generate a sequence representing changes in sign
def generate_wav_sign_change_bits(wavefile):
    samplewidth = wavefile.getsampwidth()
    nchannels = wavefile.getnchannels()
    previous = 0
    while True:
        frames = wavefile.readframes(8192)
        if not frames:
            break
        # Extract most significant bytes from left-most audio channel
        msbytes = bytearray(frames[samplewidth-1::samplewidth*nchannels])
        # Emit a stream of sign-change bits
        for byte in msbytes:
            signbit = byte & 0x80
            yield 1 if (signbit ^ previous) else 0
            previous = signbit
</pre></blockquote><p>This slightly modified generator now produces a sequence of data with sign change pulses in it similar to this:</p><blockquote><pre>>>> <b>import wave</b>
>>> <b>wf = wave.open("sample.wav")</b>
>>> <b>for bit in generate_wav_sign_change_bits(wf):</b>
... <b>print(bit,end="")</b>
...
00000000100000000100000000100000000010000000010000000010000000001000000001000000
00100000000010000000010000000010000000001000000001000000001000000000100000000100
00000010000000001000000001000000001000000000100000000100000000100000000010000000
01000000001000000000100000000100000000100000000010000000010000000010000000001000
...
</pre></blockquote><p><b>Bit Sampling</b> </p><p>At this point, the WAV file has been deconstructed into a sequence of sign changes. Now, all we have to do is sample the data and count the number of sign changes. To do this, use a <tt>deque</tt> and some clever iterator tricks. Here is some code:</p><blockquote><pre>from itertools import islice
from collections import deque
# Base frequency (representing a 1)
BASE_FREQ = 2400
# Generate a sequence of data bytes by sampling the stream of sign change bits
def generate_bytes(bitstream,framerate):
    bitmasks = [0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80]
    # Compute the number of audio frames used to encode a single data bit
    frames_per_bit = int(round(float(framerate)*8/BASE_FREQ))
    # Queue of sampled sign bits
    sample = deque(maxlen=frames_per_bit)
    # Fill the sample buffer with an initial set of data
    sample.extend(islice(bitstream, frames_per_bit-1))
    sign_changes = sum(sample)
    # Look for the start bit
    for val in bitstream:
        if val:
            sign_changes += 1
        if sample.popleft():
            sign_changes -= 1
        sample.append(val)
        # If a start bit detected, sample the next 8 data bits
        if sign_changes <= 9:
            byteval = 0
            for mask in bitmasks:
                if sum(islice(bitstream, frames_per_bit)) >= 12:
                    byteval |= mask
            yield byteval
            # Skip the final two stop bits and refill the sample buffer
            sample.extend(islice(bitstream, 2*frames_per_bit, 3*frames_per_bit-1))
            sign_changes = sum(sample)
</pre></blockquote><p>This code might require some study, but the concept is simple. A sample <tt>deque</tt> (the <tt>sample</tt> variable) is created, the size of which corresponds to the number of audio frames needed to represent a single data bit. It might be a little known fact, but if you create a <tt>deque</tt> with a <tt>maxlen</tt> setting, it turns into a kind of shift register. That is, new items added at the end will automatically cause old items to fall off the front if the length is exceeded. It is also very fast.</p><p>Getting back to our algorithm, audio data is pushed into this deque and the number of sign changes updated. If no data is being transmitted, the number of sign changes in the sample will hover around 16. However, if a start-bit is encountered, the number of sign changes in the sample will drop to around 8. In our code, this is detected by checking for 9 or fewer sign changes in the sample. Keep in mind that we don't really know when the start bit will appear--thus, the code proceeds frame-by-frame until the number of sign changes drops to a sufficiently low value. Once the start bit is detected, data bits are quickly sampled, one after the other, to form a complete byte. After the data bits are sampled, the two stop bits are skipped and the sample buffer refilled with the next potential start bit. </p><p><b>Does it Work?</b> </p><p>Hell yes it works. Here is a short test script that ties it all together: </p><blockquote><pre>if __name__ == '__main__':
    import wave
    import sys
    if len(sys.argv) != 2:
        print("Usage: %s infile" % sys.argv[0], file=sys.stderr)
        raise SystemExit(1)
    wf = wave.open(sys.argv[1])
    sign_changes = generate_wav_sign_change_bits(wf)
    byte_stream = generate_bytes(sign_changes, wf.getframerate())
    # Output the byte stream
    outf = sys.stdout.buffer.raw
    while True:
        buffer = bytes(islice(byte_stream, 80))
        if not buffer:
            break
        outf.write(buffer)
</pre></blockquote><p>If we run this program on the <tt><a href="http://www.dabeaz.com/images/osi_sample.wav">osi_sample.wav</a></tt> file, we get the following output (which is exactly what it should be):</p><blockquote><pre>bash-3.2$ <b>python3 kcs_decode.py osi_sample.wav</b>
10 FOR I = 1 TO 1000
20 PRINT I;
30 NEXT I
40 END
OK
bash-3.2$
</pre></blockquote><p>That's pretty nice--two relatively simple generator functions and some basic data manipulation on deques have turned the audio stream into a stream of bytes.</p><p>One thing that's not shown above is the embedded NULLs related to newline handling. You can see them if you do this:</p><blockquote><pre>bash-3.2$ <b>python3 kcs_decode.py osi_sample.wav | cat -e</b>
^M^@^@^@^@^@^@^@^@^@^@$
^M^@^@^@^@^@^@^@^@^@^@$
10 FOR I = 1 TO 1000^M^@^@^@^@^@^@^@^@^@^@$
20 PRINT I;^M^@^@^@^@^@^@^@^@^@^@$
30 NEXT I^M^@^@^@^@^@^@^@^@^@^@$
40 END^M^@^@^@^@^@^@^@^@^@^@$
OK^M^@^@^@^@^@^@^@^@^@^@$
bash-3.2$
</pre></blockquote><p><b>How well does it work?</b> </p><p>To test this decoding process, I recorded various audio samples directly from my Superboard using Audacity on my Mac. I used different sampling frequencies ranging from 8000 Hz to 48000 Hz. For all of the samples, the decoding process worked exactly as expected, producing no observable decoding errors. </p><p>Decoding 5788 bytes of transmitted test data from a 47 Mbyte WAV file of 48 kHz stereo samples takes about 5.7 seconds on my MacBook (2.4 GHz Intel Core Duo) for a baud rate of about 11000--more than 35 times faster than the Superboard can actually send it. Decoding the same data recorded in a 7.3 Mbyte WAV file with 8 kHz stereo samples takes about 0.97 seconds for a baud rate of about 65000 (Note: these baud rates are based on 11 bits of encoding for every data byte).</p><p>Although I could work to make the script run faster, it is already plenty fast for my purposes. Moreover, the generator-based approach means that the scripts really aren't limited by the size of the input WAV files.</p><p><b>Final Words</b> </p><p>If you are interested in the final script, you can find it in the file <a href="http://www.dabeaz.com/kcs_decode.py">kcs_decode.py</a>. Although I've now written scripts to encode and decode Superboard II cassette audio data, this is hardly the last word. Stay tuned (evil wink ;-). </p><p><b>Footnote</b> </p><p>If you're going to try any of this code, make sure you're using Python 3.1.2 or newer. Earlier versions of Python 3 seem to have buggy versions of the <tt>wave</tt> module.</p>Dave Beazleyhttp://www.blogger.com/profile/02802905126181462140noreply@blogger.com