When I first saw the new Python 3 bytearray object (also back-ported to Python 2.6), I wasn't exactly sure what to make of it. On the surface, it seemed like a kind of mutable 8-bit string (a feature sometimes requested by users of Python 2). For example:
>>> s = bytearray(b"Hello World") >>> s[:5] = b"Cruel" >>> s bytearray(b'Cruel World') >>>
On the other hand, there are aspects of bytearray objects that are completely unlike a string. For example, if you iterate over a bytearray, you get integer byte values:
>>> s = bytearray(b"Hello World") >>> for c in s: print(c) ... 72 101 108 108 111 32 87 111 114 108 100 >>>
Similarly, indexing operations are tied to integers:
>>> s[1] 101 >>> s[1] = 97 >>> s[1] = b'a' Traceback (most recent call last): File "", line 1, in TypeError: an integer is required >>>
Finally, there's the fact bytearray instances have most of the methods associated with strings as well as methods associated with lists. For example:
>>> s.split() [bytearray(b'Hello'), bytearray(b'World')] >>> s.append(33) >>> s bytearray(b'Hello World!') >>>
Although documentation on bytearrays describes these features, it is a little light on meaningful use cases. Needless to say, if you have too much spare time (sic) on your hands, this is the kind of thing that you start to think about. So, I thought I'd share three practical uses of bytearrays.
Example 1: Assembling a message from fragments
Suppose you're writing some network code that is receiving a large message on a socket connection. If you know about sockets, you know that the recv() operation doesn't wait for all of the data to arrive. Instead, it merely returns what's currently available in the system buffers. Therefore, to get all of the data, you might write code that looks like this:
# remaining = number of bytes being received (determined already) msg = b"" while remaining > 0: chunk = s.recv(remaining) # Get available data msg += chunk # Add it to the message remaining -= len(chunk)
The only problem with this code is that concatenation (+=) has horrible performance. Therefore, a common performance optimization in Python 2 is to collect all of the chunks in a list and perform a join when you're done. Like this:
# remaining = number of bytes being received (determined already) msgparts = [] while remaining > 0: chunk = s.recv(remaining) # Get available data msgparts.append(chunk) # Add it to list of chunks remaining -= len(chunk) msg = b"".join(msgparts) # Make the final message
Now, here's a third solution using a bytearray:
# remaining = number of bytes being received (determined already) msg = bytearray() while remaining > 0: chunk = s.recv(remaining) # Get available data msg.extend(chunk) # Add to message remaining -= len(chunk)
Notice how the bytearray version is really clean. You don't collect parts in a list and you don't perform that cryptic join at the end. Nice.
Of course, the big question is whether or not it performs. To test this out, I first made a list of small byte fragments like this:
chunks = [b"x"*16]*512
I then used the timeit module to compare the following two code fragments:
# Version 1 msgparts = [] for chunk in chunks: msgparts.append(chunk) msg = b"".join(msgparts) # Version 2 msg = bytearray() for chunk in chunks: msg.extend(chunk)
When tested, version 1 of the code ran in 99.8s whereas version 2 ran in 116.6s (a version using += concatenation takes 230.3s by comparison). So while performing a join operation is still faster, it's only faster by about 16%. Personally, I think the cleaner programming of the bytearray version might make up for it.
Example 2: Binary record packing
This example is an slight twist on the last example. Support you had a large Python list of integer (x,y) coordinates. Something like this:
points = [(1,2),(3,4),(9,10),(23,14),(50,90),...]
Now, suppose you need to write that data out as a binary encoded file consisting of a 32-bit integer length followed by each point packed into a pair of 32-bit integers. One way to do it would be to use the struct module like this:
import struct f = open("points.bin","wb") f.write(struct.pack("I",len(points))) for x,y in points: f.write(struct.pack("II",x,y)) f.close()
The only problem with this code is that it performs a large number of small write() operations. An alternative approach is to pack everything into a bytearray and only perform one write at the end. For example:
import struct f = open("points.bin","wb") msg = bytearray() msg.extend(struct.pack("I",len(points)) for x,y in points: msg.extend(struct.pack("II",x,y)) f.write(msg) f.close()
Sure enough, the version that uses bytearray runs much faster. In a simple timing test involving a list of 100000 points, it runs in about half the time as the version that makes a lot of small writes.
Example 3: Mathematical processing of byte values
The fact that bytearrays present themselves as arrays of integers makes it easier to perform certain kinds of calculations. In a recent embedded systems project, I was using Python to communicate with a device over a serial port. As part of the communications protocol, all messages had to be signed with a Longitudinal Redundancy Check (LRC) byte. An LRC is computed by taking an XOR across all of the byte values.
Bytearrays make such calculations easy. Here's one version:
message = bytearray(...) # Message already created lrc = 0 for b in message: lrc ^= b message.append(lrc) # Add to the end of the message
Here's a version that increases your job security:
message.append(functools.reduce(lambda x,y:x^y,message))
And here's the same calculation in Python 2 without bytearrays:
message = "..." # Message already created lrc = 0 for b in message: lrc ^= ord(b) message += chr(lrc) # Add the LRC byte
Personally, I like the bytearray version. There's no need to use ord() and you can just append the result at the end of the message instead of using concatenation.
Here's another cute example. Suppose you wanted to run a bytearray through a simple XOR-cipher. Here's a one-liner to do it:
>>> key = 37 >>> message = bytearray(b"Hello World") >>> s = bytearray(x ^ key for x in message) >>> s bytearray(b'm@IIJ\x05rJWIA') >>> bytearray(x ^ key for x in s) bytearray(b"Hello World") >>>
Final Comments
Although some programmers might focus on bytearrays as a kind of mutable string, I find their use as an efficient means for assembling messages from fragments to be much more interesting. That's because this kind of problem comes up a lot in the context of interprocess communication, networking, distributed computing, and other related areas. Thus, if you know about bytearrays, it might lead to code that has good performance and is easy to understand.
That's it for this installment. In case you're wondering, this topic is also related to my upcoming PyCON'2010 tutorial "Mastering Python 3 I/O."
Note: Since I first posted this, I added a performance test using the Python 2.6.4 codecs module. This addition is highlighted in red.
When Python 3.0 was first released, I tried it out on a few things and walked away unimpressed. By far, the big negative was the horrible I/O performance. For instance, scripts to perform simple data analysis tasks like processing a web server log file were running more than 30 times slower than Python 2. Even though there were many new features of Python 3 to be excited about, the I/O performance alone was enough to make me not want to use it---or recommend it to anyone else for that matter.
Some time has passed since then. For example, Python-3.1.1 is out and many improvements have been made. To force myself to better understand the new Python 3 I/O system, I've been working on a tutorial Mastering Python 3 I/O for the upcoming PyCON'2010 conference in Atlanta. Overall, I have to say that I'm pretty impressed with what I've found--and not just in terms of improved performance.
Due to space constraints, I can't talk about everything in my tutorial here. However, I thought I would share some thoughts about text-based I/O in Python 3.1 and discuss a few examples. Just as a disclaimer, I show a few benchmarks, but my intent is not to do a full study of every possible aspect of text I/O handling. I would strongly advise you to download Python 3.1.1 and perform your own tests to get a better feel for it.
Like many people, one of my main uses of Python is data processing and parsing. For example, consider the contents of a typical Apache web server log:
75.54.118.139 - - [24/Feb/2008:00:15:42 -0600] "GET /favicon.ico HTTP/1.1" 404 133 75.54.118.139 - - [24/Feb/2008:00:15:49 -0600] "GET /software.html HTTP/1.1" 200 3163 75.54.118.139 - - [24/Feb/2008:00:16:10 -0600] "GET /ply/index.html HTTP/1.1" 200 8018 213.145.165.82 - - [24/Feb/2008:00:16:19 -0600] "GET /ply/ HTTP/1.1" 200 8018 ...
Let's look at a simple script that processes this file. For example, suppose you wanted to produce a list of all URLs that have generated a 404 error. Here's a really simple (albeit hacky) script that does that:
error_404_urls = set() for line in open("access-log"): fields = line.split() if fields[-2] == '404': error_404_urls.add(fields[-4]) for name in error_404_urls: print(name)
On my machine, I have a 325MB log file consisting of 3649000 lines--a perfect candidate for performing a few benchmarks. Here are the numbers that you get running the above script with different Python versions. UCS-2 refers to Python compiled with 16-bit Unicode characters. UCS-4 refers to Python compiled with 32-bit Unicode characters (the --with-wide-unicode configuration option). Also, in the interest of full disclosure, these tests were performed with a warm disk cache on a 2 GHZ Intel Core 2 Duo Apple Macbook with 4GB of memory under OS-X 10.6.2 (Snow Leopard).
Python Version Time (seconds) 2.6.4 7.91s 3.0 125.42s 3.1.1 (UCS-2) 14.11s 3.1.1 (UCs-4) 17.32s
As you can see, Python 3.0 performance was an anomaly--the performance of Python 3.1.1 is substantially better. To better understand the I/O component of this script, I ran a modified test with the following code
for line in open("access-log"): pass
Here are the performance results for iterating over the file by lines:
Python Version Time (seconds) 2.6.4 1.50s 2.6.4 (codecs, UTF-8) 52.22s 3.0 105.87s 3.1.1 (UCS-2) 4.35s 3.1.1 (UCs-4) 6.11s
If you look at these numbers, you will see that the I/O performance of Python 3.1 has improved substantially. It is also substantially faster than using the codecs module in Python 2.6. However, you'll also observe that the performance is still quite a bit worse than the native Python 2.6 file object. For example, in the table, iterating over lines is about 3x slower in Python 3.1.1 (UCS-2). How can that be good? That's 300% slower!
Let's talk about the numbers in more detail. The decreased performance in Python 3 is almost solely due to the overhead of the underlying Unicode conversion applied to text input. That conversion process involves two distinct steps:
The overhead of decoding is a direct function of how complicated the underlying codec is. Although UTF-8 is relatively simple, it's still more complex than an encoding such as Latin-1. Let's see what happens if we try reading the file with "latin-1" encoding instead. Here's the modified test code:
for line in open("access-log",encoding='latin-1'): pass
Here are the modified performance results that show an improvement:
Python Version Time (seconds) 3.1.1 (UCS-2) 3.64s (was 4.35s) 3.1.1 (UCs-4) 5.33s (was 6.11s)
Lesson learned : The encoding matters. So, if you're working purely with ASCII text, specifying an encoding such as 'latin-1' will speed everything up. Just so you know, if you specify 'ascii' encoding, you get no improvement over UTF-8. This is because 'ascii' requires more work to decode than 'latin-1' (due to an extra check for bytes outside the range 0-127 in the decoding process).
At this point, you're still saying that it's slower. Yes, even with a faster encoding, Python 3.1.1 is still about 2.5x slower than Python 2.6.4 on this simple I/O test. Is there anything that can be done about that?
The short answer is "not really." Since Python 3 strings are Unicode, the process of reading a simple 8-bit text file is always going to involve an extra process of converting and copying the byte-oriented data into the multibyte Unicode representation. Just to give you an idea, let's drop into C programming and consider the following program:
#include <stdio.h> int main() { FILE *f; char bytes[256]; f = fopen("access-log","r"); while (fgets(bytes,256,f)) { // Yes, hacky } }
This program does nothing more than iterate over lines of a file--think of it as the ultimate stripped down version of our Python-2.6.4 test. If you run it, takes 1.13s to run on the same log file used for our earlier Python tests.
When you go to Python 3, there is always extra conversion. It's like modifying the C program as follows:
#include <stdio.h> int main() { FILE *f; char bytes[256], *c; short unicode[256], *u; f = fopen("biglog.txt","r"); while (fgets(bytes,256,f)) { c = bytes; u = unicode; while (*c) { /* Convert to Unicode */ *(u++) = (short) *(c++); } } }
Sure enough, if you run this modified C program, it takes about 1.7 seconds--a nearly 50% performance hit just from that extra copying and conversion step. Minimally, Python 3 has to do the same conversion. However, it's also performing dynamic memory allocation, reference counting, and other low-level operations. So, if you factor all of that in, the performance numbers start to make a little more sense. You also start to understand why it might be really hard to do much better.
Now, should you care about all of this? Truthfully, most programs are probably not going to be affected by degraded text I/O performance as much as you think. That's because most interesting programs do far more than just I/O. Go back and consider the original script that I presented. On Python-2.6.4, it took 7.91s to execute. If I go back and tune the script to use the more efficient 'latin-1' encoding, it takes 13.8s with Python-3.1.1. Yes, that's about 1.75x slower than before. However, the key point is that it's not 2.5x slower as our earlier I/O tests would suggest. The performance impact will become less and less as the script performs more non-IO related work.
Finally, let's say that you still can't live with the performance degradation. If you're just working with simple ASCII data files, you might solve this problem by turning to binary I/O instead. For example, the following script variant uses binary I/O and bytes for most of its processing--only converting text to Unicode when absolutely necessary for printing.
error_404_urls = set() for line in open("access-log","rb"): fields = line.split() if fields[-2] == b'404': error_404_urls.add(fields[-4]) for name in error_404_urls: print(name.decode('latin-1'))
If you run this final script, you find that it takes 8.22s in Python 3.1.1--which is only about 4% slower than the Python-2.6.4. How about that!
The bottom line is that Python-3.1 is definitely worth a second look--especially if you tried the earlier Python 3.0 release and were disappointed with its performance. Although text-based I/O is always going to be slower in Python 3 due to extra Unicode processing, it might not matter as much in practice. Plus, binary I/O in Python 3 is still quite fast which means that you can turn to it as a last resort.
If you want to know more, attend my Mastering Python 3 I/O at PyCON'2010 or sign up for the Special Preview in Chicago.
Final Notes:
import time start = time.time() ... statements ... end = time.time() print(end-start)
In preparation for my upcoming PyCON'2010 talk on "Understanding the Python GIL", I've been working on a variety of new material--including some graphical visualization of the GIL behavior described in my earlier talk. I'm still experimenting, but check it out.
In these graphs, Python interpreter ticks are shown along the X-axis. The two bars indicate two different threads that are executing. White regions indicate times at which a thread is completely idle. Green regions indicate when a thread holds the GIL and is running. Red regions indicate when a thread has been scheduled by the operating system only to awake and find that the GIL is not available (e.g., the infamous "GIL Battle"). For those who don't want to read, here is the legend again in pictures:
Okay, now let's look at some threads. First, here is the behavior of running two CPU-bound threads on a single CPU system. As you will observe, the threads nicely alternate with each other after long periods of computation.
Now, let's go fire up the code on your fancy new dual-core laptop. Yow! Look at all of that GIL contention. Again, all of those red regions indicate times where the operating system has scheduled a Python thread on one of the cores, but it can't run because the thread on the other core is holding it.
Here's an interesting case that involves an I/O bound thread competing with a CPU-bound thread. In this example, the I/O thread merely echoes UDP packets. Here is the code for that thread.
def thread_1(port): s = socket(AF_INET,SOCK_DGRAM) s.bind(("",port)) while True: msg, addr = s.recvfrom(1024) s.sendto(msg,addr)
The other thread (thread 2) is just mindlessly spinning. This graph shows what happens when you send a UDP message to thread 1.
As you would expect, most of the time is spent running the CPU-bound thread. However, when I/O is received, there is a flurry of activity that takes place in the I/O thread. Let's zoom in on that region and see what's happening.
In this graph, you're seeing how difficult it is for the I/O bound to get the GIL in order to perform its small amount of processing. For instance, approximately 17000 interpreter ticks pass between the arrival of the UDP message and successful return of the s.recvfrom() call (and notice all of the GIL contention). More that 34000 ticks pass between the execution of s.sendto() and looping back to the next s.recvfrom() call. Needless to say, this is not the behavior you usually want for I/O bound processing.
Anyways, that is all for now. Come to my PyCON talk to see much more. Also check out Antoine Pitrou's work on a new GIL.
Note: It is not too late to sign up for my Concurrency Workshop next week (Jan 14-15).
08/01/2009 - 09/01/2009 09/01/2009 - 10/01/2009 10/01/2009 - 11/01/2009 11/01/2009 - 12/01/2009 12/01/2009 - 01/01/2010 01/01/2010 - 02/01/2010 02/01/2010 - 03/01/2010 04/01/2010 - 05/01/2010 05/01/2010 - 06/01/2010 07/01/2010 - 08/01/2010 08/01/2010 - 09/01/2010 09/01/2010 - 10/01/2010 12/01/2010 - 01/01/2011 01/01/2011 - 02/01/2011 02/01/2011 - 03/01/2011 03/01/2011 - 04/01/2011 04/01/2011 - 05/01/2011 05/01/2011 - 06/01/2011 08/01/2011 - 09/01/2011 09/01/2011 - 10/01/2011 12/01/2011 - 01/01/2012 01/01/2012 - 02/01/2012 02/01/2012 - 03/01/2012 03/01/2012 - 04/01/2012 07/01/2012 - 08/01/2012 01/01/2013 - 02/01/2013 03/01/2013 - 04/01/2013 06/01/2014 - 07/01/2014 09/01/2014 - 10/01/2014
Subscribe to Posts [Atom]