Planet Python/SoC
August 28, 2008
My GSoC project was all about testing for PyGame:
- I wrote lots of tests; almost every module in PyGame now has at least one test
- Test modules can now be isolated in subprocesses; one segfault no longer brings down the whole test suite
- Can now test for speed regressions; important for real-time software such as games
- PyGame Automated Build Page extended
  - Shows / collects more info
  - Runs tests in subprocesses
- Test Stubbing Utility: A Testing "Todo List"
- Optional Interactive Tests / Test Tagging
For writing the tests I wrote a small utility that inspects the PyGame package, finds all the untested callables (functions, properties) and creates test stubs, including documentation for each so you don't have to leave the editor. The stubber knows which functions have already been tested because all of the tests follow a naming scheme: essentially "test_$callable__$comment", namespaced by having TestCase[s] per class and a test module per module.
In this way I could create stubs for each module, essentially a TODO list, and cycle through all the modules looking for tests that were easy to write. The functions in PyGame are many and greatly varied, each requiring somewhat specialised knowledge to test. I wasn't able to write tests for all of them, but hopefully the test stubbing utility will help enable some testing sprints. I intend to develop a testing website where people can submit bugs/tests in the form of a unittest.
PyGame has a somewhat unique set of requirements compared to most Python libraries in that most of the framework is actually written in C. C code, when it goes awry, can do some very strange things. We had a test runner running all of the tests in one single process, so if one failed hard it would bring down the whole suite. This can be a bit of a pain, so I developed a test runner that isolates each module in a subprocess.
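The isolation idea can be sketched with the standard `subprocess` module. This is an illustration under assumptions, not the PyGame runner itself: `run_isolated` is a hypothetical helper, and the two code snippets stand in for real test modules.

```python
import subprocess
import sys


def run_isolated(code, timeout=30):
    """Run a snippet in a fresh interpreter and report how it ended."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode == 0:
        return "ok", proc.stdout
    # On POSIX a negative return code means the child was killed by a signal;
    # on Windows an abnormal exit shows up as a non-zero positive code.
    return "crashed (%d)" % proc.returncode, proc.stdout


results = {
    # Stand-in for a well-behaved test module.
    "good_module": run_isolated("print('2 tests passed')"),
    # Stand-in for C code going awry: the child aborts, the parent survives.
    "bad_module": run_isolated("import os; os.abort()"),
}
```

The parent process keeps running after the second child dies, which is the whole point: a hard failure becomes one entry in `results` rather than the end of the suite.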
Some of the tests in PyGame have requirements that make them unsuitable for running as part of the main test suite. For example, some require a CD-ROM or a joystick, take way too long, or need interaction with a human. With the test runner script I extended unittest with the ability to exclude certain tests by tags. The tags can be set at module, class or individual test level and are inheritable and overridable.
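One way tag-based exclusion could look on top of plain unittest is sketched below. The `tags` decorator, the `__tags__` attribute and the `without_tags` filter are assumptions made for this example, not the names used in the PyGame runner, and module-level tags are left out for brevity.

```python
import unittest


def tags(*names):
    """Attach a set of tag names to a test method or TestCase class."""
    def decorate(obj):
        obj.__tags__ = frozenset(names)
        return obj
    return decorate


def effective_tags(test):
    """Method tags override class tags; otherwise class tags are inherited."""
    method = getattr(test, test._testMethodName)
    if hasattr(method, "__tags__"):
        return method.__tags__
    return getattr(type(test), "__tags__", frozenset())


def iter_tests(suite):
    """Flatten nested TestSuite objects into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def without_tags(suite, excluded):
    """Build a new suite that skips every test carrying an excluded tag."""
    excluded = set(excluded)
    kept = unittest.TestSuite()
    for test in iter_tests(suite):
        if not (effective_tags(test) & excluded):
            kept.addTest(test)
    return kept


@tags("interactive")
class JoystickTest(unittest.TestCase):
    def test_button_press(self):   # inherits "interactive" from the class
        pass

    @tags()                        # overrides the class tag with no tags
    def test_get_count(self):
        pass


suite = unittest.defaultTestLoader.loadTestsFromTestCase(JoystickTest)
kept = without_tags(suite, ["interactive"])
kept_names = [t.id().split(".")[-1] for t in iter_tests(kept)]
```

Excluding "interactive" leaves only `test_get_count`, since its empty method-level tag set overrides the tag inherited from the class.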
Another extension to the test runner was the ability to randomize the run order of tests; the seed is printed out along with the test results. If there are failures, you can seed the randomizer with the failure-inducing seed to reproduce the same order. We also wanted to be able to record the timings of each individual test so we could make comparisons between revisions / platforms. I again extended the test runner with that ability.
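A small sketch of seeded ordering plus per-test timing follows. `run_shuffled` and the lambda "tests" are hypothetical; the real runner works on unittest tests rather than a dictionary of callables.

```python
import random
import time


def run_shuffled(tests, seed=None):
    """Run callables in a seeded random order and time each one."""
    if seed is None:
        seed = random.randrange(2**32)
    order = sorted(tests)                # start from a stable order
    random.Random(seed).shuffle(order)   # the seed fully determines the shuffle
    timings = {}
    for name in order:
        start = time.perf_counter()
        tests[name]()
        timings[name] = time.perf_counter() - start
    return seed, order, timings


tests = {
    "test_blit": lambda: None,
    "test_fill": lambda: None,
    "test_flip": lambda: None,
    "test_scale": lambda: None,
}

seed, first_order, timings = run_shuffled(tests)
# Replaying with the reported seed reproduces exactly the same order.
_, replay_order, _ = run_shuffled(tests, seed)
```

Sorting before shuffling matters: without a stable starting order, the same seed would not be guaranteed to reproduce the same sequence.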
I worked with Brian Fisher to extend the PyGame automated build page to record the test results in a ZODB and utilize the new test runner to run tests in subprocesses. We will be able to use this information for detecting speed regressions amongst other things.
August 28, 2008 07:46 PM
August 27, 2008
a.k.a. I F***ING HATE COMPUTERS WHEN THEY DON’T WORK RIGHT AND I CAN’T FIX THEM
Introduction
Up until now, I’ve almost always been running the DrProject server and the web browser to drive it on the same machine. For reasons I will not go into, I needed to run DrProject on an Ubuntu Linux installation on one computer and run the web browser on Windows on another computer. These two computers were on the same LAN. Let’s call these computers Lin and Win, respectively. Please don’t mind me personifying these machines.
When Firefox on Lin talked to the DrProject being hosted locally, everything worked fine. When Firefox on Win talked to the DrProject server on Lin, each page request took an excruciatingly long time to complete — something in the order of tens of seconds, rather than a second or two.
Coming up is a chronicle of following numerous false leads, running up and down the stairs, and diving into the depths of computing to debug problems at a low level. It unfolded over the course of about 3 hours, and is told in approximately chronological order.
Sanity checks
First, I tried accessing DrProject using Internet Explorer (on Win) to make sure that it wasn’t Firefox’s fault. Indeed, it wasn’t. Second, I tried accessing the server using a different Windows computer. Still no luck. I even tried using telnet[0] to access the HTTP server, where I still received a much-delayed response.
The puzzling thing was that Lin had no delays in accessing Internet resources, nor LAN resources (e.g. SMB into the Windows boxes).
In retrospect, if Windows was the problem, then I wouldn’t have caught it because I didn’t have another Linux computer on hand.[1]
Networking
Maybe the network was to blame. All the computers in question were hooked up to a home router (“residential gateway”), and maybe it was causing problems. So I hooked up Lin and Win directly (on a single cable), and set up a static IP configuration for each. Even after back-and-forth tweaking and checking, this didn’t work. So I produced my old 10 Mb/s Ethernet hub, and connected these two computers to the hub. After more IP configuration tweaking, I finally got these computers to talk to each other again.
Mind you, all of this was complicated by the fact that Lin and Win were located one floor apart (yay for running up and down the stairs). Also, Lin had 2 network interfaces[2], and it was difficult to tell which one was mapped to device eth1 and which was mapped to eth2. I did try enabling and disabling each of these two interfaces.
… And there was no improvement in the Win-browser-talking-to-Lin-server lag.
Enter Wireshark (formerly Ethereal)

By now, I had dragged my friend Roy into the problem, as he is a Linux expert and a competent programmer. Other than mildly disagreeing about my choice of screen capture software, he suggested right off the bat that I analyze the network traffic with Wireshark, a packet sniffer. Since I used the older Ethereal back on Windows, I promptly set off to install Wireshark through the Debian package system without hesitation.
In the meantime, I tested using Win to connect to a server socket on Lin to see if that was a problem. Using telnet[3] on Win and a netcat listener[4] on Lin, I saw no delay at all. That was weird. More and more, it looked like “tracd”, the built-in HTTP server in DrProject (inherited from Trac), was where I should have pointed my blame.
When Wireshark was downloaded and installed, I fired it up and captured the relevant network interface. I changed the DrProject server port from 8080 to the standard 80, and used the capture filter “tcp port http” to reduce the amount of data captured.
What immediately caught my eye was that while most rows were coloured green, quite a number of them were coloured black. Wireshark explained that the black rows represent packets whose TCP checksums were incorrect, giving the reason ‘maybe(sic) caused by “TCP checksum offload”?’.
Digging a bit deeper, these “black” packets were all from Lin to Win (but not all packets from Lin to Win were black), and they all had a size of 1514 bytes. But wait a minute! Isn’t it true that the Ethernet MTU is 1500 bytes? Wouldn’t it be bad to exceed the MTU? So I consulted Wikipedia and RFC 1191, but to no avail. They confirmed that the Ethernet standard MTU is 1500 bytes, but didn’t mention whether that included Ethernet headers or not.
Through Wireshark’s detailed analysis of the raw 1514 bytes of data, I concluded that the 14 bytes were for Ethernet headers: 6 bytes for the MAC address of the sending machine, 6 bytes for the MAC address of the receiving machine, and 2 bytes for the EtherType[5]. With a bit of ifconfig wizardry[6], I set the MTU to 1486 bytes so that after adding the Ethernet headers, the total comes out to 1500 bytes. But I still had no luck with the slow connection problem!
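The 1514 = 14 + 1500 arithmetic can be checked with a short sketch that splits a frame into its Ethernet header and payload. The frame below is synthetic (all-zero addresses and payload), and `split_frame` is a hypothetical helper. One general Ethernet fact worth keeping in mind: the 1500-byte MTU counts only the payload, so a 1514-byte captured frame is what a full-size packet ordinarily looks like.

```python
import struct

# Destination MAC (6 bytes), source MAC (6 bytes), EtherType (2 bytes),
# all in network byte order.
ETHERNET_HEADER = struct.Struct("!6s6sH")


def split_frame(frame):
    """Split a raw Ethernet II frame into header fields and payload."""
    dst, src, ethertype = ETHERNET_HEADER.unpack_from(frame)
    payload = frame[ETHERNET_HEADER.size:]
    return dst, src, ethertype, payload


# Synthetic full-size frame: 14-byte header followed by a 1500-byte payload.
frame = ETHERNET_HEADER.pack(bytes(6), bytes(6), 0x0800) + bytes(1500)
dst, src, ethertype, payload = split_frame(frame)

header_size = ETHERNET_HEADER.size
frame_size = len(frame)
payload_size = len(payload)
```

With these values `header_size` is 14, `payload_size` is 1500 and `frame_size` is 1514, matching the sizes seen in the capture.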
Following the TCP conversations through Wireshark did not prove of much use either. But I did see that it took about 10 seconds to respond to each HTTP connection.
strace
Besides Wireshark, the other major recommendation from Roy was to run the DrProject server under strace. This program invokes another program, and traces (prints) all of the Unix system calls that the child program makes.
Roy asked me to inform him of the last line printed by strace whenever the output stopped momentarily. I first noted that the trace stopped at “wait4(-1,” after starting the server. He said that it was because the target process was launching a child process.
I realized that to make this trace work, I had to disable DrProject’s auto-reloader. That actually required me to hack the code, because it seemed the “--auto-reload” command line option was not parsed and handled correctly. In addition to that, the Python BaseHTTPServer was multi-threaded, so I had to invoke strace with the -f option, which resulted in significantly more trace output to the screen.
Finally, something interesting appeared. When Win talks to Lin, the trace stops at “read(4,” for a number of seconds, then resumes. Roy said that 4 is the file descriptor, and asked me to trace all the actions starting from the point where that particular FD (#4) was opened. This is what I found, with some irrelevant information removed:
socket(PF_FILE, SOCK_STREAM, 0) = 4
fcntl(4, F_GETFD)
connect(4, {sa_family=AF_FILE, path="/var/run/avahi-daemon/socket"}, 110) = 0
lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(4, "RESOLVE-ADDRESS 192.168.0.108\n", 30) = 30
read(4, <-- Hangs here for a few seconds
If you ask me, that data being passed to the write system call looked like a DNS thing, or perhaps an ARP thing.
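The `RESOLVE-ADDRESS 192.168.0.108` request reads like a reverse lookup of Win's IP address, though that is an interpretation rather than something the trace states outright. A hedged sketch for measuring how long such a lookup blocks is shown below. `time_reverse_lookup` is a hypothetical helper, and `slow_resolver` is a stand-in that simulates a resolver stalling before giving up, so the example runs without any network.

```python
import socket
import time


def time_reverse_lookup(address, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds spent) for a reverse lookup."""
    start = time.perf_counter()
    try:
        hostname = resolver(address)[0]
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        hostname = None
    return hostname, time.perf_counter() - start


def slow_resolver(address):
    """Simulated resolver that stalls, then fails to find a name."""
    time.sleep(0.2)
    raise socket.herror(1, "Unknown host")


name, seconds = time_reverse_lookup("192.168.0.108", resolver=slow_resolver)
```

Calling `time_reverse_lookup("192.168.0.108")` with the default resolver on the affected machine would show whether name resolution alone accounts for the delay seen per request.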
Avahi
With the evidence provided by strace in hand, Roy pointed to the Avahi Zeroconf software as the source of the problem. He asked me to toggle it — that is, start it if it was not running, and stop it if it was already running.
Stopping Avahi[7] and trying to access the DrProject server on Lin from Win’s Firefox again… Success! W00t! 3 hours of intense troubleshooting had finally come to an end.
Conclusion
Was there a point in writing this? I don’t know. I don’t think you, the reader, can learn much from it. This article is comparable to a rant, with a hodge-podge of disconnected personal events and ideas. An honest and objective rant, though.
But I think this paints a picture of what computer problems look like, and how they are exhaustively troubleshot and (hopefully) finally solved.
Oh yeah, and I took up about 1.5 hours of Roy’s time while he was at work. He has a flexible schedule as a software developer at a big company, so no worries. =P
Footnotes
Lin had the IP address 192.168.0.101.
Win had the IP address 192.168.0.108.
After Lin and Win could talk to each other, I would sit at Lin’s console and operate Lin while also hacking into Win using Remote Desktop Connection. So I didn’t run up and down between computers that much.
0: telnet 192.168.0.101 8000. Type “GET / HTTP/1.1 \r\n Host: 192.168.0.101 \r\n \r\n”.
1: Then again, I could have booted Ubuntu from a live CD.
2: One integrated on the motherboard, and one on a PCI card.
3: telnet 192.168.0.101 8000
4: netcat -l -p 8000
5: In this case, it was 0x0800 for IPv4.
6: ifconfig eth2 mtu 1486
7: sudo /etc/init.d/avahi-daemon stop

August 27, 2008 07:06 PM
I haven't posted anything since Alpha 1. Alpha 2 is currently available, and Beta 1 is around the corner (it has been for quite a while).
Google Summer of Code has ended and most features from my proposal were fulfilled, but I'm still working on it in my free time. Now that school has started I don't have as much time as during the summer vacation, but it is free time anyway.
I have updated the screenshots section from the carcode project page with latest work (beta 1):
http://code.google.com/p/carcode/wiki/Screenshots
Currently I'm working on adding a few levels in order to have a Beta with something to work on. More levels will come and, of course, you can make suggestions; there is a wiki page on level design:
http://code.google.com/p/carcode/wiki/LevelDesign
Feel free to add and modify!
Last but not least, I'm alive and working on carcode! Don't worry :)
August 27, 2008 06:21 PM
August 26, 2008
Via Michael Nygard:
O’Reilly is creating a new line of “community-authored” books. One of them is called “97 Things Every Software Architect Should Know”… All of the “97 Things” books will be created by wiki, with the best entries being selected from all the wiki contributions.
The whole wiki makes for interesting (if uneven) reading.
August 26, 2008 07:30 PM
The last of our summer students finishes at the end of this week; here’s a few links to close off another great season:
Thanks again, everyone — I really enjoyed working with you all.

(photo courtesy of Qiyu Zhu)
August 26, 2008 07:20 PM
Accumulated while on holiday—funny how sometimes I used this blog as an external strap-on memory pack.
More later, including final wrap-up on Google Summer of Code projects.
August 26, 2008 12:13 PM
August 25, 2008
pyttk 0.2 has been released today. As before, you can pick its source at http://pypi.python.org/pypi/pyttk.
Besides bug fixes and some improvements in its usage, it finally brings a test suite! This is also the first release after GSoC finished.
August 25, 2008 09:17 PM
August 23, 2008
Johnny, kick a hole right in the sky! Won't somebody testify? Poke a lion in its eye!
I bought pygame-testify.net today, and set up a python/cgi based form that takes a zip and enumerates the results + adds the (safe evaled) test results dict to a ZODB.
I found a multi-part python snippet for POST[ing] of test results.
The test/build page is starting to come together.
I am using htpasswd for security.
August 23, 2008 09:29 AM
August 22, 2008
Pictures say it better than words ever could:

Maddie in the Back Yard

Feeding the Fish with Mummy and Grampa

In the Big Chair

On the Beach

Sadie Looks Good in Hats

The Bride Makes Her Entrance

Love, Honor…

Cutting the Cake

The Sopranos Version

Families Together

Three Mothers

Uh, What Just Happened?
August 22, 2008 10:22 PM
August 21, 2008
That's true: Django works on Jython without any special patch!
For anyone interested in trying it out, I've written up the steps on the Jython wiki.
For Jython, I think this is great. Not only does it show that Jython is alive and well, it also exposes how much progress has been made on almost every front. Better unicode support, an improved parser, setuptools compatibility, performance improvements, experimental system-restarting support, and datetime and decimal support on zxJDBC are some of the features that enable this milestone. Looking back, it is a lot of impressive work by all the Jython developers.
For Django this is great too! I think the codebase got improved by some of the changes that were needed to enable Jython support, and this sort of confirms Django as the web framework that every Python platform wants to get running (Jython, PyPy, IronPython... not to mention AppEngine). I'm also amazed at the response we got from the Django developers. They managed to include the needed changes in the middle of the rush for the 1.0 release. Amazing.
For me... well, I'm very, very happy. It has been a lot of fun so far, and there will be more of it. Did I mention that I will be speaking at DjangoCon in a few weeks? ;-)
Moreover, there are 4 days left for the GSoC coding period, and I have another objective to meet. Stay tuned...
Update: And here is the other goal completed: Deploying Django projects on Java App Servers.
August 21, 2008 03:19 AM
August 19, 2008
According to the schedule, the coding period of this year's SoC came to an end today. A happy ending for most projects, including mine :)
No, no, it wasn't just a happy ending -- it was completely awesome. I'm talking about the whole experience, not just the results (which I think are really good too, but that is something other people must also judge). The exchange of ideas, experiences, and even small talk with very smart and enthusiastic people who are many miles away, yet work quite closely with you, is unique. You learn a lot. And not only strictly technical things. Just as an example, my English is now way better than a few months ago.
If you are a student, and like open source, you can't miss the opportunity to write a SoC proposal and go for your project, if you haven't already done so. Really, you can't.
For some reason, it is in this exciting moment, when you look back and start to think about how rich this experience was, that you also want to say thanks. It just feels natural, so here I go.
Thanks to Google, who made this possible. Thanks to the Google Open Source Staff, the people who made this possible.
Thanks to Jim Baker, my SoC mentor. Not only strong on the technical aspect, also on the motivational one. Not to mention that he is always full of ideas. If you are a student on a future GSoC and Jim says that he wants to mentor your project, just do it. You won't look back. [Note that this means that you are likely to work with Jython. You may want to start looking at it early ;-) ]
Thanks to the Jython and Django developers, for accepting me and my patches :-). No, seriously. The Jython developers gave me a lot of trust when I was accepted as a committer, and that sped up the process considerably. And the Django folks, even while busy fixing/implementing/reworking hundreds of things in the rush for the 1.0 release, were able to keep Jython (and alternative Python VMs in general) among their priorities for this release.
Thanks to Imagemaker, my current employer, who were very kind and let me work part-time for the whole SoC coding period. It's still a small company, but it is growing quite fast. I hope it remains cool as it keeps growing. If you are a programmer working in Chile, you may want to look at it. Hey! If you are a student and want to work here and also participate in a GSoC, there is now a precedent ;-). Talented people are certainly welcome here.
Last, but not (in any way!) least, thanks to my family. I doubt they are going to read this, but I have to admit that I've been very absent in the last months. Thanks for their understanding. And for being there in those moments when I wasn't absent :).
August 19, 2008 12:19 AM
August 18, 2008
Now that Django runs on Jython out of the box, you may wonder: "How do I deploy the resulting project into an application server (Tomcat, GlassFish, JBoss, etc)"?
The answer:
~/myproject$ jython25 manage.py war \
--include-java-libs="/path/to/my/jdbc-driver.jar"
Provided that you have the latest svn revision of django-jython, and you have included 'doj' in your project's INSTALLED_APPS, it will just work.
Then you drop the WAR file (generated in the parent directory of your project) into your application server, and that's all. No need to install Django or Jython in the target server. The WAR is completely self-contained.
If you are interested in Django/Jython, give this a try! And for more information, see the WarDeployment page on the django-jython wiki.
August 18, 2008 06:45 PM
So this is it. Google Summer of Code 2008 is coming to an end. This last week I am wrapping up everything. If somebody is interested, I have written a final report which wraps up these last three months. Some extracts follow below. For everybody else, here is a short version:
- implemented sys.getsizeof
- pushed muppy to the first release
- Me very happy
- Many thanks to all the supporters
Now the full report ..
Deliverables
This project delivers two pieces of work: the sys.getsizeof() implementation, as well as muppy, a memory leak tool set for Python.
sys.getsizeof
Let's start with what the documentation says:
sys.getsizeof(object[, default])
Return the size of an object in bytes. The object can be any type of
object. All built-in objects will return correct results, but this does not
have to hold true for third-party extensions as it is implementation
specific.
The default argument allows to define a value which will be returned if the
object type does not provide means to retrieve the size and would cause a
TypeError.
getsizeof() calls the object’s __sizeof__ method and adds an additional
garbage collector overhead if the object is managed by the garbage collector.
sys.getsizeof has a default implementation, which is used if the type of the object that is passed to sys.getsizeof does not have its own implementation. Some built-in types (e.g. dict) have their own implementation which incorporates special implementation details of each type.
An important decision was to only include the size of the memory which was required by the object itself, not any referenced objects. This gives a clear guideline on what should be included in an object’s size. For example, unicode objects cache a string representation of themselves. Should this object be included? No, because it is a new object merely referenced by the unicode object.
Also, sys.getsizeof is only guaranteed to work for objects of built-in types and types which adhere to the conventions. If a third-party extension provides a new C-implemented type which, besides the size defined in basicsize and itemsize, allocates other memory, this will not be reported by sys.getsizeof(). Such extensions will need to implement their own sizeof function. Usually though, this should not be necessary.
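A short illustration of the "object itself, not referenced objects" rule, run against a current CPython rather than the 2.6 code base discussed here. The variable names are only for the example, and exact byte counts vary by platform and version, so the sketch compares sizes instead of quoting them.

```python
import sys

small = ["x"]
large = ["x" * 1_000_000]

# Both lists hold exactly one reference, so the list objects themselves
# report the same size...
same_container_size = sys.getsizeof(small) == sys.getsizeof(large)

# ...even though the strings they refer to differ by roughly a megabyte.
small_string_size = sys.getsizeof(small[0])
large_string_size = sys.getsizeof(large[0])

# A deeper total has to be added up by the caller.
shallow_total = sys.getsizeof(large)
one_level_total = shallow_total + large_string_size
```

Tools that want the "real" footprint of a container therefore have to walk the referents themselves.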
The code created for sys.getsizeof is now integrated into the CPython code base.
muppy
Although muppy started as a consolidation of existing memory profiler approaches, it quickly turned into a memory leak detection toolset.
To be useful as a leak finder, basic operations must be supported. These are
- retrieve all existing objects
- filter objects by type and size
- do diffs on object sets
- get referents of objects up to a certain level
Because it is often not useful to work with entire object sets, but sufficient to work with summaries of those, a summary module is provided. It lets you view existing objects grouped by type, number, and size. The following features are provided:
- summarize a set of objects
- print summaries as tables
- do diffs on summaries.
The last feature is especially useful if you want to monitor memory usage over time. To further ease this tracking, the tracker module can be used. It lets you
- retrieve differences between a time t1 and a time t2
- print those diffs
Users could implement this themselves, but tracker instances consider previously stored summaries and deduct them from the returned result. If a summary is too coarse-grained, it is also possible to use the ObjectTracker, which returns object instances that were created since the last invocation.
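The summarize-then-diff idea can be sketched with the standard library alone. This is not muppy's API: `summarize`, `diff` and the `Leaky` class are hypothetical names for illustration.

```python
import sys
from collections import Counter


def summarize(objects):
    """Group objects by type name, returning {type: (count, total_bytes)}."""
    counts, sizes = Counter(), Counter()
    for obj in objects:
        name = type(obj).__name__
        counts[name] += 1
        sizes[name] += sys.getsizeof(obj, 0)
    return {name: (counts[name], sizes[name]) for name in counts}


def diff(before, after):
    """Return only the types whose count or size changed between summaries."""
    changes = {}
    for name in set(before) | set(after):
        b_count, b_size = before.get(name, (0, 0))
        a_count, a_size = after.get(name, (0, 0))
        if (a_count, a_size) != (b_count, b_size):
            changes[name] = (a_count - b_count, a_size - b_size)
    return changes


class Leaky:
    """Stand-in for an object type that accumulates over time."""


shared = [1, "a"]
before = summarize(shared + [Leaky()])
after = summarize(shared + [Leaky(), Leaky(), Leaky()])
delta = diff(before, after)
```

Only `Leaky` shows up in `delta`, with two extra instances. In a real session the object lists would more likely come from something like `gc.get_objects()`, taken at two points in time.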
Last but not least, muppy can identify where objects are referenced. This is useful when objects are leaking, which is often the case when objects are unintentionally still referenced somewhere in the application. The refbrowser module provides reference browsing for the console, output into a file, and interactive browsing through a graphical user interface.
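The underlying question, "who still holds this object?", can be asked of the garbage collector directly. The sketch below uses `gc.get_referrers` from the standard library, not the refbrowser module; `Window`, `cache` and `referrer_types` are hypothetical names.

```python
import gc


class Window:
    """Stand-in for a GUI object that should have been released."""


def referrer_types(target):
    """Type names of the containers that still reference target."""
    return [type(r).__name__ for r in gc.get_referrers(target)]


window = Window()
cache = [window]   # a forgotten reference that keeps the window alive

holders = referrer_types(window)
held_by_cache = any(r is cache for r in gc.get_referrers(window))
```

`holders` includes the list (and typically the namespace dictionary that binds the name `window`). Following such referrers upward, level by level, is the kind of browsing a reference browser automates.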
When available, muppy uses the sys.getsizeof function to retrieve an object’s size. If this is not the case, the asizeof module from Jean Brouwers is used. This provides backward compatibility of muppy for Python versions prior to 2.6.
Muppy is now hosted on the Python cheese shop and Google code. The cheese shop has the documentation as well as the package download and Google code provides the development infrastructure.
Memory leak in Tkinter
With the help of muppy I was able to identify a memory leak in Tkinter. I was asked to check IDLE for any memory leaks. In this process, I discovered that memory was indeed leaking whenever a new window was opened and closed again. The reason was an implementation issue in Tkinter's handling of Menu entries, which is now fixed.
Time line
I started working on the sys.getsizeof function in May, with a first proposal posted on bugs.python.org on May 17th. After discussions, the first patch was applied on June 1st. The initial patch included special implementations of getsizeof for dict, list, byte, and long objects. Later on, unicode, tuple, set, byte array, and frame object implementations were added. Some tests failed on Windows 64-bit systems due to the special 64-bit model used in this architecture. This turned out to be helpful, as it pointed to errors in the test implementation which were not noticed on other architectures. The getsizeof implementation was correct for the most part, but needed an additional change to deal with type polymorphism and old style classes. The last patch regarding sys.getsizeof was committed on July 14th.
At about the same time I started working on muppy (see above). At first basic functionality was implemented, then the summary as well as the tracker module. A week later I started analyzing the IDLE application. With the tracker I could see that some objects were leaking every time a window was opened and closed, but I was not able to identify the referrers. Thus, the refbrowser modules were implemented. Now I could trace the leaking objects back to the Tkinter module. By the beginning of August a patch was proposed, and it was checked in a week later.
Last Words
This project was a great experience for me and I would like to thank all involved participants.
First of all, I would like to thank Martin von Loewis, who has been a great mentor, was always there to answer my questions, invisibly guided my first steps in the Python community and led me through the depths of CPython.
Next, I would like to thank everybody from the Python developer community who discussed issues with me, provided the necessary insight and pointed to the resources which helped to get the job done.
On the organizational side, many thanks to Leslie Hawthorn from Google and James Tauber from the Python Software Foundation for organizing these three months and making it work so smoothly.
Last but not least, special thanks to Jean Brouwers, whose implementation of the asizeof script inspired my work and who shared his thoughts with me throughout my project and beyond.
Finally, an incomplete and unordered list of things I have started to understand and make use of during the last three months: CPython code base, garbage collection in Python, ReST, distutils, IDLE, Tkinter, googlecode hosting, Python’s cheeseshop, lots on the decision process in Python, serious bug tracking and fixing, unicode transformation format, implementing object orientation in a procedural programming language, 64-bit programming models, memory alignment, breaking backwards compatibility in a computer language (with all implications on the user side).
August 18, 2008 07:46 AM
I am trying to get the final screencast properly edited and dubbed, and will be putting the 0.01 package online soon. It has been quite the day for me; as Murphy's Law would have it, everything decided to go more than slightly haywire when I decided to record video :-\
August 18, 2008 01:12 AM
August 16, 2008
The alpha version of my project is out. If you're interested, please try it and give me feedback. I'd like to attach the overview of my GSoC project so you can learn more about it. If you want to try it, you can download the code from the svn site svn://seul.org/svn/pygame/branches/physics ; there are 4 test cases in the PYD folder now. Enjoy!
The overview of my GSOC project
My project is a 2D physics module for Pygame (like box2d and chipmunk). It's written in C and it uses basic Python # structs and functions, which means this module can be integrated with Pygame seamlessly.
My initial goals are :
* World support with body and joint management, gravity and damping
* Body support with mass, shape, velocity, friction, torque, rotation value, linear and rotational damping
* Joint support including distance joints
* Shape support with AABB and OBB collision detection, rectangular, polygonal and circle shapes
Several things helped make the project successful:
* I set my goals at the beginning and made a schedule and a roadmap to guide my work.
* I contacted my mentor regularly to report my current work and discuss the problems I met, so I got a lot of help from him and could solve problems quickly.
* I learned from other physics libraries and from some papers on physics simulation algorithms.
Current status of this project:
* World support is fully implemented
* Body support is fully implemented
* Joint support is fully implemented, including distance joints (tested) and recently added revolute joints (still in progress)
* Shape support is implemented with collision detection and rectangular shape support
* Interfaces of the Python module.
* C API extensions.
* Most parts of the module are stable and work as intended. It lacks large optimizations, especially for the collision detection, but those are areas to be worked on later.
Finally, many thanks to my mentor; he is very conscientious and has given me a great deal of help and guidance.
August 16, 2008 08:09 AM
August 14, 2008
This issue and its comments somehow escaped my notice initially. I have
addressed your comments in the new set of patches.
1) The previous Docs patch had issues. Updated the Docs patch.
2) Included a message in cgi.py about parse_qs and parse_qsl being present
for backward compatibility.
3) The reason the py26 version of the patch has the quote function from urllib is
to avoid a circular reference: urllib imports urlparse for the urljoin method.
So the only way for us to use quote is to have that portion of code in the
patch as well.
Please have a look at the patches.
As this request has been open for a long time (since 2002-08-26!), is it
possible to include this change in b3?
Thanks,
Senthil
Added file: http://bugs.python.org/file11116/issue600362-py26-v2.diff
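For readers unfamiliar with the two functions being moved, here is a brief sketch of what they do. It uses the Python 3 import path, `urllib.parse`, rather than the `urlparse` module of the 2.6 era discussed above; the query string is made up for the example.

```python
from urllib.parse import parse_qs, parse_qsl

query = "lang=python&tag=gsoc&tag=urllib"

# parse_qs returns a dict whose values are lists, because a key may repeat.
as_dict = parse_qs(query)

# parse_qsl returns the (key, value) pairs in their original order.
as_pairs = parse_qsl(query)
```

`as_dict` maps `tag` to both of its values, while `as_pairs` keeps the three pairs in the order they appeared.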
August 14, 2008 02:29 PM
>>> import urllib2
>>> url = 'http://www.whompbox.com/headertest.php'
>>> request = urllib2.Request(url)
>>> request.add_data("Spam")
>>> f = urllib2.urlopen(url)
>>> request.header_items()
[]
>>> request.unredirected_hdrs.items()
[]
>>> f = urllib2.urlopen(request)
>>> request.header_items()
[('Content-length', '4'), ('Content-type', 'application/x-www-form-urlencoded'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
>>> request.unredirected_hdrs.items()
[('Content-length', '4'), ('Content-type', 'application/x-www-form-urlencoded'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
Comment: This is fine. What is actually happening is that the do_request_ method called from http_open() sets unredirected_hdrs to the above items.
>>> request.add_header('Content-type','application/xml')
>>> f = urllib2.urlopen(request)
>>> request.header_items()
[('Content-length', '4'), ('Content-type', 'application/xml'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
Comment: When we add_header() the headers are indeed changed. Correct behavior.
>>> request.unredirected_hdrs.items()
[('Content-length', '4'), ('Content-type', 'application/x-www-form-urlencoded'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
Comment: add_header() has not modified unredirected_hdrs.
Is this the whole purpose of issue2756? If yes, then a better understanding of unredirected_hdrs is needed, in particular of the do_request_ method of AbstractHTTPHandler, where it changes unredirected_hdrs based on the logic of "not request.has_header(...)": what is that check actually aiming for?
If add_header() is not supposed to change unredirected_hdrs, and add_unredirected_header() is the call to change unredirected_hdrs, then it is working fine and as expected.
(This is an undocumented interface; the items() call was used for viewing the headers, though actual code might not be using it.)
>>> request.add_unredirected_header('Content-type','application/xml')
>>> request.unredirected_hdrs.items()
[('Content-length', '4'), ('Content-type', 'application/xml'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
>>>
Comment: add_unredirected_header() has correctly taken effect. After applying the patch attached to the issue report, which modifies the add_header() and add_unredirected_header() methods to remove an existing header of the same name, we will observe that the unredirected header itself is removed and never added back.
After application of attached patch:
>>> url = 'http://www.whompbox.com/headertest.php'
>>> request = urllib2.Request(url)
>>> request.add_data("Spam")
>>> f = urllib2.urlopen(request)
>>> request.header_items()
[('Content-length', '4'), ('Content-type', 'application/x-www-form-urlencoded'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
>>> request.unredirected_hdrs.items()
[('Content-length', '4'), ('Content-type', 'application/x-www-form-urlencoded'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
>>> request.add_header('Content-type','application/xml')
>>> f = urllib2.urlopen(request)
>>> request.header_items()
[('Content-length', '4'), ('Content-type', 'application/xml'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
>>> request.unredirected_hdrs.items()
[('Content-length', '4'), ('Host', 'www.whompbox.com'), ('User-agent', 'Python-urllib/2.6')]
>>>
Comment: Notice the absence of the Content-type header.
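The separation between the two header stores seen in the sessions above can be reproduced offline. The following is only a sketch: it uses Python 3's urllib.request (the successor of urllib2) rather than the 2.6 trunk used above, the example.com URL is a placeholder, and the default header is added by hand instead of by the HTTP handler, so no network access is needed.

```python
from urllib.request import Request

# Build the request offline; nothing here opens a network connection.
# The URL is a placeholder, not the server used in the sessions above.
request = Request("http://www.example.com/headertest", data=b"Spam")

# Stand-in for the default that the HTTP handler would normally record
# in the unredirected store during urlopen().
request.add_unredirected_header(
    "Content-type", "application/x-www-form-urlencoded")

# The caller then sets a different value through the ordinary header API.
request.add_header("Content-type", "application/xml")

normal_value = request.headers["Content-type"]
unredirected_value = request.unredirected_hdrs["Content-type"]
merged_value = dict(request.header_items())["Content-type"]

print(normal_value)
print(unredirected_value)
print(merged_value)
```

The two stores keep their own values, and header_items() reports the one set through add_header(), which matches what the unpatched sessions show.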
August 14, 2008 01:18 AM
August 13, 2008
John’s summary of our discussion about what to teach scientists about reproducible research if they already believe it’s a good thing and want to start doing it reminded me that I never posted about the Provenance Challenge. It has been run twice so far; each time, authors of tools to track the provenance (or lineage) of scientific data have to implement some workflows, then answer questions about where data came from, what was done to it, and so on. The results of the first challenge are described system-by-system in these papers (sorry, but it’s behind a wall; if you google for combinations of the authors’ names, you can find PDF preprints). This is a very cool research area, and I hope one of my incoming grad students will want to do something with it.
August 13, 2008 02:48 PM
Got the combo-boxes working the way I want them to. I realized that convenience-handling for each and every Widget type is something that is [1] perhaps not entirely necessary and [2] certainly not entirely necessary right this instant. With that in mind, I'll be wrapping up/documenting the code for an initial 0.1 release on Friday.
I'll be working on another screencast that will demonstrate integration with buildbot, automated launching of sugarbot and Sugar, as well as the new Python scripting abilities. Overall, I'm very pleased with where I am right now.
It's been a lot of fun developing for GSoC/the Python Org/One Laptop Per Child, and I'm sad to see the summer come to an end (less free time to work on it) in the next few weeks. Hopefully I'll be able to keep my motivation up to keep development going.
Once I have the GSoC-final screencast up, I plan on publishing it to all of the mailing lists. Thanks to Grig and Titus for providing me with guidance and advice throughout the summer, you guys are really great. Also, thanks to those of you on the mailing lists and IRC channels that also provided tips, insight, and advice that made those obscure API problems so much simpler.
Zach
August 13, 2008 12:23 PM
The issue is noticeable here:
[ors@goofy ~]$ python
Python 2.6b2+ (trunk:65482M, Aug 4 2008, 14:26:01)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> url = 'http://hroch486.icpf.cas.cz/formpost.html'
>>> import urllib2
>>> req_obj = urllib2.Request(url)
>>> req_obj.unredirected_hdrs
{}
>>> req_obj.add_data("Spam")
>>> req_obj.unredirected_hdrs
{}
>>> response = urllib2.urlopen(req_obj)
>>> req_obj.unredirected_hdrs
{'Content-length': '4', 'Content-type': 'application/x-www-form-urlencoded',
'Host': 'hroch486.icpf.cas.cz', 'User-agent': 'Python-urllib/2.6'}
>>> req_obj.add_data("SpamBar")
>>> req_obj.add_header("Content-type","application/html")
>>> response = urllib2.urlopen(req_obj)
>>> req_obj.unredirected_hdrs
{'Content-length': '4', 'Content-type': 'application/x-www-form-urlencoded',
'Host': 'hroch486.icpf.cas.cz', 'User-agent': 'Python-urllib/2.6'}
>>> req_obj.get_data()
'SpamBar'
>>>
In the final req_obj.unredirected_hdrs call, neither Content-length nor
Content-type had changed.
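One way to see why the values stay stale is to model the "not request.has_header(...)" guard on a toy object. This is a sketch under stated assumptions: SketchRequest and fill_default_headers are hypothetical names invented for illustration, written in Python 3, and they model only the header bookkeeping, not the real urllib2 code.

```python
class SketchRequest:
    """Toy stand-in for a request object; only header bookkeeping is modelled."""

    def __init__(self, data):
        self.data = data
        self.headers = {}
        self.unredirected_hdrs = {}

    def has_header(self, name):
        # A header counts as present if it is in either store.
        return name in self.headers or name in self.unredirected_hdrs

    def add_header(self, key, val):
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        self.unredirected_hdrs[key.capitalize()] = val


def fill_default_headers(request):
    """Hypothetical helper mimicking a 'not request.has_header(...)' guard."""
    if not request.has_header("Content-type"):
        request.add_unredirected_header(
            "Content-type", "application/x-www-form-urlencoded")
    if not request.has_header("Content-length"):
        request.add_unredirected_header(
            "Content-length", str(len(request.data)))


req = SketchRequest("Spam")
fill_default_headers(req)      # first "urlopen": defaults get recorded
req.data = "SpamBar"           # the body grows from 4 to 7 characters
req.add_header("Content-type", "application/html")
fill_default_headers(req)      # second "urlopen": both guards are skipped

stale_length = req.unredirected_hdrs["Content-length"]
stale_type = req.unredirected_hdrs["Content-type"]
print(stale_length, stale_type)
```

In this model the second pass finds both headers already present, so the unredirected store keeps the values computed for the first body.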
August 13, 2008 11:29 AM
I just simply can’t believe it! Edwin! And that too right before my own eyes…
All of us (Anil, Eddy, Sreeraj and me) were at Trivandrum Central Station, having arrived by the Guruvayoor-Trivandrum express at four in the morning after a tiresome journey with sparing amounts of sleep.
Declaring that sleep was what he needed more than coffee, bhai (as we fondly liked to call Eddy) rested his head on Anil’s shoulder while Srj made some fleeting joke on how we hadn’t had any sleep for the last few days. Bhai stretched out his arms in an attempt to rejuvenate his tired body, when all of a sudden he just dropped on to the concrete platform floor, banging his head (which we didn’t realize at the time). Knocked out with fatigue, is what we figured…
On taking him to the Cosmopolitan Hospital, the doctors confirmed our worst fears and declared his condition to be critical. He was vomiting blood and was in a semi-drowsy, semi-conscious state. The next day his severe internal hemorrhage and skull fracture were confirmed.
He is now in the ICCU of the Cosmopolitan Hospital, Trivandrum. After two torturous days the doctors have finally started saying that his case is hopeful and though it may be a bit slow, a recovery should happen.
Pray for my friend. He was the best of us.
P.S. I thank the whole Swathanthra Malayalam Computation Team for their support and especially Anivar Aravind and Anoop John who came to our aid at the hour of our greatest need.
I also thank Lord Almighty for safeguarding Bhai from deeper danger.

August 13, 2008 08:26 AM
August 12, 2008
I finally got a summary of graduate students’ feedback on the consulting course I ran this past winter. It was pretty good overall; on a scale of 1-5, the responses were:
| Question | Low end (1) | High end (5) | Score |
| How much background is required to successfully complete this course? | None | Lots | 2.6 |
| How easy was it to obtain details/background needed to supplement the lecture material? | Easy | Hard | 2.7 |
| Did the term work increase your understanding of the material? | Not at all | Very much | 4.3 |
| The material was presented: | Too slowly | Too fast | 3.0 |
| The material was: | Too broad | Too specialized | 2.8 |
| Was the workload: | Too light | Too heavy | 3.5 |
| How well organized or prepared was the lecturer? | Not at all | Very | 4.0 |
| How satisfied were you with the lecturer? | Not at all | Very | 4.7 |
| Overall rating of the course | Bad | Great | 4.5 |
| What resources did you use heavily for the course? | |
| Lectures | 2 |
| Text | 0 |
| Papers | 4 |
| Reference Books | 0 |
| Course Notes | 0 |
| Friends | 2 |
| Lecturer | 2 |
| TA | 0 |
| Internet | 10 |
Advice to people who are considering the course in the future:
- If there’s a project you’re interested in or something you want to learn, this is a great way to spend time doing it and getting a course credit at the same time.
- Great course. Find a good project and give ‘er.
- This course is what you make of it. You can mold the course in order to get out more of what you are interested in. I found this freedom great!
- Lots of projects to choose from.
- Good if you want some public speaking experience.
- This course is very good for undergraduate students. It will enhance their coding skills and give them a good opportunity to find a job (by linking them to people in industry). If you are a graduate student and your thesis involves building an application, it’s a very good chance to pass a course as you are doing so.
- Great course. You will learn a lot, but make sure you make realistic estimates of how long everything will take, otherwise it will be too much work.
General Comments — Good
- Provides great insight into large project development.
- Lecturer was outstanding!
- Interesting discussions.
- You can do many different things (develop applications) in this course.
- Learn a lot; things you won’t learn in any other course.
- Hands-on learning and real-world experience.
General Comments — Bad
- Very code intensive, not very appropriate for grad students if the project is irrelevant to their research.
I’m now looking for project ideas for students in the fall — if you’re interested, please drop me a line.
August 12, 2008 06:49 PM
The summer is coming to an end, so students are posting screencasts:
Previously posted:
It’s been another great summer—I’m proud to have worked with them all.
August 12, 2008 02:36 PM
I played around with the pdb module today to debug this issue. pdb is really
helpful.
Here's how the control flows.
1) There is a url with two '//'s in the path.
2) The call is data = urllib2.urlopen(url).read()
3) urlopen calls build_opener. build_opener builds the opener using a (tuple)
of handlers.
4) opener is an instance of OpenerDirector() and has the default HTTPHandler and
HTTPSHandler.
5) When the Request call is made and the request has the 'http' protocol, then
the http_request method is called.
6) HTTPHandler has an http_request method, which is
AbstractHTTPHandler.do_request_
Now, for this issue we get to the do_request_ method and see that
7) host is set in the do_request_ method by the get_host() call.
8) request.get_selector() is the call which is causing this particular issue
of "urllib2 getting confused with path containing //".
The .get_selector() method returns self.__r_host.
Now, when a proxy is set using set_proxy(), self.__r_host is self.__original
(the original complete url itself), so the get_selector() call returns the
sel_url properly and we can get the host from the splithost() call on the
sel_url.
When a proxy is not set, and the url contains '//' in the path segment, then
the .get_host() call (step 7) would have separated self.host and self.__r_host
(the latter pointing to the rest of the url), and .get_selector() simply returns
this self.__r_host (the rest of the url except the host), thus causing the call
to fail.
9) Before the fix, request.add_unredirected_header('Host', sel_host or host)
had an escape mechanism for proper urls, wherein sel_host is not set
and host is used. Unfortunately, that failed when this bug caused sel_host
to be set to self.__r_host, and Host in the headers was being set up wrongly
(the rest of the url).
The patch which was attached appropriately fixed the issue. I modified it and
included it for py3k.
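Steps 7 to 9 can be sketched with a simplified model in Python 3. Everything named here is an assumption made for illustration: split_host is a regex stand-in for the stdlib splithost() helper, host_header_before_fix and host_header_guarded are hypothetical functions, and the guard shown is just one possible shape, not a copy of the attached patch. The exact wrong value produced by urllib2 may differ from this model; the point is only that a selector starting with '//' gets parsed as if it carried a host.

```python
import re

# Simplified stand-in for splithost(): '//host/path' -> ('host', '/path').
_HOST_RE = re.compile(r"^//([^/?]*)(.*)$")


def split_host(url):
    match = _HOST_RE.match(url)
    if match:
        return match.group(1), match.group(2)
    return None, url


def host_header_before_fix(host, selector):
    """Model of 'sel_host or host' when the selector is always parsed."""
    sel_host, _ = split_host(selector)
    return sel_host or host


def host_header_guarded(host, selector, has_proxy):
    """Only trust the selector when it is a full URL (the proxy case)."""
    sel_host = host
    if has_proxy:
        # With a proxy the selector is the complete URL; drop the scheme first.
        _, _, rest = selector.partition(":")
        sel_host, _ = split_host(rest)
    return sel_host or host


# For http://www.example.com//foo/bar, get_host() leaves '//foo/bar' as selector.
wrong = host_header_before_fix("www.example.com", "//foo/bar")
right = host_header_guarded("www.example.com", "//foo/bar", has_proxy=False)
via_proxy = host_header_guarded(
    "proxy.example.com", "http://www.example.com//foo/bar", has_proxy=True)
print(wrong, right, via_proxy)
```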
>
> I could reproduce this issue on trunk and p3k branch. The patch attached
> by Adrianna Pinska "appropriately" fixes this issue. I agree with the
> logic. Attaching the patch for py3k with the same fix.
>
> Thanks,
> Senthil
>
> Added file: http://bugs.python.org/file11103/issue2776-py3k.diff
>
August 12, 2008 10:25 AM
August 11, 2008
Hello again,
Today I've marked the project as complete, even though I haven't touched its code since last Wednesday (probably). The remaining goal was related to IDLE, for which I had a big patch that was sent to its mailing list last Monday and has received no complaints (and no other feedback either) yet, and I've been using it here with all these changes without apparent problems. So, as I see it, this GSoC project can be marked as complete now.
I hope to write more news here in the future, although it won't be weekly news, and those will probably be related to updates in the ttk wrapper (changes caused by Tk 8.6, etc). Also, I promise you the news will have better titles/subtitles than the current ones.
Finally, thanks everyone :)
August 11, 2008 09:39 PM
From the previous post of bugs round up, finished activities (bugs which are closed now) include:
* http://bugs.python.org/issue1432 - Strange Behaviour of urlparse.urljoin
* http://bugs.python.org/issue2275 - urllib2 header capitalization.
* http://bugs.python.org/issue2916 - urlgrabber.grabber calls setdefaulttimeout
* http://bugs.python.org/issue2195 - urlparse() does not handle URLs with port numbers properly. - Duplicate issue.
* http://bugs.python.org/issue2829 - Copy cgi.parse_qs() to urllib.parse - Duplicate of issue600362.
* http://bugs.python.org/issue2885 - Create the urllib package. (But the tests are still named test_urllib2, test_urlparse etc. I shall discuss with py-dev whether renaming the tests is okay before beta3 and give it an attempt.)
Activities TODO:
High Priority:
Now, the list of bugs which are partially completed and require some issues mentioned in the patches to be addressed. These take higher priority, as addressing them would result in sooner closure.
* http://bugs.python.org/issue600362 - relocate cgi.parse_qs() into urlparse
* http://bugs.python.org/issue2776 - urllib2.urlopen() gets confused with path with // in it
* http://bugs.python.org/issue2756 - urllib2 add_header fails with existing unredirected_header Patch attached.
* http://bugs.python.org/issue2464 - urllib2 can't handle http://www.wikispaces.com
Plan:
I shall attempt to address all these issues before the release of Beta3 (which should be on either August 15 or August 23).
The following were some of the main issues to be taken up during G-SOC.
I see that as I have understood RFC 3986 better, I can work on issue1591035. I shall work on it on the branch and then discuss for inclusion in the trunk.
Feature Requests:
* http://bugs.python.org/issue1591035 - update urlparse to RFC 3986. Plan: By August 23.
* http://bugs.python.org/issue1462525 - URI parsing library - This will depend upon the previous issue, so we can assume Aug 23 for closure.
* http://bugs.python.org/issue2987 - RFC2732 support for urlparse (e.g. http://[::1]:80/) This is a related bug again and will conclude on the same timeline.
I shall take up the following listed bugs after completion of the above.
* http://bugs.python.org/issue1448934 - urllib2+https+proxy not working.
* http://bugs.python.org/issue1424152 - urllib/urllib2: HTTPS over (Squid) Proxy fails
* http://bugs.python.org/issue1675455 - Use getaddrinfo() in urllib2.py for IPv6 support. Patch provided.
Low priority.
* http://bugs.python.org/issue1285086 - urllib.quote is too slow
August 11, 2008 06:59 PM
There is a good amount of discussion going on around
http://bugs.python.org/issue3300. I had been following it from the start and had
an inclination towards quote and quote_plus supporting UTF-8. But as the
discussion went further without a strong point on which stance to take, I had to
refresh and improve my knowledge of Unicode support in Python, and especially
Unicode strings in Python 3.0. Hopefully this will come in handy for other issues.
Here are some notes on Unicode and Python.
What is Unicode?
In Computing, Unicode is an Industry Standard allowing Computers to
consistently display and manipulate text expressed in most of the world's
writing systems.
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
What is Unicode Character Set?
What is character encoding?
What is Encoding?
Converting a character (or something) to a number, because computers internally
store numbers only.
Unicode strings are a sequence of code points ranging from 0x000000 to 0x10FFFF.
This sequence needs to be represented as a set of bytes (meaning, values from
0-255) in memory. The rules for translating the Unicode string into a sequence of
bytes are called an encoding.
The representation in number format is required for homogeneity; otherwise
it will be difficult to convert to and from.
What is Unicode Transformation Format?
What is UTF-8?
Unicode can be implemented using many different character encodings. The most
commonly used one is UTF-8, which uses 1 byte for all ASCII characters, which have
the same code values as in the standard ASCII encoding, and up to 4 bytes for other
characters.
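The 1-to-4 byte range can be checked directly. This is a small sketch in Python 3 (where every str is a Unicode string); the four sample characters are arbitrary picks, one for each byte width.

```python
# One sample character per UTF-8 width, written as escapes so the
# source file itself stays plain ASCII.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Encode each character on its own and count the resulting bytes.
widths = [len(ch.encode("utf-8")) for ch in samples]

for ch, width in zip(samples, widths):
    print("U+%04X -> %d byte(s)" % (ord(ch), width))
```

The ASCII letter takes a single byte, identical to its ASCII value, while the code point above U+FFFF takes four.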
What follows the \u is the Unicode code point, which you will find defined
internationally at unicode.org.
Now, how to represent them in binary (because: computer!) is the trick, and you
will have different encodings to do so.
So UTF-8 is one encoding, and UTF-16 and ASCII are other, different encodings.
So you construct a unicode string
mystr = u'\u0065\u0066\u0067\u0068'
mystr is a unicode object. It does not make sense to print it.
But if you wish to see the object, use repr
print repr(mystr)
Now, the unicode object can be converted to binary using an encoding; let us
use 'ascii' and 'utf-8'.
So you would do
asciistr = mystr.encode('ascii')
utf8str = mystr.encode('utf-8')
Now each result is a string object in binary.
Let us print asciistr and utf8str.
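The same steps can be tried in Python 3, where the u prefix is optional and encode() returns a bytes object. This is a sketch only; the second string, with an accented character, is an added assumption used to show where the two encodings stop agreeing.

```python
# Same code points as the example above: the four letters 'efgh'.
mystr = "\u0065\u0066\u0067\u0068"

ascii_bytes = mystr.encode("ascii")
utf8_bytes = mystr.encode("utf-8")
# For characters in the ASCII range both encodings give identical bytes.
same_for_ascii_range = ascii_bytes == utf8_bytes

# A character outside ASCII (added here for illustration).
accented = "caf\u00e9"
try:
    accented.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    # ASCII has no byte for U+00E9, so the encoder refuses.
    ascii_failed = True

# UTF-8 can represent it, and decoding restores the original string.
round_trip_ok = accented.encode("utf-8").decode("utf-8") == accented

print(ascii_bytes, utf8_bytes, same_for_ascii_range)
print(ascii_failed, round_trip_ok)
```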
STILL NEED MORE UNDERSTANDING.
http://boodebr.org/main/python/all-about-python-and-unicode
A Unicode string holds characters from the Unicode character set.
August 11, 2008 06:23 PM