Planet Python/SoC - 2009 edition

July 02, 2009

James Pruitt

Core Conversion Complete

My file wrapper and Josiah's code now work in Python 3.1 and the changes have been committed to the Google Code project. The next thing to tackle is converting Mark Hammond's code from C++ to C and integrating it into the _subprocess.c file.

July 02, 2009 04:58 PM

Skipper Seabold

Generalized Linear Models

As I have mentioned, I have spent the last few weeks in stats books, finding my way around R, and cleaning up and refactoring the code for the generalized linear models in the NiPy models code. I have recently hit a wall in this code, so I am trying to clear out some unposted blog drafts. I intended for this post to introduce the generalized linear models approach to estimation; however, the full post will have to wait. For now, I will give an introduction to the theory and then explain where I am with the code.

Generalized linear models was a topic that was completely foreign to me a few weeks ago, but after a little (okay a lot of) reading the approach seems almost natural. I have found the following references useful:


The basic point of the generalized linear model is to extend the approach taken in classical linear regression to models that have more complex outcomes but ultimately share the linearity property. In this respect, GLM subsumes classical linear regression, probit and logit analysis, loglinear and multinomial response models, and some models that deal with survival data to name a few.

In my experience, I have found that econometrics is taught in a compartmentalized manner. This makes sense to a certain extent, as different estimators are tailored to particular problems and data. GLM, on the other hand, allows the use of a common technique for obtaining parameter estimates, so that it can be studied as a single technique rather than as a collection of distinct approaches.

If interested in my ramblings, you can find a draft of my notes as an introduction to GLM here, as Blogger does not support LaTeX... Please note that this is a preliminary and incomplete draft (corrections and clarifications are very welcome). One thing it could definitely use is some clarification by example. However, as I noted, I have run into a bit of a wall trying to extend the binomial family to accept a vector of proportional data, and this is my intended example to walk through the theory and algorithm, so... a subsequent post will have to lay this out once I've got it sorted myself.

Generally speaking, there are two basic algorithms for GLM estimation: one is a maximum likelihood optimization based on Newton's method; the other is commonly referred to as iteratively (re)weighted least squares (IRLS or IWLS). Our implementation currently covers only IRLS. As will be shown, the algorithm itself is pretty simple. It boils down to regressing the transformed (and updated) outcome variable on the untransformed design matrix, weighted by the inverse of the variance of the transformed observations. This is repeated until the deviance function (twice the log-likelihood ratio of the saturated model to the current fit) converges. The problem that I am running into with updating the binomial family to accept proportional data (i.e., a vector of pairs (successes, total trials) instead of a vector of 1s and 0s for success or failure) is more mathematical than computational. I have either calculated the variance (and therefore the weights) incorrectly, or I am updating the outcome variable incorrectly. Of course, there's always the remote possibility that my data is not well behaved, but I don't think this is the case here.
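To make the IRLS loop concrete, here is a minimal pure-Python sketch for a binomial family with a logit link, fitted to proportional data given as (successes, trials) pairs. It is not the NiPy models implementation: the function names, the single-predictor setup, and the starting values are assumptions chosen to keep the example short.

```python
import math


def binomial_deviance(successes, trials, mu):
    """Twice the log-likelihood gap between the saturated model and the fit."""
    total = 0.0
    for s, n, m in zip(successes, trials, mu):
        if s > 0:
            total += s * math.log(s / (n * m))
        if s < n:
            total += (n - s) * math.log((n - s) / (n * (1.0 - m)))
    return 2.0 * total


def irls_binomial_logit(x, successes, trials, tol=1e-10, max_iter=50):
    """Fit logit(p_i) = b0 + b1 * x_i by IRLS (hypothetical helper)."""
    # Start from the observed proportions, nudged away from 0 and 1.
    mu = [(s + 0.5) / (n + 1.0) for s, n in zip(successes, trials)]
    eta = [math.log(m / (1.0 - m)) for m in mu]
    old_dev = float("inf")
    for _ in range(max_iter):
        # Working response: eta + (y - mu) * d(eta)/d(mu), with y = s / n.
        z = [e + (s / n - m) / (m * (1.0 - m))
             for e, s, n, m in zip(eta, successes, trials, mu)]
        # Weights: inverse variance of the working response, n * mu * (1 - mu).
        w = [n * m * (1.0 - m) for n, m in zip(trials, mu)]
        # Weighted least squares of z on [1, x] via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        # Update the linear predictor and the fitted means.
        eta = [b0 + b1 * xi for xi in x]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        dev = binomial_deviance(successes, trials, mu)
        if abs(old_dev - dev) < tol:
            break
        old_dev = dev
    return b0, b1, dev


# Made-up data: 10 trials at each of five dose levels.
b0, b1, dev = irls_binomial_logit([0.0, 1.0, 2.0, 3.0, 4.0],
                                  [1, 3, 5, 8, 9], [10] * 5)
```

With grouped data the outcome is the proportion s/n and the weight picks up a factor of n; these are the two places where a vector of 0s and 1s and a vector of (successes, trials) pairs differ.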

More to come...

July 02, 2009 01:31 PM

Project Status

I have been making slow but steady progress on the NiPy models code. Right now for the midterm review, we have been focusing on design issues including the user interface and refactoring, test coverage/bug fixing, and some extensions for postestimation statistics. Other than this, I have spent the last month or so with anywhere from ten to fifteen stats, econometrics, or numerical linear algebra and optimization texts open on my desk.

The main estimators currently included in the code are generalized least squares, ordinary least squares, weighted least squares, autoregressive AR(p), generalized linear models (with several available distribution families and corresponding link functions), robust linear models, generalized additive models, and mixed effects models. The test coverage is starting to look pretty good; what remains is squashing the last few bugs and improving the postestimation statistics.

Some enhancements have also been made to the code. I have started to include some public domain or appropriately copyrighted datasets for testing purposes that could also be useful for examples and tutorials, so that every usage example doesn't have to start with generating your own random data. I have followed the datasets proposal in the Scikits Learn package pretty closely.

We have also decided to break from the formula framework that is used in NiPy. It was in flux (being changed to take advantage of SymPy, the last I heard) and is intended to be somewhat similar to the formula framework in R. In its place for now, I have written some convenience functions to append a constant to a design matrix or to handle categorical variables for estimation. For the moment, a typical model/estimator is used as follows:


In [1]: from models.regression import OLS

In [2]: from models.datasets.longley.data import load

In [3]: from models.functions import add_constant

In [4]: data = load()

In [5]: data.exog = add_constant(data.exog)

In [6]: model = OLS(data.endog, data.exog)

In [7]: results = model.fit()

In [8]: results.params
Out[8]:
array([ 1.50618723e+01, -3.58191793e-02, -2.02022980e+00,
-1.03322687e+00, -5.11041057e-02, 1.82915146e+03,
-3.48225863e+06])
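As a rough illustration of what such convenience functions do, here is a pure-Python sketch that works on lists of rows. These are hypothetical stand-ins, not the functions in the models package: the real add_constant operates on arrays, and the name categorical is an assumption.

```python
def add_constant(rows):
    """Return a copy of the design matrix with a trailing column of ones."""
    return [list(row) + [1.0] for row in rows]


def categorical(values):
    """Dummy-code a sequence of labels: one 0/1 column per distinct label."""
    levels = sorted(set(values))
    columns = [[1.0 if v == level else 0.0 for level in levels]
               for v in values]
    return levels, columns


design = add_constant([[2.0, 3.0], [4.0, 5.0]])
levels, dummies = categorical(["north", "south", "north"])
```

Appending the constant as the last column matches the session above, where the final entry of results.params is the intercept.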


Barring any unforeseen difficulties, the models code should be available as a standalone package shortly after the midterm evaluation, which is rapidly approaching in ten days. The second half of the summer will then be focused on optimizing the code, finalizing design issues, extending the models, and writing good documentation and tutorials so that the code can be included in SciPy!

July 02, 2009 11:45 AM

Priit Laes

Basic PDE separation of variables implemented

For most of the past week, I have been working on variable separation of PDEs, and yesterday I sent my merge request. Currently the two most commonly used strategies, additive and multiplicative separation, are implemented and tested on simpler equations like Laplace's.
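For readers unfamiliar with the technique, multiplicative separation applied to Laplace's equation in two variables goes as follows. This is the standard textbook derivation, not output from the SymPy code, and the symbols X, Y and \lambda are introduced here only for illustration.

```latex
\begin{align*}
u_{xx} + u_{yy} &= 0, \qquad u(x, y) = X(x)\,Y(y) \\
X''(x)\,Y(y) + X(x)\,Y''(y) &= 0 \\
\frac{X''(x)}{X(x)} &= -\frac{Y''(y)}{Y(y)} = \lambda
\end{align*}
```

The left side depends only on x and the right side only on y, so both must equal a constant \lambda, which leaves two ordinary differential equations: X'' = \lambda X and Y'' = -\lambda Y. Additive separation follows the same idea with the ansatz u(x, y) = X(x) + Y(y).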

So, what's next? :)

I am currently prototyping a pdesolve() which can handle first-order equations (and also any hyperbolic PDEs) using characteristic methods. Once this is finished, I can start adding the variable separation into pdesolve(), and after this has been done, I can move on to implementing support for boundary and initial conditions.

July 02, 2009 03:53 AM

Danilo Freitas

new and del operators working!

Now we can instantiate C++ objects dynamically in Cython.
Things like
cdef Foo *aFoo = new Foo()
del aFoo
are allowed.
We're using Python syntax for it, so we define the constructor with the "__init__" special method.
The next step is to work on function overloading. We already have the logic to implement it, so what I need to do now is get to work :).
Before that, though, I will write some tests and documentation so that my GSoC branch can be merged into the main branch of Cython and users can do what they need.

July 02, 2009 01:07 AM

July 01, 2009

Skipper Seabold

Econometrics with Python

There is as yet no equivalent of R in applied econometrics. Therefore, the econometric community can still decide to go along the Python path.


That is Drs. Christine Choirat and Raffaello Seri writing in the April issue of the Journal of Applied Econometrics. They have been kind enough to provide me with an ungated copy of their review, "Econometrics with Python." Mentioning the, quite frankly, redundant general programming functions and tools that had to be implemented for R, the authors make a nice case for Python as the programming language of choice for applied econometrics. The article provides a quick overview of some of the advantages of using Python and its many built-in libraries, extensions, and tools, gives some speed comparisons, and also mentions a few of the many tools out there in the Python community for econometrics, including RPy (RPy2 is now available) and, of course, NumPy and SciPy. Having spent the last week or more trying to master the basic syntax and usage of R, I very much sympathize with this position. The one complaint I hear most often from my fellow students is that Python is not an industry standard. I hope this can change and is changing, because it's much more of a pleasure to work with Python than the alternatives, and that makes for increased productivity.

July 01, 2009 10:03 PM

Danilo Freitas

Some questions about syntax

We are deciding what syntax to use for operator overloading and constructors.
We have the C++ and Python syntaxes, and there are big differences between them.
A thread was created on the cython-dev list for voting and commenting on the choice of syntax.
In C++ syntax, the constructor is a method with no return type (it's not void!) and the operators are methods with the keyword "operator" followed by the operator to overload (e.g. "bool operator==(int value);").
In Python, we have the special method __init__ for the constructor and other special methods for the operators (__add__, __str__, etc.)
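For comparison, this is how the Python side of that choice looks in plain Python. It is only an illustration of the special-method convention; the class Foo and its value attribute are made up here, and this is not Cython's C++ wrapping syntax.

```python
class Foo(object):
    def __init__(self, value):
        # Plays the role of the C++ constructor Foo(int value).
        self.value = value

    def __eq__(self, other):
        # Plays the role of bool operator==(const Foo &other).
        return self.value == other.value

    def __add__(self, other):
        # Plays the role of Foo operator+(const Foo &other).
        return Foo(self.value + other.value)

    def __str__(self):
        return "Foo(%d)" % self.value


total = Foo(2) + Foo(3)
```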

For more information, take a look at the wiki of the project:
http://wiki.cython.org/gsoc09/daniloaf/progress

July 01, 2009 01:34 PM

June 30, 2009

James Pruitt

Not Far From The Tree

Now that my implementation of subprocess.Popen is complete and unit tested, I will move on to porting my changes to Python 3.1. I just compiled Python 3.1 on Debian, all modules included, and pdb is still malfunctioning, so if the code conversion isn't as smooth as I anticipate, I will be using print statements to debug, which is a bit difficult due to the nature of subprocess.Popen.

The license of my project has been changed from GPL to Apache License 2.0 so that there is a chance of it being integrated into the Python core. There are still some issues that I may have handled incorrectly as far as my subprocess.Popen file wrapper goes, mainly handling the "mode" argument that is used when one opens a file. Right now, it is mostly ignored, the exception being universal newline support.

June 30, 2009 06:30 PM

Dale Peterson

Code cleanup and updating of examples

A few days ago, I discovered the .get() method of the dictionary data type. It allows you to request the value corresponding to a key which a dictionary may not have, in which case you can specify what should be returned instead of raising a KeyError. In PyDy, I had numerous cases where I was using if/else statements to check if a dictionary had a certain value, but by using the .get() method, I was able to eliminate about 50 or 60 lines of code and make the code a lot more readable and easier to maintain.
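A small before-and-after sketch of the pattern. The dictionary and key names are invented for illustration and are not taken from PyDy.

```python
settings = {"mass": 2.0}

# Before: an explicit membership test for every optional key.
if "damping" in settings:
    damping = settings["damping"]
else:
    damping = 0.0

# After: one line per key, with the fallback given as the second argument.
mass = settings.get("mass", 1.0)
damping_short = settings.get("damping", 0.0)
```

Without a second argument, .get() returns None for a missing key.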

The progress on the trigonometric functions has been slow. Ondrej and I worked on some code together that handles most of the cases in the table by Fu et al., but not all. One of the things that has been a little challenging is to identify pi shifts in the argument to the trig functions and map the shift into the first quadrant (0, pi/2) and return the appropriate result, e.g.: sin(x+17*pi/18) –> sin(x + pi/18), etc. The approach has been to match x and r in the following generic argument:

x + r*pi

As I see it, there are three cases to consider: 1) x==0, r!=0, 2) r==0, x!=0, and 3) x!=0 and r!=0. I am splitting the logic up in this fashion and am trying to deal with each case in the appropriate manner. In most cases, r will be of the Rational class, and therefore the modulo operator is needed to bring r*pi into the interval (0, 2*pi), so I implemented the necessary code and sent a patch to the SymPy mailing list.
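The modulo step can be sketched with the standard library's Fraction type standing in for SymPy's Rational. The helper name below is hypothetical and this is not the patch that was sent to the list.

```python
from fractions import Fraction


def reduce_pi_coefficient(r):
    """Map the r in x + r*pi to the equivalent coefficient in [0, 2)."""
    # sin and cos have period 2*pi, so only r modulo 2 matters.
    return Fraction(r) % 2


reduced = reduce_pi_coefficient(Fraction(37, 18))  # 37/18 = 2 + 1/18
```

Python's % returns a result with the sign of the divisor, so negative coefficients also land in [0, 2).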

Although the trig functions aren’t quite as they need to be yet, I was still able to manually get the simplifications that are presented by Fu et al. to work using the builtin SymPy commands, which hopefully means that implementing the algorithm will be feasible with the existing SymPy tools and that not too many extra functions will need to be created.

June 30, 2009 02:35 PM

Kang Zhang

Porting to Gnome Keyring

I've finished porting the API to Gnome Keyring. The latest code in the repository includes this feature. To compile the lib for gnome-keyring, you need Glibc 2.0, D-Bus 1.0 and Gnome Keyring 1.0. The changes that the keyring API makes on your OS can be seen in Seahorse or Gnome Keyring Manager (deprecated).




The next platform is KDE KWallet; here are some interesting materials.

June 30, 2009 01:21 PM

M. Shuaib Khan

Progress...

I've made little progress this week, and couldn't update my blog last week, due to the final exams of my final semester. But since the exams are about to end, I think I can at least document my progress from the week before last.

I analyzed a good number of Python C modules using gcov manually. The results seem good and encouraging, and gcov seems to be the right tool for the coverage. The fact that it requires modules to be linked statically into Python is a bit of trouble, but after a detailed discussion with my supervisor, Seth, we decided to assume the following two points for my later work on integrating the gcov reports into figleaf:

  1. The user has already compiled Python with modules statically linked in
  2. Proper arguments were used with gcc during the compilation process so as to let gcov track the coverage.
Lots of manual work during the past weeks. It is actually time to start coding all of it, and I've to start first with letting figleaf take care of the C code coverage report generation too. I've been looking through the figleaf code to figure out how to best achieve that goal, and will document any changes made here on the blog.

In the meantime, I've also started to look into improving the test coverage by writing new test cases. I found the unittest module to be an interesting place to start, as it has around half coverage, and it would be interesting to write unit tests for the unittest module itself.

June 30, 2009 01:57 PM

Fabian Pedregosa

Preparing a new release

For the last few days I’ve been busy preparing the first public beta of SymPy 0.6.5. Most of the time was spent solving a bug that made documentation tests fail under python2.4, but now that this is solved, I hope that by the end of the week we can have a final release.

When this release is published, we’ll merge my query module and work on getting it right for 0.7.

June 30, 2009 06:57 AM

Tyler Laing

Movie Module Progress: Week 6

So, week six. I devoted this entire week to exhaustive testing and bug fixing. It's just good software development to take some time to test and fix bugs. No new features, nothing. I also took the time in the process to refactor a section of code I felt was extremely hacky and brittle. The benefit [...]

June 30, 2009 01:58 AM

June 29, 2009

Freddie Witherden

x Marks the Spot

There it is — the first 'equation' ever rendered by Mathtex! Although it may look like nothing more than a 99-DPI 12pt x in italicized Computer Modern it is really something quite special — a vision of progress.

Below is the parse-tree and glyph stream generated by the program:
freddie@fluorine ~/Programming/mathtex $ python main.py
[Hlist 9.42> [Hlist 0.00> ] [Hlist 9.42> `x` k1.17] [Hlist 0.00> ]]
[(-0.5, 7.0, Bunch(symbol_name=x, metrics=Bunch(advance=9.41821289062, iceberg=7.0, ymax=7.0, height=7.0, width=8.25, slanted=True, xmax=8.6875, xmin=0.5, ymin=0.0), num=120, fontsize=12, offset=0.0, postscript_name=Cmmi10, font=, glyph=))]


Over the last couple of days I have been working on the code that I committed last week (here for those that are interested) and as promised now have something that does work.

However, there are several unpleasant aspects to it: firstly, it depends on mathtex.ft2font, the FreeType wrapper used by matplotlib; secondly, there is currently only a Cairo backend; thirdly, the only font series supported is Computer Modern, by way of the Bakoma fonts; fourthly, the font paths are currently hard-coded.

I plan to fix all of these issues over the next couple of days — starting with using font metrics files as opposed to FT2Font and then writing a C-based renderer and wrapping it using Cython. I expect that this will be done by Friday.

June 29, 2009 07:56 PM

June 27, 2009

Vicent Marti

Gentlemen. An update.

Hey everybody. Apologies beforehand for my lack of updates -- I know I'm totally failing at maintaining this blog (as usual, and as expected), but that doesn't mean that progress is not being made.

Far from it, actually. I finished my exam break last week and resumed my usual schedule; everything is looking in very good shape for the first evaluation on July the 6th. Let's take a look at the SVN logs and see what has been done... After the break.




June 27, 2009 08:05 PM

James Pruitt

Updates

Though it has been a while since I have made a post, things are going well right now. Since my last post, I have created a Google Code Repository located at http://code.google.com/p/subprocdev/. I have integrated, modified and written tests for the code I got from Josiah Carlson. Two of the functions he left outside of his Popen patch were moved into subprocess.Popen. I made some aesthetic and functional changes to his recv_some and send_all functions.

Eventually, I got my code and Josiah's code just about fully unit tested in Python 2.7 and from there, proceeded to see if I could convert it over to 3.0. After I did this, I realized that it still ran, for the most part, in Python 2.7. With the following code, I attempted to get around the only thing that was causing me grief:

import sys
if sys.version_info[0] == 3:
    from io import BufferedWriter as buffer
else:
    class str(object):
        def __init__(self,a,b=None):
            self = a.__str__()
    def isinstance(a,b):
        if b == str:
            return hasattr(a,'strip')
        try:
            return a == b(a)
        except:
            return False

In Python 3.0, the str function can also accept a character encoding as an argument, so I tried to overload the Python 2.7 str class to make it 3.0 compliant, to no avail. I will therefore be producing separate branches for Python 2.7 and 3.0 development in my Google Code Repository, but any suggestions on getting around the str issue are greatly appreciated.
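One possible way around the str issue, offered only as a sketch and not as what the project ended up doing, is to leave the builtin alone and route conversions through a small helper. The names PY3, text_type and to_text are assumptions made for this example.

```python
import sys

PY3 = sys.version_info[0] >= 3
# On 2.x the text type is unicode; the name is never looked up on 3.x.
text_type = str if PY3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):  # bytes is an alias of str on 2.6 and later
        return value.decode(encoding)
    return text_type(value)


decoded = to_text(b"caf\xc3\xa9")
```

Because the helper never rebinds str or isinstance, the rest of the module keeps the normal builtin behaviour on both versions.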

June 27, 2009 02:52 PM

Yang Yang

Porting py.test from python2.x to python3.1 (updated on Jun 27)

Due to python3.x's intentional backwards incompatibility, it is not easy to migrate a project's codebase from python2.x to python3.x. Guido has given a recommended development model:

  1. You should have excellent unit tests with close to full coverage.
  2. Port your project to Python2.6.
  3. Turn on the Py3k warnings mode.
  4. Test and edit until no warnings remain.
  5. Use the 2to3 tool to convert this source code to 3.0 syntax.
  6. Test the converted source code under 3.0.
  7. If problems are found, make corrections to the 2.6 version of the source code and go back to step 4.
  8. When it's time to release, release separate 2.6 and 3.0 tarballs.
Fortunately, py.test is already 2.4 and 2.6 compatible, and has sufficient unit tests, so I can start directly from step 3. Unfortunately, however, Holger requires that the codebase ported to 3.1 remain compatible with 2.4 and 2.6, i.e., that we maintain only one codebase which can be used with all main python versions. So, the 2to3 tool can only be used to show the minimum set of places that need attention.

Thus, to make py.test compatible with python2.4, 2.6, and 3.1, the first thing is to write some wrapper functions. The functions I wrote are:
Print:
--use a Print function in place of every print statement in py.test, so that python3.1 does not raise a SyntaxError. Inside Print, the correct print code is executed for the running python version.
Raise:
--the only incompatible syntax is raise cls, value, tb (2.x) versus raise cls(value).with_traceback(tb) (3.x), so the Raise function simply executes the statement that matches the running version.
isinstancemethod, isclassmethod, isfunction:
--suppose we define a class "myclass" with a method "method" inside it, then create an instance of myclass, myinstance. In 2.x, myclass.method and myinstance.method have the same attribute names. But in 3.x, myclass.method's attribute names are the same as those of a normal function. This is a very annoying difference, because in many places code objects are obtained via obj.im_func.func_code. In 2.x, obj could be myinstance.method or myclass.method; in 3.x, obj.im_func.func_code must be changed to obj.__func__.__code__ and it only works for myinstance.method. For myclass.method or a normal function, obj.__code__ should be used. So, I have to provide three functions to distinguish them.
updatemethodattr, updatefunctionattr:
--method's attributes:
im_self ==> __self__
im_func ==> __func__
im_class ==> disappeared?
--function's attributes:
func_closure ==> __closure__
func_code ==> __code__
func_defaults ==> __defaults__
func_dict ==> __dict__
func_doc ==> __doc__
func_globals ==> __globals__
func_name ==> __name__
So, to keep the codebase unchanged, if obj is a 3.x method or function, these wrappers create the 2.x attributes on it.
CmpToKey:
--in 3.x, the sort function no longer accepts a cmp keyword, so this wrapper transforms a cmp function into a key function.
bytestostr, strtobytes:
--in 3.x, all strings are unicode, but lots of streams require bytes objects instead of strings. These two functions wrap stream arguments and, under 3.x, convert from bytes to str or from str to bytes.
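As an illustration of the CmpToKey idea from the list above, a wrapper along these lines would run unchanged on 2.4, 2.6 and 3.1. This is a sketch of the general technique, not the code in py.test; later Python versions (2.7 and 3.2) ship an equivalent as functools.cmp_to_key.

```python
def CmpToKey(mycmp):
    """Turn an old-style cmp(a, b) function into a key class for sorting."""
    class Key(object):
        def __init__(self, obj):
            self.obj = obj

        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0

        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0

        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0

        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0

        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0

        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0

    return Key


def reverse_numeric(a, b):
    # Old-style comparison: negative, zero or positive.
    return b - a


ordered = sorted([3, 1, 2], key=CmpToKey(reverse_numeric))
```

The key argument to sorted() and list.sort() has been available since Python 2.4, which is what makes this route usable across the whole supported range.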

Second, there are some well-known incompatibilities, such as "except as" and module renames. "except Error as e:" is changed to "except Error:", and e is created inside the except block with "e = sys.exc_info()[1]". For module renames, add a try/except block. For example:
try: import StringIO
except ImportError: import io as StringIO
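The sys.exc_info() workaround mentioned above looks like this in practice. The helper name is made up for the example; only the except clause and the exc_info call are the point.

```python
import sys


def describe_failure(func, *args):
    """Call func(*args) and return the exception it raises, or None."""
    try:
        func(*args)
    except Exception:
        # Valid on 2.4, 2.6 and 3.1 alike, unlike "except Exception as e"
        # (a SyntaxError on 2.4) or "except Exception, e" (one on 3.x).
        e = sys.exc_info()[1]
        return e
    return None


error = describe_failure(int, "not a number")
```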

Third, 3.x has many incompatible mechanisms. For example, 3.x does not call __cmp__ when comparing two objects, so __lt__, __gt__, __le__, __ge__, and __eq__ should be implemented. Furthermore, if you define __eq__, __hash__ must also be provided, or instances become unhashable.
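A minimal sketch of a class that compares the same way on 2.x and 3.x without __cmp__. The Version class is invented for illustration and does not come from py.test.

```python
class Version(object):
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        return self._key() == other._key()

    def __ne__(self, other):
        return self._key() != other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __le__(self, other):
        return self._key() <= other._key()

    def __gt__(self, other):
        return self._key() > other._key()

    def __ge__(self, other):
        return self._key() >= other._key()

    def __hash__(self):
        # Defining __eq__ on 3.x removes the inherited __hash__, so restore it.
        return hash(self._key())


newest = max([Version(2, 4), Version(3, 1), Version(2, 6)])
```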
In 2.x, dict.items() returns a list copy of the key-value pairs. But in 3.x, dict.items() returns a view object, and if the dict's size changes during iteration, a RuntimeError is raised. So, although "for key,value in dict.items():" still works in 3.x, it should be changed to "for key, value in list(dict.items()):" wherever the loop body may add or remove keys.
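A short example of the situation where the list() copy matters, with made-up data:

```python
counts = {"a": 1, "b": 0, "c": 2}

# Iterating over a snapshot makes it safe to delete entries in the loop.
# On 3.x, dropping the list() call here raises
# "RuntimeError: dictionary changed size during iteration".
for key, value in list(counts.items()):
    if value == 0:
        del counts[key]
```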

There are still lots of incompatibilities that need to be fixed. Currently the packages of py.test without test failures under 3.1 are:
builtin/
cmdline/
io/
log/
path/
process/
rest/
I will keep updating this post as the port progresses. I hope it can be finished ASAP.

June 27, 2009 12:10 PM

Freddie Witherden

This Week in GSoC Mathtex

Officially, according to my schedule this week should've been spent on producing a set of unit tests. Now, usually when things don't go to schedule it is because something bad or unexpected occurred.


However, while looking through the Mathtex code last week something good — but unexpected — occurred. It seems as if splitting Mathtex from Matplotlib is significantly easier than I first anticipated. Therefore, this week has been spent splitting the behemoth mathtex.py file in Matplotlib into several smaller files, ready for externalisation.

I expect that by Sunday or Monday the SVN repository will have a version of mathtex that is able to render equations using a Cairo backend and the FT2Font library from Matplotlib. Once this is working it shouldn't be too difficult to a) add a bitmap backend using FreeType/Cython/libpng; b) use font metrics files as opposed to FT2Font for metrics information.

On a personal note yesterday was also my last day in university accommodation. As of 22:30 BST I am now home again as opposed to being in central London. Yay for packing and unpacking!

June 27, 2009 10:49 AM

June 26, 2009

Priit Laes

Separation of variables for PDEs

School finally ended last week (no exam results yet, though) and I took a long, long trip back home. The week started slowly due to the deadly combination of a forgotten laptop power cord, Midsummer Day, Victory Day and the best weather ever. This resulted in lots of grilling, beer and also a small side effect: my GSoC mentor is probably quite angry with me now for missing my weekly update again...

Back to Sympy now.

So far all my various doc-related fixes have been committed upstream, and I have decided to bite the bullet and start implementing the variable separation for PDEs. So far some very simple test cases based on first-order equations work, but I have some trouble with equations containing higher-order derivatives. You can pull from github/pde-separate and please break it :)

June 26, 2009 12:01 PM

June 25, 2009

Freddie Witherden

We Have a Mailing List

Following on from the Mathtex project announcement last week, we now also have a mailing list, mathtex-dev (http://groups.google.com/group/mathtex-dev?lnk=), which is open to all. Although it is a development list, anyone with an interest in the project should make their voice heard. This will almost certainly become more important in the next week or so when the floor is opened to feature/enhancement requests (backends and syntax support).

June 25, 2009 11:57 AM

Tyler Laing

Science Fiction predicts the future?

I’m watching a documentary on the parallels between the market crash of 1929 and the market crash in 2008 (last year). I was reminded of the exploits of Adam Zimmerman, of Brian Stableford’s Emortality series, where Adam used his brilliance in the stock markets to cause a massive market crash, while the masters of the business [...]

June 25, 2009 02:33 AM

June 23, 2009

Eden Elos

Status update

This blog will track the development of interfacing pygr with Clustalw alignment, BLASTZ alignment, BLAT alignment, LAGAN alignment, MLAGAN alignment, and gmap alignment. The project is part of Google Summer of Code 2009, and will be mentored by Dr. Titus Brown and Jenny Qing Qian, to both of whom I am very grateful.
Currently, pygr is interfaced only with the BLAST alignment program.


For the actual interfaces, I will write separate Python modules (by the way, Python is the language I am using). Before mid-term evaluation (which is on July 06, 2009), I plan to finish the first three alignment programs and the remaining three will be finished after the mid-term. So far I have finished interfacing Clustalw and blastz with pygr.
The mainstay of this blog will be progress reports. Comments are welcome.

June 23, 2009 10:47 PM

Tyler Laing

Intelligence Gathering Issues

All too often we see reporters miss the real questions or lob easy ones at the Powers That Be. This isn’t enough to make me paranoid, as generally, people do good. But one particular line in this Time article about missing boy Danny Barter really got me interested. The line in question is this: [...]

June 23, 2009 06:09 PM

Danilo Freitas

new and delete operators

Last week we began the work to support the 'new' and 'delete' operators.
A new node was created (NewStatNode), along with a function in the parser to call it (p_new_statement).
I hope everything continues without problems.

The midterm evaluations are coming. I hope everything will be on time by then. :)

June 23, 2009 02:59 PM

June 22, 2009

Dale Peterson

Still working on trigonometric.py

I am still working on modifying trigonometric.py to make all of the trig methods behave as they would in Mathematica/Matlab/Maple. This involves getting the .eval method of each of them correct, and then from there I can work on implementing a new trigsimp.

June 22, 2009 11:30 PM

Fabian Pedregosa

Queries and performance

After some hacking on the queries module, I finally got it right without the limitations of past versions. You can check it out from my repo http://fseoane.net/git/sympy.git, branch master.

It now relies even more on logic.inference.satisfiable(), which is just an implementation of the DPLL algorithm. The bad news is that (my implementation of) dpll_satisfiable() is SLOW, so inevitably queries are SLOW. But all is not lost, since the algorithm itself is quite fast, and in fact other variants of the algorithm (MiniSAT) perform 6600 times faster than my implementation on medium-sized problems (60 variables, 170 clauses). So it looks like something smells bad on the programming side …
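For readers who have not met DPLL, here is a compact sketch of the algorithm: unit propagation followed by branching on a literal. It is not SymPy's dpll_satisfiable(); the clause encoding (lists of non-zero integers, negative meaning negated, as in the DIMACS format) and the function names are assumptions made for this example.

```python
def simplify(clauses, literal):
    """Assume literal is true: drop satisfied clauses and shrink the rest.

    Returns None if some clause becomes empty (a conflict).
    """
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        reduced = [lit for lit in clause if lit != -literal]
        if not reduced:
            return None
        result.append(reduced)
    return result


def dpll(clauses, assignment=None):
    """Return a satisfying {variable: bool} dict, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    clauses = [list(c) for c in clauses]
    if any(not c for c in clauses):
        return None
    # Unit propagation: a one-literal clause forces that literal.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
        if clauses is None:
            return None
    if not clauses:
        return assignment
    # Branch on the first literal of the first remaining clause.
    literal = clauses[0][0]
    for choice in (literal, -literal):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            trial = dict(assignment)
            trial[abs(choice)] = choice > 0
            result = dpll(reduced, trial)
            if result is not None:
                return result
    return None


model = dpll([[1, 2], [-1, 3], [-2, -3]])
```

This version copies the clause list at every step, which keeps the code short but is one of the costs that fast solvers avoid. Variables that never needed a value are simply left out of the returned assignment.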

However, I spent the day profiling the function (link to source code used for profiling) without much success.

June 22, 2009 10:02 PM

Werner Laurensse

GSOC Week 3-4

The past two weeks I was busy studying for my examinations, so I didn’t have much time to work on my GSOC project. However I was able to do a lot of work to get the PyCameraObject struct integrated in the existing code and implemented some additional functions.

Here is the definition of this struct, it isn’t completed yet because some “types” I would like to put into it cause compile errors.

PyCameraObject:

typedef struct {
    PyObject_HEAD
    char* device_name;              // unique name of the device
    SeqGrabComponent component;     // A type used by the Sequence Grabber API
    SGChannel channel;              // Channel of the Sequence Grabber
    GWorldPtr gWorld;               // Pointer to the struct that holds the data of the captured image
    Rect boundsRect;                // bounds of the image frame
} PyCameraObject;

In addition to implementing the PyCameraObject struct, I have written the following functions.

Schedule for this week

This week I am planning to finish the mac_start_capturing function so that I can capture images from my camera and put them in an SDL_Surface object. As usual, you can follow my work on my github account.

June 22, 2009 03:19 PM

Andrew Friedley

This past week I wrote code for int32 and float64 types, giving me an addition ufunc that supports 32/64bit ints and floats. The code for each case is pretty long; each datatype duplicates code that loads the function arguments, checks for cases like reduction and lengths that don't evenly divide the SIMD parallelization/loop unrolling, etc. So I started factoring this out and developing the beginnings of a framework for writing CorePy-based ufuncs with minimal effort and code duplication.

I succeeded in isolating most of the duplicate code, though the 'generic' code generation is a bit more complicated than before -- especially the specialized reduction code. This needs some polish still; I used two different approaches for factoring out the operation-specific code for general element-wise operations and reductions. I will probably use the approach I did for reduction -- the generic code requires a function (I can actually pass a single instruction) that takes two registers as arguments (accumulator and source element) and generates reduction code for a single element.

I ran into a few problems while working on this stuff; I have no idea how I managed to miss these bugs until now. First, applying my ufunc to a single-element array gave bad results -- the test I was using to detect reduction was also catching this single-element case, and not working properly.

The other issue I've run into is due to the way I'm splitting work among multiple threads. I just did the easiest thing possible (halfway expecting to do something more elaborate later) -- divide the length evenly among each thread, giving the last thread additional work if the number of threads does not divide the length evenly. The problem with this is that individual threads will get work segments with unaligned memory if the work segment length is not a multiple of 16 bytes. So this problem is the next thing I need to tackle -- I've been thinking about ways to split the work evenly, while also rounding work segments to a multiple of 16 bytes.
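One way to split the work so that every thread starts on a 16-byte boundary is to divide whole 16-byte blocks among the threads and hand any leftover tail to the last thread. The sketch below only computes index ranges; the function name is hypothetical, and it assumes that the array itself starts on an aligned address and that the element size divides the alignment.

```python
def split_aligned(n_elements, n_threads, element_size, alignment=16):
    """Return per-thread (start, stop) index ranges with aligned starts."""
    # Elements per aligned block, e.g. 4 for 4-byte ints with 16-byte blocks.
    per_block = alignment // element_size
    n_blocks = n_elements // per_block
    # Spread whole blocks as evenly as possible over the threads.
    base, extra = divmod(n_blocks, n_threads)
    segments = []
    start = 0
    for i in range(n_threads):
        blocks = base + (1 if i < extra else 0)
        stop = start + blocks * per_block
        if i == n_threads - 1:
            # The last thread also takes the tail that fills no whole block.
            stop = n_elements
        segments.append((start, stop))
        start = stop
    return segments


segments = split_aligned(103, 3, element_size=4)
```

Only the final segment can end off a block boundary, so the unaligned remainder is handled in a single place instead of in every thread.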

June 22, 2009 02:58 PM

Aaron Meurer

asmeurer


So I pushed some changes to GitHub so Ondrej could help me debug the nseries tests, when I noticed that the changes that I pushed had some bad comments. So I decided to rebase. But git rebase -i told me that there was already a rebase in progress. I figured that I must have started it a long time ago and forgotten to abort, so I ran git rebase --abort.

DON’T DO THAT.

I noticed my editor was telling me that an open file had changed. Then, I noticed that ALL of my uncommitted changes were gone! And, being uncommitted changes, git did not have them saved anywhere!

So now I started to panic. I had done a lot of work on dsolve that I hadn’t committed yet. Normally, I have hourly backups run by Time Machine, but I am on vacation and my backup drive is at home. So I started to see if I could retrieve it somewhere. grep quickly told me that it wasn’t in the hidden git directory, but it was still in my .pyc files. But a Google search told me that retrieving from that is not so easy, if not impossible with Python 2.6. So then, I decided to see if there was any lingering stuff in my virtual memory from my editor. So I ran grep on my hard drive and waited.

While I was waiting, though, I noticed when I scrolled up in my command history that my lost changes were in my Terminal. It turns out that I had just run git commit --interactive and had used * on my patches, so it gave me everything!

So I copied my Terminal history and will work on putting everything back tomorrow. It should be easy, assuming that git apply works for the format that git gives in commit --interactive.

So the lessons are: Don’t abort a rebase without committing. Don’t start a rebase and then leave it there. Look in your Terminal history if you lose stuff. And it might be a good idea to make manual backups if you are away from your backup drive for a while.

This also highlights why it is important to try to recover data immediately after realizing that it is gone. If I had closed my Terminal session or filled it past the maximum number of lines, my data would be gone. Even if it were in my virtual memory, that wouldn’t last forever either.

June 22, 2009 05:16 AM

June 21, 2009

Wojciech Walczak

gminick


At the moment the directory structure of my project built upon Sphinx’s dir structure looks like this:

builders/webapp/ -> webapp builder used by Sphinx to build the docs and the webapp,
builders/webapp/templates/ -> the *.py template files rendered by the webapp builder (at the moment only one file, server.py, lives here, and it is simply copied rather than rendered, though that might change in the future),
builders/webapp/templates/html/ -> the templates of HTML code rendered by the middleware (imported by server.py) when serving the docs,

web/middleware/ -> middleware for Xapian, OpenID and serving html files in general,

themes/basic/web/ -> templates for full HTML files. These files are used as templates twice. First, when Sphinx is building the docs, it extends these files with “layout.html”, and then, for a second time, when the files are accessed through the browser. That’s why these files are separated from those located in builders/webapp/templates/html/ directory, which are rendered only once, by the middleware.

A bit of a mess, isn’t it? Comments are appreciated :)

The structure for generated docs and webapp looks like this (to be found in _build/webapp):

html/ -> files from the builders/webapp/templates/html and themes/basic/web directories are located here. We don’t have to separate them anymore, because they are all used as templates by the middleware now.

openidstore/ -> OpenID directory. It is created and administered by python-openid library.

public/ -> files built by Sphinx go here. This is the directory which should be served.

xapian_db/ -> Xapian’s database.

server.py -> a script which imports the middleware and serves the docs. The idea is that by adapting the code from this file to one’s framework, one should be able to use this webapp in the Python web framework of one’s choice. I plan to change it soon, so that all the server.py logic will be available as a single middleware, because the middleware pieces (Xapian, OpenID, etc.) are actually inseparable, so there is no need for an ordinary user to care about importing them all.
Nevertheless, that’s how it looks now. It may all change in the future, but if we are going to discuss it, it’s good to describe it as it is first.
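To make the "everything as one middleware" idea more concrete, here is a rough WSGI sketch in plain Python. It is only an illustration, not the project's real code: docs_app, tagging_middleware and make_webapp are hypothetical names, the two layers are placeholders that merely record that they ran (they do not talk to Xapian or python-openid), and the inner app returns a dummy body instead of reading files from public/.

```python
def docs_app(environ, start_response):
    """Placeholder for the app that would serve the built docs."""
    path = environ.get("PATH_INFO", "/")
    body = b"docs page for " + path.encode("utf-8")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def tagging_middleware(name):
    """Build a placeholder WSGI middleware that records its name."""
    def wrap(app):
        def wrapped(environ, start_response):
            # A real layer would do its work here (search, login, ...).
            environ.setdefault("webapp.layers", []).append(name)
            return app(environ, start_response)
        return wrapped
    return wrap


def make_webapp(app):
    """Wrap `app` in all layers so users import a single entry point."""
    for wrap in (tagging_middleware("openid"), tagging_middleware("xapian")):
        app = wrap(app)
    return app
```

A user would then only call make_webapp(docs_app) and hand the result to whatever WSGI server or framework they use, without importing the individual pieces.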

June 21, 2009 12:54 PM