My file wrapper and Josiah's code now work in Python 3.1 and the changes have been committed to the Google Code project. The next thing to tackle is converting Mark Hammond's code from C++ to C and integrating it into the _subprocess.c file.
As I have mentioned, I have spent the last few weeks both in stats books, finding my way around R, and cleaning up and refactoring the code for the generalized linear models in the NiPy models code. I have recently hit a wall in this code, so I am trying to clear out some unposted blog drafts. I intended for this post to introduce the generalized linear models approach to estimation; however the full post will have to wait. For now, I will give an introduction to the theory and then explain where I am with the code.
Generalized linear models was a topic that was completely foreign to me a few weeks ago, but after a little (okay a lot of) reading the approach seems almost natural. I have found the following references useful:
I have been making slow but steady progress on the NiPy models code. Right now for the midterm review, we have been focusing on design issues including the user interface and refactoring, test coverage/bug fixing, and some extensions for postestimation statistics. Other than this, I have spent the last month or so with anywhere from ten to fifteen stats, econometrics, or numerical linear algebra and optimization texts open on my desk.
The main estimators currently included in the code are generalized least squares, ordinary least squares, weighted least squares, autoregressive AR(p), generalized linear models (with several available distribution families and corresponding link functions), robust linear models, general additive models, and mixed effects models. The test coverage is starting to look pretty good; what remains is squashing the few remaining bugs and improving the postestimation statistics.
Some enhancements have also been made to the code. I have started to include some public domain or appropriately copyrighted datasets for testing purposes that could also be useful for examples and tutorials, so that every usage example doesn't have to start with generating your own random data. I have followed the datasets proposal in the Scikits Learn package pretty closely.
We have also decided to break from the formula framework that is used in NiPy. It was in flux (being changed to take advantage of SymPy the last I heard) and is intended to be somewhat similar to the formula framework in R. In its place for now, I have written some convenience functions to append a constant to a design matrix or to handle categorical variables for estimation. For the moment, a typical model/estimator is used as
For most of the past week, I have been working on variable separation for PDEs, and yesterday I sent my merge request. Currently the two most commonly used strategies, additive and multiplicative separation, are implemented and tested on simpler equations like Laplace's.
So, what's next? :)
I am currently prototyping a pdesolve() which can handle first-order equations (and also any hyperbolic PDEs) using characteristic methods. Once this is finished, I can start adding the variable separation into pdesolve() and after this has been done, I can move on to implement support for boundary and initial conditions.
Now we can instantiate C++ objects dynamically in Cython.
Things like
cdef Foo *aFoo = new Foo()
del aFoo
are allowed.
We're using Python syntax for it, so we define the constructor with the "__init__" special method.
Now, the next step is to work on function overloading. We already have the logic to implement it. So, what I need now is to work :).
But before that, I shall write some tests and some documentation, and my GSoC branch will be merged into the main branch of Cython, so that users can do what they need.
There is as yet no equivalent of R in applied econometrics. Therefore, the econometric community can still decide to go along the Python path.
We are deciding what syntax to use for operator overloading and constructors.
We have the C++ and Python syntaxes, and there are big differences between them.
A thread was created on the cython-dev list for voting and commenting on the choice of syntax.
In C++ syntax, the constructor is a method with no return type (it's not void!) and the operators are methods named with the keyword "operator" followed by the operator to overload (e.g. "bool operator==(int value);").
In Python, we have the special method __init__ for the constructor and other special methods for the operators (__add__, __str__, etc.).
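To make the Python side of that comparison concrete, here is a small plain-Python sketch of the special-method convention. It is only an illustration of how __init__ and the operator methods behave in ordinary Python; the class name Foo and its attribute are made up for the example, and this is not the Cython syntax being voted on.

```python
class Foo(object):
    """Hypothetical class used only to illustrate Python's special methods."""

    def __init__(self, value):
        # Plays the role of the C++ constructor "Foo(int value);"
        self.value = value

    def __eq__(self, other):
        # Plays the role of "bool operator==(int value);"
        if isinstance(other, Foo):
            return self.value == other.value
        return self.value == other

    def __add__(self, other):
        # Plays the role of "Foo operator+(const Foo& other);"
        return Foo(self.value + other.value)

    def __str__(self):
        return "Foo(%d)" % self.value


a = Foo(2)
b = Foo(3)
print(a == 2)      # comparison goes through __eq__
print(str(a + b))  # addition goes through __add__, printing through __str__
```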
For more information, take a look at the wiki of the project:
http://wiki.cython.org/gsoc09/daniloaf/progress
Now that my implementation of subprocess.Popen is complete and unit tested, I will move on to porting my changes to Python 3.1. I just compiled Python 3.1 on Debian, all modules included, and pdb is still malfunctioning, so if the code conversion isn't as smooth as I anticipate, I will be using print statements to debug, which is a bit difficult due to the nature of subprocess.Popen.
The license of my project has been changed from GPL to Apache License 2.0 so that there is a chance of it being integrated into the Python core. There are still some issues that I may have handled incorrectly as far as my subprocess.Popen file wrapper goes, mainly handling the "mode" argument that is used when one opens a file. Right now, it is mostly ignored, the exception being universal newline support.
A few days ago, I discovered the .get() method of the dictionary data type. It allows you to request the value corresponding to a key which a dictionary may not have, in which case you can specify what should be returned instead of raising a KeyError. In PyDy, I had numerous cases where I was using if/else statements to check if a dictionary had a certain value, but by using the .get() method, I was able to eliminate about 50 or 60 lines of code and make the code a lot more readable and easier to maintain.
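As a small illustration of the kind of change described above, here is a sketch comparing the if/else pattern with .get(). The dictionary and function names are invented for the example and are not taken from PyDy.

```python
# Hypothetical data: these names are illustrative only, not from PyDy.
masses = {"wheel": 2.0, "frame": 10.0}


def mass_with_if(name):
    # The longer pattern: test for the key, then branch.
    if name in masses:
        return masses[name]
    else:
        return 0.0


def mass_with_get(name):
    # Same behaviour in one line: the second argument is the fallback value.
    return masses.get(name, 0.0)


print(mass_with_get("wheel"))
print(mass_with_get("fork"))
```

If the second argument is left out, .get() returns None for a missing key rather than raising a KeyError.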
The progress on the trigonometric functions has been slow. Ondrej and I worked on some code together that handles most of the cases in the table by Fu et al., but not all. One of the things that has been a little challenging is to identify pi shifts in the argument to the trig functions and map the shift into the first quadrant (0, pi/2) and return the appropriate result, e.g.: sin(x+17*pi/18) –> sin(x + pi/18), etc. The approach has been to match x and r in the following generic argument:
As I see it, there are three cases to consider: 1) x==0, r!=0, 2) r==0, x!=0, and 3) x!=0, and r!=0. I am splitting the logic up in this fashion and am trying to deal with each case in the appropriate manner. In most cases, r will be of the Rational class, and therefore the modulo operator is needed to bring it to the interval (0, 2*pi), so I implemented the necessary code and sent a patch to the Sympy mailing list.
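The modulo step can be sketched with Python's standard fractions module. This is only an illustration of the arithmetic, not the patch that was sent: it assumes the shift is written as r*pi with r rational, and the function name is made up for the example. Note that the result lands in the half-open interval [0, 2), i.e. a shift in [0, 2*pi).

```python
from fractions import Fraction


def reduce_pi_coefficient(r):
    """Reduce the rational coefficient r of pi into the interval [0, 2).

    Hypothetical helper: the shift is assumed to be r*pi, so reducing
    r modulo 2 reduces the shift modulo 2*pi.
    """
    return Fraction(r) % 2


# 41*pi/18 is one full turn plus 5*pi/18
print(reduce_pi_coefficient(Fraction(41, 18)))
# Negative shifts wrap around as well: -pi/4 becomes 7*pi/4
print(reduce_pi_coefficient(Fraction(-1, 4)))
```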
Although the trig functions aren’t quite as they need to be yet, I was still able to manually get the simplifications that are presented by Fu et al. to work using the builtin Sympy commands, which hopefully means that implementing the algorithm will be feasible with the existing Sympy tools and that not too many extra functions will need to be created.
I've finished porting the API to Gnome keyring. The latest code in the repository includes this feature. To compile the lib for gnome-keyring, you need Glibc 2.0, D-Bus 1.0 and Gnome Keyring 1.0. The changes that the keyring API makes on your OS can be seen in Seahorse or Gnome Keyring Manager (deprecated).

The next platform is KDE Kwallet; here are some interesting materials.
I've had little progress this week, and couldn't update my blog last week, due to final exams of my final semester. But since the exams are about to end, I think at least I can document my previous to last week's progress.
I analyzed a good number of Python C modules using gcov manually. The results seem good and encouraging and gcov seems to be the right tool for the coverage. The fact that it requires modules to be linked statically into Python is a bit of trouble, but after a detailed discussion with my supervisor, Seth, we decided to assume the following two points for my later work on integrating the gcov reports into figleaf:
For the last few days I’ve been busy preparing the first public beta of SymPy 0.6.5. Most of the time was spent solving a bug that made documentation tests fail under python2.4, but now that this is solved, I hope that by the end of the week we can have a final release.
When this release is published, we’ll merge my query module and work on getting it right for 0.7.
So, week six. I devoted this entire week to exhaustive testing and bug fixing. It's just good software development to take some time, and test and fix bugs. No new features, nothing. I also took the time in the process to refactor a section of code I felt was extremely hacky and brittle. The benefit [...]
There it is — the first 'equation' ever rendered by Mathtex! Although it may look like nothing more than a 99-DPI 12pt x in italicized Computer Modern it is really something quite special — a vision of progress.
Below is the parse-tree and glyph stream generated by the program:
freddie@fluorine ~/Programming/mathtex $ python main.py
[Hlist 9.42> [Hlist 0.00> ] [Hlist 9.42> `x` k1.17] [Hlist 0.00> ]]
[(-0.5, 7.0, Bunch(symbol_name=x, metrics=Bunch(advance=9.41821289062, iceberg=7.0, ymax=7.0, height=7.0, width=8.25, slanted=True, xmax=8.6875, xmin=0.5, ymin=0.0), num=120, fontsize=12, offset=0.0, postscript_name=Cmmi10, font=, glyph=))]
Over the last couple of days I have been working on the code that I committed last week (here for those that are interested) and as promised now have something that does work.
However, there are several unpleasant things about it: firstly, it depends on mathtex.ft2font, the FreeType wrapper used by matplotlib; secondly, there is currently only a Cairo backend; thirdly, the only font series supported is Computer Modern, by way of the Bakoma fonts; fourthly, the font paths are currently hard-coded.
I plan to fix all of these issues over the next couple of days — starting with using font metrics files as opposed to FT2Font and then writing a C-based renderer and wrapping it using Cython. I expect that this will be done by Friday.
Hey everybody. Apologies beforehand for my lack of updates -- I know I'm totally failing at maintaining this blog (as usual, and as expected), but that doesn't mean that progress is not being made.
Far from that, actually. I finished my exam break last week and resumed my usual schedule; everything is looking in very good shape for the first evaluation on July the 6th. Let's take a look at the SVN logs and see what has been done... After the break.
Though it has been a while since I have made a post, things are going well right now. Since my last post, I have created a Google Code Repository located at http://code.google.com/p/subprocdev/. I have integrated, modified and written tests for the code I got from Josiah Carlson. Two of the functions he had left outside of his Popen patch were moved into subprocess.Popen. I made some aesthetic and functional changes to his recv_some and send_all functions.
Eventually, I got my code and Josiah's code just about fully unit tested in Python 2.7 and from there, proceeded to see if I could convert it over to 3.0. After I did this, I realized that it still ran, for the most part, in Python 2.7. With the following code, I attempted to get around the only thing that was causing me grief:
import sys
if sys.version_info[0] == 3:
    from io import BufferedWriter as buffer
else:
    class str(object):
        def __init__(self,a,b=None):
            self = a.__str__()
    def isinstance(a,b):
        if b == str:
            return hasattr(a,'strip')
        try:
            return a == b(a)
        except:
            return False
In Python 3.0, the str function can also accept a character encoding as an argument, so I tried to overload the Python 2.7 str class to make it 3.0 compliant, to no avail. I will therefore be maintaining separate branches for Python 2.7 and 3.0 development in my Google Code Repository, but any suggestions on getting around the str issue are greatly appreciated.
Due to python3.x's intentional backwards incompatibility, it is not easy to migrate a project's codebase from python2.x to python3.x. Guido has given a recommended development model:
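One possible way around the str issue, offered here only as a sketch and not as what the project ended up doing, is to leave the builtin str and isinstance alone and route all conversions through a small helper. The names text_type and to_text are made up for this example, and the default encoding of UTF-8 is an assumption.

```python
import sys

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # this branch is only evaluated on Python 2


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding.

    Hypothetical helper: callers use to_text(x, enc) instead of str(x, enc),
    so the same source runs on both major versions without shadowing builtins.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


print(to_text(b"hello"))
print(to_text(42))
```

The design choice here is to wrap the difference in one function rather than redefine str, since rebinding self inside __init__ has no effect on the object being constructed.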
Officially, according to my schedule this week should've been spent on producing a set of unit tests. Now, usually when things don't go to schedule it is because something bad or unexpected occurred.
School finally ended last week (no exam results yet, though) and I took a long, long trip back home. The week started slowly due to the deadly combination of a forgotten laptop power cord, Midsummer Day, Victory Day and the best weather ever. This resulted in lots of grilling, beer and also a small side effect: my GSoC mentor is probably quite angry now that I have missed my weekly update again...
Back to Sympy now.
So far all my various doc-related fixes have been committed to upstream, and I have decided to bite the bullet and start implementing the variable separation for PDEs. So far some very simple testcases based on first-order equations work, but I have some trouble with equations containing higher order derivatives. You can pull from github/pde-separate and please break it :)
Following on from the Mathtex project announcement last week, we now also have a mailing list, mathtex-dev (http://groups.google.com/group/mathtex-dev?lnk=), which is open to all. Although it is a development list, anyone with an interest in the project should make their voice heard. This will almost certainly become more important in the next week or so when the floor is opened to feature/enhancement requests (backends and syntax support).
I’m watching a documentary on the parallels between the market crash of 1929, and the market crash in 2008 (last year). I was reminded of the exploits of Adam Zimmerman, of Brian Stableford’s Emortality series, where Adam used his brilliance in the stock markets to cause a massive market crash, while the masters of the business [...]
This blog will track the development of interfaces between pygr and the Clustalw, BLASTZ, BLAT, LAGAN, MLAGAN, and gmap alignment programs. The project is part of Google Summer of Code 2009 and will be mentored by Dr. Titus Brown and Jenny Qing Qian, to both of whom I am very grateful.
Currently, Pygr is interfaced only with the BLAST alignment program.
For the actual interfaces, I will write separate Python modules (by the way, Python is the language I am using). Before mid-term evaluation (which is on July 06, 2009), I plan to finish the first three alignment programs and the remaining three will be finished after the mid-term. So far I have finished interfacing Clustalw and blastz with pygr.
The mainstay of this blog will be progress reports. Comments are welcome.
All too often we see reporters miss the real questions or lob easy ones at the Powers That Be. This isn’t enough to make me paranoid, as generally, people do good. But one particular line in this Time article about missing boy Danny Barter really got me interested. The line in question is this: [...]
Last week we began work on supporting the 'new' and 'delete' operators.
A new node was created (NewStatNode), along with a function in the parser to call it (p_new_statement).
I hope everything continues without problems.
The midterm evaluations are coming. I hope everything will be done on time. :)
I am still working on modifying trigonometric.py to make all of the trig methods behave as they would in Mathematica/Matlab/Maple. This involves getting the .eval method of each of them correct, and then from there I can work on implementing a new trigsimp.
After some hacking on the queries module, I finally got it right without the limitations of past versions. You can check it out from my repo http://fseoane.net/git/sympy.git, branch master.
It now relies even more on logic.inference.satisfiable(), which is just an implementation of the DPLL algorithm. The bad news is that (my implementation of) dpll_satisfiable() is SLOW, so inevitably queries are SLOW. But all is not lost, since the algorithm itself is quite fast, and in fact other variants of the algorithm (MiniSAT) perform 6600 times faster than my implementation on medium-sized problems (60 variables, 170 clauses). So it looks like something smells bad on the programming side …
However, I spent the day profiling the function (link to source code used for profiling) without much success.
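For readers who have not met DPLL before, here is a minimal textbook-style sketch of the algorithm in plain Python. It is not the dpll_satisfiable() code from the queries branch; the function names and the clause representation (DIMACS-style integers, where 3 means variable 3 is true and -3 means it is false) are assumptions made for this example.

```python
def simplify(clauses, literal):
    """Assume literal is true: drop satisfied clauses, strip its negation."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        result.append(clause - {-literal})
    return result


def dpll(clauses, assignment=None):
    """Return a satisfying {variable: bool} dict, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    clauses = [frozenset(c) for c in clauses]

    # Unit propagation: a one-literal clause forces that literal's value.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None  # an empty clause can never be satisfied
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit
        assignment[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)

    if not clauses:
        return assignment  # every clause is satisfied

    # Branch on some literal of the first remaining clause, then its negation.
    lit = next(iter(clauses[0]))
    for choice in (lit, -lit):
        trial = dict(assignment)
        trial[abs(choice)] = choice > 0
        result = dpll(simplify(clauses, choice), trial)
        if result is not None:
            return result
    return None


# (x1 or x2) and (not x1 or x3) and (not x3)
print(dpll([[1, 2], [-1, 3], [-3]]))
# x1 and not x1 is unsatisfiable
print(dpll([[1], [-1]]))
```

Variables that never need a value are simply left out of the returned dictionary. Fast solvers such as MiniSAT add clause learning and better data structures on top of this basic scheme.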
The past two weeks I was busy studying for my examinations, so I didn’t have much time to work on my GSOC project. However I was able to do a lot of work to get the PyCameraObject struct integrated in the existing code and implemented some additional functions.
Here is the definition of this struct. It isn’t complete yet because some “types” I would like to put into it cause compile errors.
PyCameraObject:
typedef struct {
    PyObject_HEAD
    char* device_name;          // unique name of the device
    SeqGrabComponent component; // A type used by the Sequence Grabber API
    SGChannel channel;          // Channel of the Sequence Grabber
    GWorldPtr gWorld;           // Pointer to the struct that holds the data of the captured image
    Rect boundsRect;            // bounds of the image frame
} PyCameraObject;
In addition to implementing the PyCameraObject struct, I have written the following functions.
This week I am planning to finish the mac_start_capturing function so that I can capture images from my camera and put them in an SDL_Surface object. As usual, you can follow my work on my github account.
This past week I wrote code for int32 and float64 types, giving me an addition ufunc that supports 32/64bit ints and floats. The code for each case is pretty long; each datatype duplicates code that loads the function arguments, checks for cases like reduction and lengths that don't evenly divide the SIMD parallelization/loop unrolling, etc. So I started factoring this out and developing the beginnings of a framework for writing CorePy-based ufuncs with minimal effort and code duplication.
I succeeded in isolating most of the duplicate code, though the 'generic' code generation is a bit more complicated than before -- especially the specialized reduction code. This needs some polish still; I used two different approaches for factoring out the operation-specific code for general element-wise operations and reductions. I will probably use the approach I did for reduction -- the generic code requires a function (I can actually pass a single instruction) that takes two registers as arguments (accumulator and source element) and generates reduction code for a single element.
I ran into a few problems while working on this stuff; I have no idea how I managed to miss these bugs until now. First, applying my ufunc to a single-element array gave bad results -- the test I was using to detect reduction was also catching this single-element case, and not working properly.
The other issue I've run into is due to the way I'm splitting work among multiple threads. I just did the easiest thing possible (halfway expecting to do something more elaborate later) -- divide the length evenly among each thread, giving the last thread additional work if the number of threads does not divide the length evenly. The problem with this is that individual threads will get work segments with unaligned memory if the work segment length is not a multiple of 16 bytes. So this problem is the next thing I need to tackle -- I've been thinking about ways to split the work evenly, while also rounding work segments to a multiple of 16 bytes.
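One way the alignment-aware split could look is sketched below in plain Python. This is just an illustration of the arithmetic, not the CorePy code: the function name and parameters are made up, and it assumes the array's base address is itself 16-byte aligned and that the element size divides 16. The idea is to hand out whole 16-byte blocks as evenly as possible and give any leftover tail elements to the last thread.

```python
def split_aligned(n_elements, itemsize, n_threads, align_bytes=16):
    """Return one (start, stop) element range per thread.

    Hypothetical helper: every range except possibly the last has a length
    that is a multiple of align_bytes, so each thread starts on an aligned
    address as long as the array itself starts on one.
    """
    if align_bytes % itemsize != 0:
        raise ValueError("itemsize must divide align_bytes")
    chunk = align_bytes // itemsize        # elements per aligned block
    n_blocks = n_elements // chunk         # whole aligned blocks available
    base, extra = divmod(n_blocks, n_threads)

    ranges = []
    start = 0
    for t in range(n_threads):
        # The first `extra` threads take one additional block each.
        blocks = base + (1 if t < extra else 0)
        stop = start + blocks * chunk
        ranges.append((start, stop))
        start = stop

    # Elements that do not fill a whole block go to the last thread.
    ranges[-1] = (ranges[-1][0], n_elements)
    return ranges


# 103 float64 values (8 bytes each) over 4 threads
print(split_aligned(103, 8, 4))
```

Compared with giving all the remainder to the last thread, spreading the extra blocks over the first few threads keeps the per-thread work within one block of each other.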
I had pushed some changes to github so Ondrej could help me debug the nseries tests when I noticed that the changes I pushed had some bad comments. So I decided to rebase. But git rebase -i told me that there was already a rebase in progress. I figured that I must have started it a long time ago and forgotten to abort, so I ran git rebase --abort.
DON’T DO THAT.
I noticed my editor was telling me that an open file had changed. Then, I noticed that ALL of my uncommitted changes were gone! And, being uncommitted changes, git did not have them saved anywhere!
So now I started to panic. I had done a lot of work on dsolve that I hadn’t committed yet. Normally, I have hourly backups run by Time Machine, but I am on vacation and my backup drive is at home. So I started to see if I could retrieve it somewhere. grep quickly told me that it wasn’t in the hidden git directory, but it was still in my .pyc files. But a Google search told me that retrieving from that is not so easy, if not impossible with Python 2.6. So then, I decided to see if there was any lingering stuff in my virtual memory from my editor. So I ran grep on my hard drive and waited.
While I was waiting, though, I noticed when I scrolled up in my command history that my lost changes were in my Terminal. It turns out that I had just run git commit --interactive and had used * on my patches, so it gave me everything!
So I copied my Terminal history and will work on putting everything back tomorrow. It should be easy, assuming that git apply works for the format that git gives in commit --interactive.
So the lessons are: Don’t abort a rebase without committing. Don’t start a rebase and then leave it there. Look in your Terminal history if you lose stuff. And it might be a good idea to make manual backups if you are away from your backup drive for a while.
This also highlights why it is important to try to recover data immediately after realizing that it is gone. If I had closed my Terminal session or filled it past the maximum number of lines, my data would be gone. Even if it were in my virtual memory, that wouldn’t last forever either.

At the moment the directory structure of my project built upon Sphinx’s dir structure looks like this:
builders/webapp/ -> webapp builder used by Sphinx to build the docs and the webapp,
builders/webapp/templates/ -> the *.py template files rendered by webapp builder (at the moment one file – server.py – lies here, but actually this file is simply copied, not rendered, but it might change in the future),
builders/webapp/templates/html/ -> the templates of HTML code rendered by the middleware (imported by server.py) when serving the docs,
web/middleware/ -> middleware for Xapian, OpenID and serving html files in general,
themes/basic/web/ -> templates for full HTML files. These files are used as templates twice. First, when Sphinx is building the docs, it extends these files with “layout.html”, and then, for a second time, when the files are accessed through the browser. That’s why these files are separated from those located in builders/webapp/templates/html/ directory, which are rendered only once, by the middleware.
A bit of a mess, isn’t it? Comments are appreciated.
The structure for generated docs and webapp looks like this (to be found in _build/webapp):
html/ -> files from builders/webapp/templates/html and themes/basic/web directories are located here. We don’t have to separate them anymore, because they are all used as templates by middleware now.
openidstore/ -> OpenID directory. It is created and administered by python-openid library.
public/ -> files built by Sphinx go here. This is the directory which should be served.
xapian_db/ -> Xapian’s database.
server.py -> a script which imports the middleware and serves the docs. The idea is that, by adapting the code from this file to one’s framework, one should be able to use this webapp in one’s Python web framework of choice. I plan to change it soon, so that all server.py logic will be available as a middleware, because the middleware (Xapian, OpenID, etc.) pieces are actually inseparable, so there is no need for an ordinary user to care about importing them all.
Nevertheless, that’s what it looks like now. It may all change in the future, but if we are going to discuss it, it’s good to describe it as it is first.
