Thursday, February 22, 2007

Pyd Roadmap

I have finally started using Trac's roadmap feature with Pyd. Here is Pyd's roadmap.

Before RC2, I want to finish the auto-shim-generation feature by automatically adding operator overloads to shim classes. (I already have the code in there for the unary and binary operators, as well as opApply, opCmp, opCall, and maybe others. However, this code is disabled in rev 101.) I also want to get various build issues resolved. The first is adding Rebuild (and maybe Bud) support to CeleriD, which will in turn allow for Tango support. The second is getting Pyd working with DSSS. This should also solve the third issue, which is devising a simple way to build applications that embed Python using Pyd.

Speaking of embedding, another pre-RC2 issue is finishing the PydObject class, and adding the PydInterpreter class. These go hand-in-hand with some features I quietly added a while back.

PydInterpreter represents the Python interpreter. It calls Py_Initialize when constructed and Py_Finalize when destroyed. It has an Import method, which imports a Python module and returns the module object as a PydObject. It has a Run method, which basically just calls PyRun_SimpleString and returns its result as a PydObject. Other methods may be added if I think of something useful.
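Translated into pure-Python terms, the intended semantics look roughly like this. (PydInterpreter, Import, and Run are the real names; the class below and its behavior are only an illustration, since inside a running Python there is nothing to initialize or finalize.)

```python
import importlib


class InterpreterSketch:
    """Pure-Python illustration of PydInterpreter's intended semantics.

    The real D class calls Py_Initialize on construction and
    Py_Finalize on destruction; here both are no-ops because this
    code already runs inside a live interpreter.
    """

    def Import(self, name):
        # Mirrors PydInterpreter.Import: import a module and hand
        # back the module object (a PydObject on the D side).
        return importlib.import_module(name)

    def Run(self, code):
        # Mirrors PydInterpreter.Run, which wraps PyRun_SimpleString:
        # execute a string of Python code. The namespace is returned
        # here purely so the effect is observable.
        namespace = {}
        exec(code, namespace)
        return namespace
```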

The other embedding-related features involve extending the embedded interpreter. Embedded Python is rarely useful if your Python scripts cannot access functionality provided by the application. Therefore, Pyd must make available the full brunt of its class and function wrapping functionality, even when embedding.

I'm still trying to work out the best API for all of this, which is why I kept it quiet when I added some of this functionality. Basically, when you wrap a function or class in this way, you will also have to provide a module name. This module will exist entirely inside the embedded interpreter, and any Python scripts imported by your application will be able to import this module. This also implies that you could divide your application's functions and classes into multiple Python modules, which could be a useful organizational nicety.
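On the Python side, the effect being described amounts to building a module object at run-time and registering it so scripts can import it. Here is a minimal sketch of that mechanism; expose_module is a made-up helper name, not Pyd's real API:

```python
import sys
import types


def expose_module(name, members):
    """Create a module at run-time and register it in sys.modules,
    so that any script the application runs can simply import it by
    name. This mirrors what wrapping D functions under a module name
    would accomplish inside the embedded interpreter."""
    mod = types.ModuleType(name)
    for member_name, member in members.items():
        setattr(mod, member_name, member)
    sys.modules[name] = mod
    return mod
```

After something like `expose_module("myapp", {"greet": greet})`, a script run by the application can do `import myapp; myapp.greet()` with no file on disk backing the module.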

While it would be possible to make functions and classes implicitly available in any imported scripts (so that the script would not have to import any modules to get at your application's functionality), I am split on whether to implement such a feature. It would be fairly un-Pythonic.
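For what it's worth, the mechanism behind such a feature is simple: pushing names into the builtins namespace makes them visible in every script with no import at all, which is precisely what makes the idea un-Pythonic. A sketch (make_implicit is a made-up helper name):

```python
import builtins  # this namespace was called __builtin__ in Python 2.x


def make_implicit(name, value):
    """Make a name visible in every script without any import, by
    installing it into the builtins namespace. Illustration only."""
    setattr(builtins, name, value)
```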

Sunday, February 18, 2007

Subverting polymorphism is fun!

Revision 100 of Pyd implements the API changes I discussed previously. There are still some rough edges, however. (These are mostly related to operator overloading. Static member function wrapping is broken, too.) I thought it would be amusing at this point to discuss Pyd's support of polymorphic behavior.

Take the following:

import std.stdio;

class Base {
    void foo() { writefln("Base.foo"); }
    void bar() { writefln("Base.bar"); }
}

class Derived : Base {
    void foo() { writefln("Derived.foo"); }
}

void polymorphic_call(Base b) {
    b.foo();
}

Let's say that a user writes a subclass of Derived in Python, and overloads the foo method:

class PyDerived(Derived):
    def foo(self):
        print "PyDerived.foo"

If an instance of PyDerived is passed to polymorphic_call, we would (optimally) expect it to print out "PyDerived.foo". The challenges involved in implementing this should become obvious if you start thinking about vtables.

Pyd solves this problem by introducing "shim" child classes. For every class that is exposed to Python, a shim class is generated. Every method that is being exposed to Python is overridden in this shim, in such a way as to dispatch to Python if it detects that the object is an instance of a Python subclass, and to call the parent class's method otherwise.
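The dispatch logic can be sketched in Python itself. The real shims are generated D code; the classes below (including the names BaseShim and PyDerivedStandIn) are only an analogy under the assumption that the shim holds a reference to the Python-side object:

```python
class Base:
    def foo(self):
        return "Base.foo"


class BaseShim(Base):
    """Analogy for the shim Pyd generates. Its override of foo
    dispatches to the Python-side object's own foo when the Python
    subclass defines one, and otherwise falls back to the wrapped
    class's implementation."""

    def __init__(self, py_object=None):
        self._py_object = py_object

    def foo(self):
        py = self._py_object
        # Did the Python subclass supply its own foo?
        if py is not None and "foo" in type(py).__dict__:
            return py.foo()
        # No override: behave exactly like the wrapped class.
        return Base.foo(self)


class PyDerivedStandIn:
    """Stands in for a user's Python subclass that overrides foo."""
    def foo(self):
        return "PyDerived.foo"
```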

If my explanation was nonsensical, then it is probably best to think of it as a black box. Suffice it to say that, in the above example, Base and Derived each get a shim generated for them.

So for every class you want to expose to Python with Pyd, there are suddenly four classes involved:

  1. The original class
  2. The shim class
  3. The Python class directly wrapping the original class
  4. The Python class wrapping the shim

Number 3 might come as a surprise. Although the original class is wrapped, this is only done so that D functions can return instances of the original class directly to Python. Number 4 is the class which may be best thought of as the Python version of your D class.

Update Feb 22: Revision 101 of Pyd changes this. There is now only one Python class wrapping both the original D class and its shim.

Sunday, February 11, 2007

Improving Pyd's Inheritance Support

After Walter asked me to explain what new metaprogramming features I wanted to see in D, I thought of a way to vastly improve Pyd's treatment of class wrapping without any new features. The current API looks like this:

wrapped_class!(Foo) f;
f.def!(Foo.bar);
finalize_class(f);

This API design predates D's tuples. Basically: wrapped_class is a struct template with a number of static member functions. The def member function, for instance, creates the wrapper function around a method and appends it to a list. It is important to note that this list is created entirely at run-time. The three lines above are a series of statements. There is no way to use the list of methods wrapped in some other compile-time operation (such as generating the D wrapper classes needed to properly support inheritance).

The new API design I thought of looks like this:

wrap_class!(
    Foo,
    Def!(Foo.bar)
);

In this case, wrap_class is a function template like this:

void wrap_class(C, T ...) ();

C is the class being wrapped, and T is a tuple of various pre-defined struct templates, which replace the static member functions of wrapped_class.

Def, then, is a struct template which should look somewhat familiar:

struct Def(alias fn, char[] name=symbolnameof!(fn), fn_t=typeof(&fn), uint MIN_ARGS=minArgs!(fn))

Not only does the call to wrap_class look cleaner than the old API (for one, it doesn't have to create any spurious local variables), but it makes all of the information about a class available at compile-time. This can be used, together with the new mixins, to automatically generate the stupid wrapper classes used by Pyd to properly handle polymorphism. This would hide them in the guts of Pyd, where they firmly belong.
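The shape of the change can be sketched in Python, with the caveat that a run-time sketch cannot show the actual payoff (compile-time availability); Def and wrap_class below are Python stand-ins for the D templates, not real Pyd code:

```python
def Def(fn, name=None):
    """Stand-in for Pyd's Def!(...) struct template: capture a
    method plus its exposed name for later processing."""
    return (name or fn.__name__, fn)


def wrap_class(cls, *defs):
    """Stand-in for the new wrap_class!(C, Def!(...)) call. The key
    property is that the whole method list is one declarative
    expression, rather than a series of statements appending to a
    run-time list."""
    return {"class": cls, "methods": dict(defs)}


class Foo:
    def bar(self):
        return "bar"


info = wrap_class(Foo, Def(Foo.bar))
```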

One issue this brings up is docstring support. Docstrings were previously supported through the (if I may say so) fairly elegant solution of making them regular function arguments. The first solution that comes to mind with this new API is this slightly more arcane one:

struct Docstring {
    char[] name, doc;
}

void docstrings(Docstring[] docs...);

// ... PydMain ...
    docstrings(
        Docstring("Foo.bar",  "This function does such-and-such."),
        Docstring("SomeFunc", "This other function does something else.")
    );
// ...

Gotta love D's typesafe variadics.

Saturday, February 10, 2007

Moving Pyd forward

Pyd, my Python-D interoperability library, has a few outstanding issues. None of them are very appealing to tackle.

While looking at what it would take to support Gregor Richards' Rebuild, I realised that CeleriD is a complete mess. I may end up doing a major refactoring of dcompiler.py before I'm done.

I recently got Pyd's inheritance support up to spec in SVN. While it can now properly handle wrapping a class and its parent (such that both D and Python are happy with the arrangement), the client code needed to bring this about is dreadfully ugly. D's new mixins might be able to provide some relief here, though I haven't really taken a close look at it, yet.

There is some demand for supporting the embedding of Python in D applications using Pyd. The PydObject class is the first step towards supporting this. (That was, in fact, the very first part of Pyd I ever wrote.) However, the class is a sort of half-hearted effort, and I haven't seriously run it through the wringer, yet. The main problem that I have to tackle before supporting embedding is developing a robust solution for building applications that embed Python with Pyd.

Pyd is difficult to build. This has generally not been an issue, as it comes with its own build tool (CeleriD) that knows all about the trickery needed to coerce it into building. If a user knows what they are doing, they can even convince Bud or Rebuild to build it.

First, Pyd and the Python bindings require a handful of version flags to be defined. The first allows Pyd to know the version of Python being used. Since Pyd only supports Python 2.4 and 2.5, the version flag must be one of Python_2_4_Or_Later or Python_2_5_Or_Later. This is currently a stupid naming convention, since the 2_4 flag isn't defined when using 2.5, but it hasn't been an issue with Pyd so far, and client code isn't really expected to use these flags.

The second version flag defines what version of Unicode Python has been compiled to use, UCS2 or UCS4. This flag should be one of Python_Unicode_UCS2 or Python_Unicode_UCS4. I have never tested using the UCS4 version, since all of the versions of Python I have use UCS2. There is no particular reason why it shouldn't work, however.
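Both flags can be chosen mechanically by interrogating the interpreter. A sketch of the checks a build tool such as CeleriD can perform (the flag names are Pyd's real ones; the surrounding logic is illustrative):

```python
import sys

# Pick the Python-version flag from the running interpreter.
if sys.version_info[:2] >= (2, 5):
    version_flag = "Python_2_5_Or_Later"
else:
    version_flag = "Python_2_4_Or_Later"

# A "wide" (UCS4) interpreter build reports a larger maximum
# code point than a "narrow" (UCS2) one.
if sys.maxunicode > 0xFFFF:
    unicode_flag = "Python_Unicode_UCS4"
else:
    unicode_flag = "Python_Unicode_UCS2"
```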

This brings to light a weakness of Pyd's, which is string support. Although both D and Python are fully Unicode-capable, they are so in different ways. D deals with Unicode strictly in terms of UTF-8, UTF-16, and UTF-32 (with the char[], wchar[], and dchar[] string types, respectively). Python has the unicode type, which may be either UCS2 or UCS4 as previously mentioned, as well as the ability to use any 8-bit encoding, including UTF-8, with its str type. While converting between these encodings isn't particularly difficult, Pyd is currently incredibly stupid in this regard: It only knows how to return a Python str as a char[]. Dumping a UCS2 string into a wchar[] should also be safe, but I haven't implemented it, yet. (And likewise UCS4 and dchar[].)
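The conversions in question are mechanical. In Python terms, the mapping between D's string types and the encodings Pyd would want looks like this (a sketch, not Pyd code):

```python
# D's three string types map onto three UTF encodings:
#   char[] -> UTF-8, wchar[] -> UTF-16, dchar[] -> UTF-32.
# A Python string can be re-encoded into any of them.
s = u"h\u00e9llo"  # five characters, one of them non-ASCII

utf8 = s.encode("utf-8")       # what Pyd already hands out as char[]
utf16 = s.encode("utf-16-le")  # the natural fit for wchar[]
utf32 = s.encode("utf-32-le")  # the natural fit for dchar[]
```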

This reveals another issue of Pyd's. Once upon a time, the char[] that Pyd converted the str to was just a slice over the str's internal buffer. This was incredibly efficient, but also intensely dangerous if the char[]'s lifetime exceeded that of the str. I did not consider this an overly serious issue until I implemented struct wrapping. As soon as I tested a struct with a char[] member, screaming alarms and klaxons went off, as the memory previously used by a string started getting overwritten with garbage.

Not to mention that Python's strings are supposed to be immutable! It would be all too possible for D code to alter the internal buffer of a Python string without meaning to.
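Python enforces that guarantee on its own side, which is what a borrowed slice would bypass:

```python
# Python strings really are immutable: writing into one raises a
# TypeError. A D char[] slice over the str's internal buffer would
# let D code silently break that guarantee.
s = "immutable"
try:
    s[0] = "I"
    mutated = True
except TypeError:
    mutated = False
```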

So now Pyd .dups all strings returned from Python. Thanks to D's lack of runtime const (which the newsgroup seems to think will change at some point in the future, thankfully), this is the only sane behavior. However, I have gotten an email complaining of Pyd's poor performance with strings. I'm not really sure what to do about it.

Where was I? Oh yes! Version flags. There are only two more that Pyd cares about. The first is Pyd_with_StackThreads, the other is Pyd_with_Tango. The latter is a stupid one, and will be dropped in favor of the standard Tango version flag in future updates. Furthermore, Tango support in Pyd is totally experimental and untested. (Tango requires a bud- or rebuild-like tool to build, and CeleriD doesn't yet support either, so I haven't used Tango with Pyd, yet.) StackThreads is used to wrap D's iteration protocol (opApply). It also doesn't work on Linux. However, it's usually irrelevant when embedding Pyd. (Unless you start talking about extending the embedded interpreter, which should be considered an advanced topic.) In short, you usually won't go wrong ignoring both of these.

(As an aside, Tango includes functionality similar to StackThreads, which even works on Linux. Therefore, Rebuild support in CeleriD equals Tango support in Pyd equals opApply wrapping support on Linux.)

The final step in building Pyd for embedding is to compile against the Python bindings and link against the Python runtime. First you need to pick whichever version of Python you're using, and look in the appropriate subdirectory of infrastructure/python. The directory will contain the bindings for that version of the Python/C API (python.d) and a .lib file for the Python DLL, which DMD needs to link to it on Windows. The python.d file is set up with some version(build) and pragma(link) trickery to try this linking step for you, so just make sure, if you're on Windows, that Bud/Rebuild can see the .lib. (If you're on Linux, just make sure the correct version of Python is installed.)

Hopefully (for the three or so of you out there that know what the hell I'm talking about) that will get Pyd to compile and link.

This probably highlights the need to get Pyd working with DSSS.