Saturday, February 10, 2007

Moving Pyd forward

Pyd, my Python-D interoperability library, has a few outstanding issues. None of them are very appealing to tackle.

While looking at what it would take to support Gregor Richards's Rebuild, I realised that CeleriD is a complete mess. I may end up doing a major refactoring of dcompiler.py before I'm done.

I recently got Pyd's inheritance support up to spec in SVN. While it can now properly handle wrapping a class and its parent (such that both D and Python are happy with the arrangement), the client code needed to bring this about is dreadfully ugly. D's new mixins might be able to provide some relief here, though I haven't really taken a close look at them yet.

There is some demand for embedding Python in D applications using Pyd. The PydObject class is the first step towards supporting this. (That was, in fact, the very first part of Pyd I ever wrote.) However, the class is a sort of half-hearted effort, and I haven't seriously run it through the wringer yet. The main problem that I have to tackle before supporting embedding is developing a robust way to build applications that embed Python with Pyd.

Pyd is difficult to build. This has generally not been an issue, as it comes with its own build tool (CeleriD) that knows all about the trickery needed to coerce it into building. If a user knows what they are doing, they can even convince Bud or Rebuild to build it.

First, Pyd and the Python bindings require a handful of version flags to be defined. The first tells Pyd which version of Python is being used. Since Pyd only supports Python 2.4 and 2.5, the version flag must be one of Python_2_4_Or_Later or Python_2_5_Or_Later. As it stands, this is a stupid naming convention, since the 2_4 flag isn't defined when using 2.5 (despite the "Or_Later" in its name), but it hasn't been an issue with Pyd so far, and client code isn't really expected to use these flags.
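
To make the naming quirk concrete, here is a rough Python sketch. It is not CeleriD's actual code, and the helper names (flag_name, current_flag, cumulative_flags) are made up for illustration. The first function mirrors the behavior described above, where exactly one flag is defined; the second shows what the "Or_Later" suffix would suggest.

```python
# Versions the post says Pyd supports, oldest first.
SUPPORTED = [(2, 4), (2, 5)]

def flag_name(version):
    # Build a flag such as "Python_2_5_Or_Later" from a (major, minor) tuple.
    return "Python_%d_%d_Or_Later" % version

def current_flag(version):
    # Behavior described above: exactly one flag, matching the version in use.
    if version not in SUPPORTED:
        raise ValueError("unsupported Python version: %d.%d" % version)
    return flag_name(version)

def cumulative_flags(version):
    # Hypothetical alternative: define every "Or_Later" flag that really applies.
    if version not in SUPPORTED:
        raise ValueError("unsupported Python version: %d.%d" % version)
    return [flag_name(v) for v in SUPPORTED if v <= version]

print(current_flag((2, 5)))
print(cumulative_flags((2, 5)))
```

With the cumulative scheme, code guarded by the 2_4 flag would also be compiled when building against 2.5, which is what the name implies.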

The second version flag says which internal Unicode representation Python was compiled to use, UCS2 or UCS4. This flag should be one of Python_Unicode_UCS2 or Python_Unicode_UCS4. I have never tested the UCS4 version, since all of the versions of Python I have use UCS2. There is no particular reason why it shouldn't work, however.
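
If you aren't sure which one your interpreter uses, Python itself can tell you: sys.maxunicode is 0xFFFF on a UCS2 ("narrow") build and 0x10FFFF on a UCS4 ("wide") build. The sketch below maps that value onto the two flag names. The function name unicode_flag is invented for this example and is not part of Pyd or CeleriD.

```python
import sys

def unicode_flag(maxunicode=None):
    # Default to the running interpreter; pass a value explicitly to test.
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF means 16-bit code units (UCS2); anything larger means UCS4.
    if maxunicode == 0xFFFF:
        return "Python_Unicode_UCS2"
    return "Python_Unicode_UCS4"

print(unicode_flag())
```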

This brings to light a weakness of Pyd's, which is string support. Although both D and Python are fully Unicode-capable, they are so in different ways. D deals with Unicode strictly in terms of UTF-8, UTF-16, and UTF-32 (with the char[], wchar[], and dchar[] string types, respectively). Python has the unicode type, which may be either UCS2 or UCS4 as previously mentioned, as well as the ability to use any 8-bit encoding, including UTF-8, with its str type. While converting between these encodings isn't particularly difficult, Pyd is currently incredibly stupid in this regard: it only knows how to return a Python str as a char[]. Dumping a UCS2 string into a wchar[] should also be safe, but I haven't implemented it yet. (And likewise UCS4 and dchar[].)
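
To see why the three D string types are not interchangeable, it helps to count code units. The sketch below is only an illustration: it assumes a modern Python 3 (where str is a Unicode string, unlike the 2.4/2.5 str discussed here), and the function name code_units is made up. It reports how many char, wchar, and dchar elements the same text would occupy.

```python
def code_units(text):
    # Encode without a byte-order mark, then divide by the code unit size.
    return {
        "char[] (UTF-8)": len(text.encode("utf-8")),
        "wchar[] (UTF-16)": len(text.encode("utf-16-le")) // 2,
        "dchar[] (UTF-32)": len(text.encode("utf-32-le")) // 4,
    }

# One accented letter: two UTF-8 bytes, but a single UTF-16 or UTF-32 unit.
print(code_units("na\u00efve"))

# A character outside the 16-bit range needs two UTF-16 units (a surrogate pair).
print(code_units("\U0001D11E"))
```

The second case is the one to watch when copying 16-bit Python data into a wchar[]: the element count and the character count are no longer the same thing.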

This reveals another issue of Pyd's. Once upon a time, the char[] that Pyd converted the str to was just a slice over the str's internal buffer. This was incredibly efficient, but also intensely dangerous if the char[]'s lifetime exceeded that of the str. I did not consider this an overly serious issue until I implemented struct wrapping. As soon as I tested a struct with a char[] member, screaming alarms and klaxons went off, as the memory previously used by a string started getting overwritten with garbage.

Not to mention that Python's strings are supposed to be immutable! It would be all too possible for D code to alter the internal buffer of a Python string without meaning to.

So now Pyd .dups all strings returned from Python. Thanks to D's lack of runtime const (which the newsgroup seems to think will change at some point in the future, thankfully), this is the only sane behavior. However, I have gotten an email complaining of Pyd's poor performance with strings. I'm not really sure what to do about it.
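
The trade-off between slicing and .dup can be imitated in plain Python. This is only an analogy, not Pyd's code: a bytearray stands in for the str's internal buffer, a memoryview plays the part of the old no-copy slice, and bytes() plays the part of .dup. The function name slice_versus_dup is invented for the example.

```python
def slice_versus_dup():
    buffer = bytearray(b"hello")   # stands in for the str's internal buffer
    view = memoryview(buffer)      # like the old slice: no copy, shared memory
    copy = bytes(buffer)           # like .dup: an independent copy

    # The owner reuses its memory for something else.
    for i in range(len(buffer)):
        buffer[i] = ord("X")

    # The view now shows the overwritten data; the copy is unaffected.
    return view.tobytes(), copy

print(slice_versus_dup())
```

The copy costs an allocation and a memcpy on every conversion, which is where the performance complaint comes from, but it is the only one of the two that stays valid once the original buffer changes or goes away.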

Where was I? Oh yes! Version flags. There are only two more that Pyd cares about. The first is Pyd_with_StackThreads; the other is Pyd_with_Tango. The latter is a stupid one, and will be dropped in favor of the standard Tango version flag in future updates. Furthermore, Tango support in Pyd is totally experimental and untested. (Tango requires a bud- or rebuild-like tool to build, and CeleriD doesn't yet support either, so I haven't used Tango with Pyd yet.) StackThreads is used to wrap D's iteration protocol (opApply). It also doesn't work on Linux. However, it's usually irrelevant when embedding Pyd. (Unless you start talking about extending the embedded interpreter, which should be considered an advanced topic.) In short, you usually won't go wrong ignoring both of these.

(As an aside, Tango includes functionality similar to StackThreads, which even works on Linux. Therefore, Rebuild support in CeleriD equals Tango support in Pyd equals opApply wrapping support on Linux.)

The final step to building Pyd to embed it is to build the Python header and link against the Python runtime. First you need to pick whichever version of Python you're using, and look in the appropriate subdirectory of infrastructure/python. The directory will contain the bindings for that version of the Python/C API (python.d) and a .lib file for the Python DLL, which DMD needs to link to it on Windows. The python.d file is set up with some version(build) and pragma(link) trickery to try this linking step for you, so just make sure, if you're on Windows, that bud/rebuild can see the .lib. (If you're on Linux, just make sure the correct version of Python is installed.)

Hopefully (for the three or so of you out there who know what the hell I'm talking about) that will get Pyd to compile and link.

This probably highlights the need to get Pyd working with DSSS.

2 comments:

Unknown said...

I'm interested in playing with pyd. Where is the code?

Kirk said...

The main pyd.dsource.org website appears to be down (a side-effect of dsource moving to a proper web-host, I bet). For now, you can download RC1 here, or you can check out the trunk directory from here, if you have svn and feel like living on the bleeding edge.