Saturday, December 29, 2007

PySoy for Windows

Recently, I've been helping the PySoy project get things working on Windows. Today I finally made some visible progress.

[Screenshot: PySoy on Windows]

Sunday, August 26, 2007

D Conference 2007 Aftermath

That was amazing! Thanks to Brad and Amazon for hosting!

I had a moment of abject terror during my presentation, when I finished running through my slides 15 minutes into my hour-long block of time. Thankfully, people started asking questions, and the remaining 45 minutes went by in what seemed like no time at all.

My "slides" may be found here. The video of my presentation (and the others!) will be online eventually.

I was asked a couple of times for details about Pyd's design, and its history. Watch this space for more than you ever wanted to know on this topic.

More coverage of the conference can of course be found on Planet D.

Friday, July 27, 2007

Inadequacies of __traits

It is good that the __traits syntax is extensible, since it is missing a few important things. Some of these did not occur to me until I actually started writing code with it.

Take this class:

class Foo {
    int i;
    this() { i = 10; }
    this(int i) { this.i = i; }
    void foo() { writefln("Foo.i is %s", this.i); }
    real foo(double) { return 2.0; }
    static int bar(int j) { return j*2; }
    static real bar() { return 5.0; }
}

__traits(allMembers, Foo) gives us this:

[i,_ctor,foo,print,toString,toHash,opCmp,opEquals]

One of these in particular jumps out, which is "_ctor". This obviously has something to do with the constructor. If the class doesn't define a constructor, then it is not present. If the class defines a destructor, then a "_dtor" member will be present. It turns out that "_ctor" is only very narrowly useful.

All of the following give false:

writefln(__traits(isVirtualFunction, __traits(getMember, Foo, "_ctor")));
writefln(__traits(isAbstractFunction, __traits(getMember, Foo, "_ctor")));
writefln(__traits(isFinalFunction, __traits(getMember, Foo, "_ctor")));

The same results are given if we just say Foo._ctor. This can only mean _ctor is a static function... except it isn't:

auto f = new Foo; // okay
auto g = Foo._ctor(50); // Error: need 'this' to access member this
auto h = f._ctor(100); // okay

What the heck? What is _ctor supposed to be, anyway? Interestingly, by calling f._ctor(100), it modifies f, and f.i becomes 100; f and h refer to the same object.

Next, the allMembers trait doesn't include static members of the class. I suggest either adding an isStaticMember trait and including static members in allMembers and derivedMembers; or adding a pair of allStaticMembers and derivedStaticMembers traits.

However, my primary complaint with traits is that you can only determine the signatures of the overloads of virtual functions. This most notably means you cannot determine the overloads of a global function, and being unable to do it for static and final functions is unfortunate as well. Additionally, you cannot determine the signatures of the class's constructor (which I might expect the _ctor member to be used for).

Here are some more minor issues:

  • There is no isStaticFunction trait. Though it is not strictly necessary, since you can determine it by process of elimination using the other isFooFunction traits, it would be convenient.
  • There is no isDynamicArray trait. This isn't much of an issue, since you can determine whether something is a dynamic array type using regular templates, but it would be consistent with the isStaticArray and isAssociativeArray traits.

Sunday, July 22, 2007

__traits and Pyd

D 2.003 adds compile-time reflection, very much in the manner of what my previous post was asking for. These new features are all included under the umbrella of the new __traits keyword, which the grammar refers to as the TraitsExpression.

I've re-written Pyd about three times. The first time was when I wrote my own pseudo-tuple library (now defunct). The second time was when I replaced that with Tomasz Stachowiak's (aka h3r3tic) tuple library (which later evolved into std.bind). The third time was when D received proper language support for tuples. This last rewrite was characterized by a vast reduction in code.

Subsequent updates to Pyd have focused on its class-wrapping features, particularly the various insanities I've introduced to handle polymorphic behavior properly. Despite my best efforts, there are still some rather serious flaws in it:

  • Inherited methods are not automatically placed in the generated shim class. This means that, to get the full power of Pyd's polymorphism wrapping, you need to explicitly wrap all of the methods of a class, including the inherited ones. I don't believe this is documented, either, but, then again, it has never come up.
  • There are some serious issues with symbol identifier lengths, which I am still grappling with.

With __traits, all of this simply goes away. Wrapping a class, any class, could easily be made to look like this:

wrap_class!(Foo);

And it would all Just Work. I hope to have it done before the conference.

Naturally, this means future versions of Pyd will probably require D 2.003 or newer. I do regret having to do this. It means that, in addition to getting all this new support for __traits in there, I will have to get Pyd and the Python/C API bindings compliant with the new const semantics. This could very well end up being harder, but I won't know until I try.

Thursday, July 5, 2007

Pyd revision 113

I've committed a new update to Pyd. It does three things of note:

First, it fixes a horrible bug in how Pyd keeps class references. Whenever a class instance is returned to Python, Pyd keeps a reference to that instance in an AA. In fact, Pyd has a number of these AAs: One dedicated to class instances, and one more for each other type which may be passed to Python. (For instance, delegates, which may hold a reference to GC-controlled data.)

There is an internal Pyd function for replacing which D object a wrapping Python object points to. This function also handles the reference keeping, adding and removing references as needed. It is a template function, declared as void WrapPyObject_SetObj(T)(PyObject* self, T t).

This function is called whenever you instantiate a Python type which wraps a D class, and when that Python type is deallocated. In the deallocation function, it was called like this:

WrapPyObject_SetObj(self, null);

By setting the reference in the Python class to null, WrapPyObject_SetObj will clear the old reference to the object from the AA, without setting a new one. Or, at least, it's supposed to. Can you spot the error?

For the answer, see changeset 113. It was a real "oh, shit" moment when I saw what was going on, I tell you what. (Hint: What's the type of null?)
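
For anyone who wants to play with the idea without reading the changeset, here is a rough Python model of how this kind of bug can arise. It assumes, as the hint suggests, that the table to update is selected by the type of the argument. The names tables, set_obj and Wrapper are invented for illustration; this is not Pyd's code, and changeset 113 remains the authority on what actually went wrong.

```python
# Hypothetical model: one reference table per wrapped type, as described above.
tables = {}  # type key -> {id(obj): obj}


class Wrapper(object):
    """Stands in for the Python object that wraps a D object."""
    def __init__(self):
        self.obj = None


def set_obj(wrapper, new_obj, type_key):
    """Point wrapper at new_obj, updating the table selected by type_key."""
    table = tables.setdefault(type_key, {})
    if wrapper.obj is not None:
        table.pop(id(wrapper.obj), None)   # drop the old reference
    if new_obj is not None:
        table[id(new_obj)] = new_obj       # keep the new object alive
    wrapper.obj = new_obj


class Foo(object):
    pass


w = Wrapper()
set_obj(w, Foo(), Foo)

# Buggy clear: the key is derived from the argument itself. None is not a Foo,
# so the wrong (empty) table is consulted and the Foo reference is never removed.
set_obj(w, None, type(None))
leaked = len(tables[Foo])

# Correct clear: name the wrapped type explicitly.
w2 = Wrapper()
set_obj(w2, Foo(), Foo)
set_obj(w2, None, Foo)
```

In this model the first object stays in the Foo table forever, while the second one is removed as intended.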

The second thing this update does is improve Pyd's handling of arrays. It can now convert any iterable Python object to a dynamic array, as long as each element of the iterable can be converted to the value type of the array.
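
The rule is easy to state in Python terms. The sketch below only illustrates the behaviour described above; to_array is a hypothetical helper standing in for Pyd's conversion, not its implementation.

```python
def to_array(iterable, convert):
    """Convert every element of an iterable; fail as a whole if any element fails."""
    return [convert(item) for item in iterable]


from_list = to_array(["1", "2", "3"], int)
from_tuple = to_array(("4", "5"), int)
from_generator = to_array((str(n) for n in range(3)), int)  # any iterable works

try:
    to_array(["1", "not a number"], int)
    conversion_failed = False
except ValueError:
    conversion_failed = True   # one bad element rejects the whole conversion
```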

Finally, I've added a Repr struct template, which allows you to specify a member function to use as the Python type's __repr__ overload. I debated for a while having Pyd automatically wrap a toString overload, but toString doesn't have quite the same meaning as __repr__ does. The Repr struct allows you to explicitly use toString if it makes sense.
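
For readers less familiar with the Python side, this is the distinction in question. The Point class is made up for the example and has nothing to do with Pyd.

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # Informal, human-readable form (roughly what toString tends to be).
        return "(%s, %s)" % (self.x, self.y)

    def __repr__(self):
        # Unambiguous form aimed at developers, ideally valid Python source.
        return "Point(%r, %r)" % (self.x, self.y)


p = Point(1, 2)
as_str = str(p)
as_repr = repr(p)
```

Here str(p) and repr(p) give different strings, which is why wrapping toString as __repr__ automatically would not always be the right call.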

Monday, July 2, 2007

Compile-time reflection

This post originally appeared on the digitalmars.D newsgroup.


The subject of compile-time reflection has been an important one to me. I have been musing on it since about the time I started writing Pyd. Here is the current state of my thoughts on the matter.

Functions

When talking about functions, a given symbol may refer to multiple functions:

void foo() {}
void foo(int i) {}
void foo(int i, int j, int k=20) {}

The first thing a compile-time reflection mechanism needs is a way to, given a symbol, derive a tuple of the signatures of the function overloads. There is no immediately obvious syntax for this.

The is() expression has so far been the catch-all location for many of D's reflection capabilities. However, is() operates on types, not arbitrary symbols.

A property is more promising. Re-using the .tupleof property is one idea:

foo.tupleof ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

However, I am not sure how plausible it is to have a property on a symbol like this. Another alternative is to have some keyword act as a function (as typeof and typeid do, for instance). I propose adding "tupleof" as an actual keyword:

tupleof(foo) ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

I will be using this syntax throughout the rest of this post. For the sake of consistency, tupleof(Foo) should do what Foo.tupleof does now.

To unambiguously refer to a specific overload of a function, two pieces of information are required: The function's symbol, and the signature of the overload. When doing compile-time reflection, one is typically working with one specific overload at a time. While a function pointer does refer to one specific overload, it is important to note that function pointers are not compile-time entities! Therefore, the following idiom is common:

template UseFunction(alias func, func_t) {}

That is, to be useful, any given template that does something with a function needs both the function's symbol and the signature of the particular overload to operate on.

It should be clear, then, that automatically deriving the overloads of a given function is very important. Another piece of information that is useful is whether a given function has default arguments, and how many. The tupleof() syntax can be re-used for this:

tupleof(foo, void function(int, int, int)) ⇒ Tuple!(void function(int, int))

Here, we pass tupleof() the symbol of a function, and the signature of a particular overload of that function. The result is a tuple of the various signatures it is valid to call the overload with, ignoring the actual signature of the function. The most useful piece of information here is the number of elements in the tuple, which will be equal to the number of default arguments supported by the overload.

One might be tempted to place these additional function signatures in the original tuple derived by tupleof(foo). However, this is not desirable. Consider: We can say any of the following:

void function() fn1 = &foo;
void function(int) fn2 = &foo;
void function(int, int, int) fn3 = &foo;

But we cannot say this:

void function(int, int) fn4 = &foo; // ERROR!

A given function-symbol therefore has two sets of function signatures associated with it: The actual signatures of the functions, and the additional signatures it may be called with due to default arguments. These two sets are not equal in status, and should not be treated as such.
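
For comparison, Python exposes the equivalent information at run time, which is the kind of thing a wrapper has to reconstruct on the D side. The sketch below is only an analogy; call_arities is a hypothetical helper, and unlike the proposed tupleof(foo, sig) it lists the full signature's argument count alongside the shorter ones.

```python
def foo(i, j, k=20):
    pass


def call_arities(fn):
    """Return the positional argument counts fn accepts, smallest first."""
    total = fn.__code__.co_argcount          # number of declared parameters
    defaults = len(fn.__defaults__ or ())    # number of trailing defaults
    return list(range(total - defaults, total + 1))


arities = call_arities(foo)
default_count = len(arities) - 1   # every arity short of the full one is a default
```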

Member functions

Here is where things get really complicated.

class A {
    void bar() {}
    void bar(int i) {}
    void bar(int i, int j, int k=20) {}

    void baz(real r) {}

    static void foobar() {}
    final void foobaz() {}
}

class B : A {
    void foo() {}
    override void baz(real r) {}
}

D does not really have pointers to member functions. It is possible to fake them with some delegate trickery. In particular, there is no way to directly call an alias of a member function. This is important, as I will get to later.

The first mechanism needed is a way to get all of the member functions of a class. I suggest the addition of a .methodsof class property, which will derive a tuple of aliases of the class's member functions.

A.methodsof ⇒ Tuple!(A.bar, A.baz, A.foobar, A.foobaz)
B.methodsof ⇒ Tuple!(A.bar, A.foobar, A.foobaz, B.foo, B.baz)

The order of the members in this tuple is not important. Inherited member functions are included, as well. Note that these are tuples of symbol aliases! Since these are function symbols, all of the mechanisms suggested earlier for regular function symbols should still work!

tupleof(A.bar) ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

And so forth.
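
As a point of reference, this is roughly what the proposed .methodsof would correspond to in Python, where it is a run-time query. The methodsof helper is hypothetical, and the classes mirror A and B above only loosely (Python has no final).

```python
import inspect


class A(object):
    def bar(self):
        pass

    def baz(self, r):
        pass

    @staticmethod
    def foobar():
        pass


class B(A):
    def foo(self):
        pass

    def baz(self, r):
        pass


def methodsof(cls):
    """Names of the callable members of cls, inherited ones included."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("__")   # skip the members every object has
    )


a_methods = methodsof(A)
b_methods = methodsof(B)
```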

There are three kinds of member functions: virtual, static, and final. The next important mechanism that is needed is a way to distinguish these from each other. An important rule of function overloading works in our favor, here: A given function symbol can only refer to functions which are all virtual, all static, or all final. Therefore, this should be considered a property of the symbol, as opposed to one of the function itself.

The actual syntax for this mechanism needs to be determined. D has 'static' and 'final' keywords, but no 'virtual' keyword. Additionally, the 'static' keyword has been overloaded with many meanings, and I hesitate to suggest we add another. Nonetheless, I do.

static(A.bar == static) == false
static(A.bar == final) == false
static(A.bar == virtual) == true

The syntax is derived from that of the is() expression. The grammar would look something like this:

StaticExpression:
    static ( Symbol == SymbolSpecialization )

SymbolSpecialization:
    static
    final
    virtual

Here, 'virtual' is a context-sensitive keyword, not unlike the 'exit' in 'scope(exit)'. If the Symbol is not a member function, it is an error.

A hole presents itself in this scheme. We can get all of the function symbols of a class's member functions. From these, we can get the signatures of their overloads. From these, we can get pointers to the member functions, do some delegate trickery, and actually call them. This is all well and good.

But there is a problem when a method has default arguments. As explained earlier, we can't do this:

// Error! None of the overloads match!
void function(int, int) member_func = &A.bar;

Even though we can say:

A a = new A;
a.bar(1, 2);

The simplest solution is to introduce some way to call an alias of a method directly. There are a few options. My favorite is to take a cue from Python, and allow the following:

alias A.bar fn;
A a = new A;
fn(a, 1, 2);

That is, allow the user to explicitly call the method with the instance as the first parameter. This should be allowed generally, as in:

A.bar(a);
A.baz(a, 5.5);
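
This is the Python behaviour being borrowed. The class below is a stand-in written for this example; the default values for i and j are only there so that one Python method can cover the calls shown.

```python
class A(object):
    def bar(self, i=0, j=0, k=20):
        return (i, j, k)


fn = A.bar             # plays the role of `alias A.bar fn;`
a = A()

via_alias = fn(a, 1, 2)     # instance passed explicitly as the first argument
via_instance = a.bar(1, 2)  # the usual spelling of the same call
```

Both calls reach the same method, and the default for k is filled in either way, which is exactly what the function-pointer route cannot do.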

Given these mechanisms, combined with the existing mechanisms to derive the return type and parameter type tuple from a function type, D's compile-time reflection capabilities would be vastly more powerful.

Thursday, June 7, 2007

Syntax Highlighting!

Pygments has now been integrated into DSource's source browser, particularly including the D lexer I provided to Pygments (as well as the MiniD lexer that Jarrett Billingsley based on it). See for yourself! Pygments should also be working as the Wiki syntax highlighter.

So it was StackThreads after all...

It turns out that StackThreads doesn't compile in the latest versions of DMD at all. Very strange. I've passed this information on to Mikola, and hopefully it will be fixed at some point. Some changes were made in Pyd's trunk, which now compiles again.

Some emails have been floating around about the D conference, discussing schedules and logistics. In theory, I have to fill an hour talking about Pyd. The fact that it is now June, which is only two months short of August, has hit home, and I really need to get to work on what I'm going to say.

My Linux box broke. My best guess is that the power supply died (if it's not that, it's the motherboard). I bought a new one, which should arrive within a week. Mostly I just used the box to keep irssi open, but I had been helping Brad out with DSource with the thing. So that's annoying. On the other hand, my major goal when I built this thing was to spend as little as possible. By that criterion, the three years I got out of it isn't bad.

Wednesday, May 23, 2007

Getting Moving Again

I'm back!

After putting Pyd away for a couple weeks back in February, I returned to it to find that the latest version of DMD was suddenly unable to compile it. DMD wasn't merely emitting some arcane error message, it was using up the whole CPU while eating more and more RAM, while printing nothing. Yikes.

I had figured that the only things which had changed were DMD's various new features and some of the new stuff I had crammed into Pyd. However, systematically commenting out my new code didn't change anything. Then simhau made a post to the Pyd forum which clarified things. He was running into these problems, too, when he said:

Tested some more, and [if] I set with_st=False it compiles and works.

Hell. StackThreads. I would never have thought of that. I flipped around CeleriD to set with_st to False by default, and things started working much more readily.

The next step is to figure out whether it's StackThreads or Pyd's iteration wrapping code that's causing DMD to freak out. I don't expect this to be hard. If it's StackThreads, I'll have to rely on Mikola to fix it. If it's my code, fixing it should be easy.

Expect me to commit something new to the svn repo in the near future reflecting all of this.

In other news! I have been invited to speak at the D Conference 2007 about Pyd. So that should be fun.

Thursday, February 22, 2007

Pyd Roadmap

I have finally started using Trac's roadmap feature with Pyd. Here is Pyd's roadmap.

Before RC2, I want to finish the auto-shim-generation feature by automatically adding operator overloads to shim classes. (I already have the code in there for the unary and binary operators, as well as opApply, opCmp, opCall, and maybe others. However, this code is disabled in rev 101.) I also want to get various building issues resolved. The first is adding Rebuild (and maybe Bud) support to CeleriD, which will in turn allow for Tango support. The second is getting Pyd working with DSSS. This should in turn solve the third issue, which is somehow devising a simple way to build applications that embed Python using Pyd.

Speaking of embedding, another pre-RC2 issue is finishing the PydObject class, and adding the PydInterpreter class. These go hand-in-hand with some features I quietly added a while back.

PydInterpreter represents the Python interpreter. It calls Py_Initialize when constructed and Py_Finalize when destructed. It has an Import method, which imports a Python module, and returns the module object as a PydObject. It has a Run method, which basically just calls PyRun_SimpleString and returns its result as a PydObject. Other methods may be added if I think of something useful.

The other embedding-related features involve extending the embedded interpreter. Embedded Python is rarely useful if your Python scripts cannot access functionality provided by the application. Therefore, Pyd must make available the full brunt of its class and function wrapping functionality, even when embedding.

I'm still trying to work out the best API for all of this, which is why I kept it quiet when I added some of this functionality. Basically, when you wrap a function or class in this way, you will also have to provide a module name. This module will exist entirely inside the embedded interpreter, and any Python scripts imported by your application will be able to import this module. This also implies that you could divide your application's functions and classes into multiple Python modules, which could be a useful organizational nicety.
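
The observable effect can be mimicked from pure Python, which may make the idea more concrete. This is only a sketch: Pyd would do the equivalent through the Python/C API, and make_app_module, myapp_api and greet are names invented for the example.

```python
import sys
import types


def make_app_module(name, **members):
    """Create a module that exists only in memory and make it importable."""
    module = types.ModuleType(name)
    for key, value in members.items():
        setattr(module, key, value)
    sys.modules[name] = module   # import looks here before searching the disk
    return module


def greet(who):
    # Stands in for a function the host application exposes.
    return "hello, " + who


make_app_module("myapp_api", greet=greet)

# What a script loaded by the application could then do:
import myapp_api

message = myapp_api.greet("script")
```

Splitting an application across several such modules is just a matter of registering more than one name.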

While it would be possible to make functions and classes implicitly available in any imported scripts (so that the script would not have to import any modules to get at your application's functionality), I am split on whether to implement such a feature. It would be fairly un-Pythonic.

Sunday, February 18, 2007

Subverting polymorphism is fun!

Revision 100 of Pyd implements the API changes I discussed previously. There are still some rough edges, however. (These are mostly related to operator overloading. Static member function wrapping is broken, too.) I thought it would be amusing at this point to discuss Pyd's support of polymorphic behavior.

Take the following:

class Base {
    void foo() { writefln("Base.foo"); }
    void bar() { writefln("Base.bar"); }
}

class Derived : Base {
    void foo() { writefln("Derived.foo"); }
}

void polymorphic_call(Base b) {
    b.foo();
}

Let's say that a user writes a subclass of Derived in Python, and overrides the foo method:

class PyDerived(Derived):
    def foo(self):
        print "PyDerived.foo"

If an instance of PyDerived is passed to polymorphic_call, we would (optimally) expect it to print out "PyDerived.foo". The challenges involved in implementing this should become obvious if you start thinking about vtables.

Pyd solves this problem by introducing "shim" child classes. For every class that is exposed to Python, a shim class is generated. Every method that is being exposed to Python is overridden in this shim, in such a way as to dispatch to Python if it detects that this is an instance of a Python subclass, and call the parent class's method otherwise.

If my explanation was nonsensical, then it is probably best to think of it as a black box. Suffice it to say that, in the above example, Base and Derived each get a shim generated for them.
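
If a picture helps, here is the dispatch decision modelled entirely in Python. It is a toy under loud assumptions: the real shim is a generated D class, and BaseShim and py_overrides are names made up for this sketch.

```python
class Base(object):
    def foo(self):
        return "Base.foo"


class BaseShim(Base):
    """Hypothetical shim: overrides foo and decides where the call should go."""

    py_overrides = None   # filled in when the instance belongs to a Python subclass

    def foo(self):
        if self.py_overrides and "foo" in self.py_overrides:
            return self.py_overrides["foo"](self)   # dispatch to the Python method
        return Base.foo(self)                       # otherwise, the parent's method


def polymorphic_call(b):
    return b.foo()


plain = BaseShim()
scripted = BaseShim()
scripted.py_overrides = {"foo": lambda self: "PyDerived.foo"}

plain_result = polymorphic_call(plain)
scripted_result = polymorphic_call(scripted)
```

The caller never learns which path was taken, which is the whole point.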

So for every class you want to expose to Python with Pyd, there are suddenly four classes involved:

  1. The original class
  2. The shim class
  3. The Python class directly wrapping the original class
  4. The Python class wrapping the shim

Number 3 might come as a surprise. Although the original class is wrapped, this is only done so that D functions can return instances of the original class directly to Python. Number 4 is the class which may be best thought of as the Python version of your D class.

Update Feb 22: Revision 101 of Pyd changes this. There is now only one Python class wrapping both the original D class and its shim.

Sunday, February 11, 2007

Improving Pyd's Inheritance Support

After Walter asked me to explain which new metaprogramming features I wanted to see in D, I thought of a way to vastly improve Pyd's treatment of class wrapping without any new features. The current API looks like this:

wrapped_class!(Foo) f;
f.def!(Foo.bar);
finalize_class(f);

This API design predates D's tuples. Basically: wrapped_class is a struct template with a number of static member functions. The def member function, for instance, creates the wrapper function around a method and appends it to a list. It is important to note that this list is created entirely at run-time. The three lines above are a series of statements. There is no way to use the list of methods wrapped in some other compile-time operation (such as generating the D wrapper classes needed to properly support inheritance).

The new API design I thought of looks like this:

wrap_class!(
    Foo,
    Def!(Foo.bar)
);

In this case, wrap_class is a function template like this:

void wrap_class(C, T ...) ();

C is the class being wrapped, and T is a tuple of various pre-defined struct templates, which replace the static member functions of wrapped_class.

Def, then, is a struct template which should look somewhat familiar:

struct Def(alias fn, char[] name=symbolnameof!(fn), fn_t=typeof(&fn), uint MIN_ARGS=minArgs!(fn))

Not only does the call to wrap_class look cleaner than the old API (for one, it doesn't have to create any spurious local variables), but it makes all of the information about a class available at compile-time. This can be used, together with the new mixins, to automatically generate the stupid wrapper classes used by Pyd to properly handle polymorphism. This would hide them in the guts of Pyd, where they firmly belong.

One issue this brings up is docstring support. Docstrings were previously supported through the (if I may say so) fairly elegant solution of making them regular function arguments. The first solution that comes to mind with this new API is this slightly more arcane one:

struct Docstring {
    char[] name, doc;
}

void docstrings(Docstring[] docs...);

// ... PydMain ...
    docstrings(
        "Foo.bar",  "This function does such-and-such.",
        "SomeFunc", "This other function does something else."
    );
// ...

Gotta love D's typesafe variadics.

Saturday, February 10, 2007

Moving Pyd forward

Pyd, my Python-D interoperability library, has a few outstanding issues. None of them are very appealing to tackle.

While looking at what it would take to support Gregor Richards's Rebuild, I realised that CeleriD is a complete mess. I may end up doing a major refactoring of dcompiler.py before I'm done.

I recently got Pyd's inheritance support up to spec in SVN. While it can now properly handle wrapping a class and its parent (such that both D and Python are happy with the arrangement), the client code needed to bring this about is dreadfully ugly. D's new mixins might be able to provide some relief here, though I haven't really taken a close look at it, yet.

There is some demand for supporting the embedding of Python in D applications using Pyd. The PydObject class is the first step towards supporting this. (That was, in fact, the very first part of Pyd I ever wrote.) However, the class is a sort of half-hearted effort, and I haven't seriously run it through the wringer, yet. The main problem that I have to tackle before supporting embedding is developing a robust solution for building applications that embed Python with Pyd.

Pyd is difficult to build. This has generally not been an issue, as it comes with its own build tool (CeleriD) that knows all about the trickery needed to coerce it into building. If a user knows what they are doing, they can even convince Bud or Rebuild to build it.

First, Pyd and the Python bindings require a handful of version flags to be defined. The first allows Pyd to know the version of Python being used. Since Pyd only supports Python 2.4 and 2.5, the version flag must be one of Python_2_4_Or_Later or Python_2_5_Or_Later. This is currently a stupid naming convention, since the 2_4 flag isn't defined when using 2.5, but it hasn't been an issue with Pyd so far, and client code isn't really expected to use these flags.

The second version flag defines what version of Unicode Python has been compiled to use, UCS2 or UCS4. This flag should be one of Python_Unicode_UCS2 or Python_Unicode_UCS4. I have never tested using the UCS4 version, since all of the versions of Python I have use UCS2. There is no particular reason why it shouldn't work, however.
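
A build script can work both flags out by asking the target Python itself. The following is a sketch of one way to do that, not necessarily how CeleriD does it; version_flag and unicode_flag are hypothetical helpers, and only the flag names come from the text above.

```python
import sys

# Only the two versions mentioned above are known to this sketch.
SUPPORTED = {
    (2, 4): "Python_2_4_Or_Later",
    (2, 5): "Python_2_5_Or_Later",
}


def version_flag(version_info):
    """Map a sys.version_info-style tuple to the matching version flag."""
    key = (version_info[0], version_info[1])
    if key not in SUPPORTED:
        raise ValueError("unsupported Python version: %d.%d" % key)
    return SUPPORTED[key]


def unicode_flag(maxunicode):
    """Wide (UCS4) builds report a sys.maxunicode above 0xFFFF."""
    if maxunicode > 0xFFFF:
        return "Python_Unicode_UCS4"
    return "Python_Unicode_UCS2"


example_flags = [version_flag((2, 5, 0)), unicode_flag(0xFFFF)]
this_interpreter = unicode_flag(sys.maxunicode)
```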

This brings to light a weakness of Pyd's, which is string support. Although both D and Python are fully Unicode-capable, they are so in different ways. D deals with Unicode strictly in terms of UTF-8, UTF-16, and UTF-32 (with the char[], wchar[], and dchar[] string types, respectively). Python has the unicode type, which may be either UCS2 or UCS4 as previously mentioned, as well as the ability to use any 8-bit encoding, including UTF-8, with its str type. While converting between these encodings isn't particularly difficult, Pyd is currently incredibly stupid in this regard: It only knows how to return a Python str as a char[]. Dumping a UCS2 string into a wchar[] should also be safe, but I haven't implemented it, yet. (And likewise UCS4 and dchar[].)

This reveals another issue of Pyd's. Once upon a time, the char[] that Pyd converted the str to was just a slice over the str's internal buffer. This was incredibly efficient, but also intensely dangerous if the char[]'s lifetime exceeded that of the str. I did not consider this an overly serious issue until I implemented struct wrapping. As soon as I tested a struct with a char[] member, screaming alarms and klaxons went off, as the memory previously used by a string started getting overwritten with garbage.

Not to mention that Python's strings are supposed to be immutable! It would be all too possible for D code to alter the internal buffer of a Python string without meaning to.

So now Pyd .dups all strings returned from Python. Thanks to D's lack of runtime const (which the newsgroup seems to think will change at some point in the future, thankfully), this is the only sane behavior. However, I have gotten an email complaining of Pyd's poor performance with strings. I'm not really sure what to do about it.
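
The trade-off is the usual one between aliasing and copying, and it can be reproduced in a few lines of Python. This is an analogy rather than Pyd's code: the bytearray plays the part of the string's internal buffer, the memoryview plays the slice, and bytes() plays .dup.

```python
buf = bytearray(b"hello")   # stands in for the str's internal buffer
view = memoryview(buf)      # like slicing the buffer: no copy, shares memory
copy = bytes(buf)           # like .dup: an independent copy

# Later, the memory gets reused for something else.
for index in range(len(buf)):
    buf[index] = ord("X")

seen_through_view = view.tobytes()   # reflects the overwrite
seen_through_copy = copy             # still the original contents
```

The view is free but only valid while the buffer is left alone; the copy costs time and memory on every conversion, which is presumably where the performance complaint comes from.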

Where was I? Oh yes! Version flags. There are only two more that Pyd cares about. The first is Pyd_with_StackThreads, the other is Pyd_with_Tango. The latter is a stupid one, and will be dropped in favor of the standard Tango version flag in future updates. Furthermore, Tango support in Pyd is totally experimental and untested. (Tango requires a bud- or rebuild-like tool to build, and CeleriD doesn't yet support either, so I haven't used Tango with Pyd, yet.) StackThreads is used to wrap D's iteration protocol (opApply). It also doesn't work on Linux. However, it's usually irrelevant when embedding Pyd. (Unless you start talking about extending the embedded interpreter, which should be considered an advanced topic.) In short, you usually won't go wrong ignoring both of these.

(As an aside, Tango includes functionality similar to StackThreads, which even works on Linux. Therefore, Rebuild support in CeleriD equals Tango support in Pyd equals opApply wrapping support on Linux.)

The final step to building Pyd to embed it is to build the Python header and link against the Python runtime. First you need to pick whichever version of Python you're using, and look in the appropriate subdirectory of infrastructure/python. The directory will contain the bindings for that version of the Python/C API (python.d) and a .lib file for the Python DLL, which DMD needs to link to it on Windows. The python.d file is set up with some version(build) and pragma(link) trickery to try this linking step for you, so just make sure, if you're on Windows, that bud/rebuild can see the .lib. (If you're on Linux, just make sure the correct version of Python is installed.)

Hopefully (for the three or so of you out there that know what the hell I'm talking about) that will get Pyd to compile and link.

This probably highlights the need to get Pyd working with DSSS.