The Audacity to Code: July 2007

Friday, July 27, 2007

Inadequacies of __traits

It is good that the __traits syntax is extensible, since it is missing a few important things. Some of these did not occur to me until I actually started writing code with it.

Take this class:

import std.stdio;

class Foo {
    int i;
    this() { i = 10; }
    this(int i) { this.i = i; }
    void foo() { writefln("Foo.i is %s", this.i); }
    real foo(double) { return 2.0; }
    static int bar(int j) { return j*2; }
    static real bar() { return 5.0; }
}

__traits(allMembers, Foo) gives us this:

[i,_ctor,foo,print,toString,toHash,opCmp,opEquals]

One of these jumps out in particular: "_ctor". This obviously has something to do with the constructor. If the class doesn't define a constructor, it is not present. If the class defines a destructor, a "_dtor" member will be present. It turns out that "_ctor" is only very narrowly useful.
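To look at that member list yourself, here is a minimal sketch that assumes a present-day D compiler rather than D 2.003. Modern compilers spell the constructor member `__ctor` (two underscores) and report a somewhat different set of inherited `Object` members, so the exact list will not match the one above. `memberNames` and `reportsMember` are hypothetical helper names.

```d
import std.algorithm.searching : canFind;
import std.stdio : writefln;

class Foo {
    int i;
    this() { i = 10; }
    this(int i) { this.i = i; }
    void foo() { writefln("Foo.i is %s", this.i); }
    real foo(double) { return 2.0; }
    static int bar(int j) { return j * 2; }
    static real bar() { return 5.0; }
}

// Hypothetical helper: turns the compile-time tuple of member names
// produced by __traits(allMembers, T) into an ordinary string array.
string[] memberNames(T)() {
    return [__traits(allMembers, T)];
}

// Hypothetical helper: true if T reports a member with the given name.
// On present-day compilers the static overloads of bar are listed too.
bool reportsMember(T)(string name) {
    return memberNames!T().canFind(name);
}
```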

All of the following give false:

writefln(__traits(isVirtualFunction, __traits(getMember, Foo, "_ctor")));
writefln(__traits(isAbstractFunction, __traits(getMember, Foo, "_ctor")));
writefln(__traits(isFinalFunction, __traits(getMember, Foo, "_ctor")));

We get the same results if we just say Foo._ctor. This can only mean _ctor is a static function... except it isn't:

auto f = new Foo; // okay
auto g = Foo._ctor(50); // Error: need 'this' to access member this
auto h = f._ctor(100); // okay

What the heck? What is _ctor supposed to be, anyway? Interestingly, calling f._ctor(100) modifies f in place: f.i becomes 100, and f and h refer to the same object.
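Here is the same experiment as a sketch for present-day D. There the member is named `__ctor`, and invoking it on an existing object re-runs the constructor on that object instead of allocating a new one. `reinit` is a hypothetical helper.

```d
class Foo {
    int i;
    this() { i = 10; }
    this(int i) { this.i = i; }
}

// Hypothetical helper: re-runs Foo's int constructor on an object that
// already exists. No new object is allocated; f itself is modified.
Foo reinit(Foo f, int value) {
    f.__ctor(value);
    return f;
}
```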

Next, the allMembers trait doesn't include static members of the class. I suggest either adding an isStaticMember trait and including static members in allMembers and derivedMembers; or adding a pair of allStaticMembers and derivedStaticMembers traits.

However, my primary complaint with traits is that you can only determine the signatures of the overloads of virtual functions. This most notably means you cannot determine the overloads of a global function, and being unable to do it for static and final functions is unfortunate as well. Additionally, you cannot determine the signatures of the class's constructor (which I might expect the _ctor member to be used for).
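For comparison, present-day D provides `__traits(getOverloads, ...)`, which works for static member functions and also for constructors (reported under the name `__ctor`). The following minimal sketch assumes a modern compiler together with Phobos's `std.meta.AliasSeq` and `std.traits.Parameters`. `arities` is a hypothetical helper.

```d
import std.meta : AliasSeq;
import std.traits : Parameters;

class Foo {
    int i;
    this() { i = 10; }
    this(int i) { this.i = i; }
    static int bar(int j) { return j * 2; }
    static real bar() { return 5.0; }
}

// Every overload of the static function Foo.bar, as a tuple of symbols.
alias BarOverloads = AliasSeq!(__traits(getOverloads, Foo, "bar"));

// Every constructor of Foo; constructors are reported under "__ctor".
alias CtorOverloads = AliasSeq!(__traits(getOverloads, Foo, "__ctor"));

static assert(BarOverloads.length == 2);
static assert(CtorOverloads.length == 2);

// Hypothetical helper: the parameter count of each overload, in the
// order the compiler reports them.
size_t[] arities(Overloads...)() {
    size_t[] result;
    static foreach (f; Overloads)
        result ~= Parameters!f.length;
    return result;
}
```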

Here are some more minor issues:

There is no isStaticFunction trait. Though it is not strictly necessary, since you can determine it by process of elimination using the other isFooFunction traits, it would be convenient.
There is no isDynamicArray trait. This isn't much of an issue, since you can determine whether something is a dynamic array type using regular templates, but it would be consistent with the isStaticArray and isAssociativeArray traits.
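The regular-template approach mentioned above can be sketched with an `is` expression that pattern-matches `E[]`. `IsDynamicArray` is a hypothetical name, and the eponymous `enum` template syntax assumes a present-day compiler. Present-day Phobos also ships `std.traits.isDynamicArray` for the same purpose.

```d
// Hypothetical template: true when T has the form E[] for some E.
// Static arrays (E[n]) and associative arrays (V[K]) do not match.
enum bool IsDynamicArray(T) = is(T == E[], E);

static assert(IsDynamicArray!(int[]));
static assert(IsDynamicArray!string);          // immutable(char)[]
static assert(!IsDynamicArray!(int[3]));       // static array
static assert(!IsDynamicArray!(int[string]));  // associative array
static assert(!IsDynamicArray!int);
```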

Sunday, July 22, 2007

__traits and Pyd

D 2.003 adds compile-time reflection, very much in the manner of what my previous post was asking for. These new features are all included under the umbrella of the new __traits keyword, which the grammar refers to as the TraitsExpression.

I've re-written Pyd about three times. The first time was when I wrote my own pseudo-tuple library (now defunct). The second time was when I replaced that with Tomasz Stachowiak's (aka h3r3tic) tuple library (which later evolved into std.bind). The third time was when D received proper language support for tuples. This last rewrite was characterized by a vast reduction in code.

Subsequent updates to Pyd have focused on its class-wrapping features, particularly the various insanities I've introduced to handle polymorphic behavior properly. Despite my best efforts, there are still some rather serious flaws in it:

Inherited methods are not automatically placed in the generated shim class. This means that, to get the full power of Pyd's polymorphism wrapping, you need to explicitly wrap all of the methods of a class, including the inherited ones. I don't believe this is documented, either, though it has never come up.
There are some serious issues with symbol identifier lengths, which I am still grappling with.

With __traits, all of this simply goes away. Wrapping a class, any class, could easily be made to look like this:

wrap_class!(Foo);

And it would all Just Work. I hope to have it done before the conference.
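Here is a rough sketch of the kind of enumeration a `wrap_class!(Foo)` could build on. It assumes a present-day D compiler, since `static foreach` arrived well after 2.003. `declaredMethods`, `allMethods` and `hasMethod` are hypothetical helpers, not part of Pyd. The first walks `derivedMembers`, which covers only members declared in the class itself. The second walks `allMembers`, which also reports inherited members, so inherited methods would no longer have to be listed by hand.

```d
import std.algorithm.searching : canFind;

class Shape {
    void draw() {}
}

class Circle : Shape {
    double r = 1.0;
    double area() { return 3.14159 * r * r; }
}

// Names of member functions declared directly in T (derivedMembers
// leaves out anything inherited from a base class).
string[] declaredMethods(T)() {
    string[] names;
    static foreach (name; __traits(derivedMembers, T))
        static if (is(typeof(__traits(getMember, T, name)) == function))
            names ~= name;
    return names;
}

// Names of all member functions of T, inherited ones included.
string[] allMethods(T)() {
    string[] names;
    static foreach (name; __traits(allMembers, T))
        static if (is(typeof(__traits(getMember, T, name)) == function))
            names ~= name;
    return names;
}

// True if T has a member function with the given name, declared or inherited.
bool hasMethod(T)(string name) {
    return allMethods!T().canFind(name);
}
```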

Naturally, this means future versions of Pyd will probably require D 2.003 or newer. I do regret having to do this. It means that, in addition to getting all this new support for __traits in there, I will have to get Pyd and the Python/C API bindings compliant with the new const semantics. This could very well end up being harder, but I won't know until I try.

Thursday, July 5, 2007

Pyd revision 113

I've committed a new update to Pyd. It does three things of note:

First, it fixes a horrible bug in how Pyd keeps class references. Whenever a class instance is returned to Python, Pyd keeps a reference to that instance in an associative array (AA). In fact, Pyd has a number of these AAs: one dedicated to class instances, and one more for each other type that may be passed to Python (delegates, for instance, which may hold a reference to GC-controlled data).

There is an internal Pyd function for replacing the D object that a wrapping Python object points to. It also handles the reference-keeping described above, adding and removing AA entries as needed. It is a template function: void WrapPyObject_SetObj(T)(PyObject* self, T t).

This function is called whenever you instantiate a Python type which wraps a D class, and when that Python type is deallocated. In the deallocation function, it was called like this:

WrapPyObject_SetObj(self, null);

By setting the Python object's reference to null, WrapPyObject_SetObj will clear the old reference to the D object from the AA without setting a new one. Or, at least, it's supposed to. Can you spot the error?

For the answer, see changeset 113. It was a real "oh, shit" moment when I saw what was going on, I tell you what. (Hint: What's the type of null?)
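The hint points at template argument deduction. Below is a minimal sketch of the general pitfall. The names `tableFor` and `Widget` are hypothetical stand-ins for Pyd's internals, and the sketch illustrates the class of mistake, not necessarily the exact code changed in changeset 113.

```d
// Hypothetical template in the spirit of WrapPyObject_SetObj: reports which
// bookkeeping table an argument of type T would be filed under.
string tableFor(T)(T t) {
    static if (is(T == class))
        return "class instances";
    else
        return "other references";
}

class Widget {}

// A bare null is deduced as null's own type, not as Widget,
// so the class branch is never instantiated.
static assert(tableFor(null) == "other references");

// Naming the intended type explicitly selects the class branch.
static assert(tableFor!Widget(null) == "class instances");
```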

The second thing this update does is improve Pyd's handling of arrays. It can now convert any iterable Python object to a dynamic array, as long as each element of the iterable can be converted to the value type of the array.
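The Python-side iteration is not reproduced here. As a D-only analogue of the same rule, here is a sketch that accepts any input range and succeeds as long as each element converts to the target element type via `std.conv.to`. `convertAll` is a hypothetical name.

```d
import std.conv : to;
import std.range.primitives : isInputRange;

// Hypothetical analogue: builds an E[] from any input range, converting
// each element with std.conv.to.
E[] convertAll(E, R)(R items) if (isInputRange!R) {
    E[] result;
    foreach (item; items)
        result ~= to!E(item);
    return result;
}
```

If any element fails to convert, `to` throws a `std.conv.ConvException`, so the whole conversion fails rather than producing a partial array.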

Finally, I've added a Repr struct template, which allows you to specify a member function to use as the Python type's __repr__ overload. I debated for a while whether to have Pyd automatically wrap toString, but toString doesn't have quite the same meaning as __repr__. The Repr struct allows you to explicitly use toString if it makes sense.
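Here is a rough sketch of the idea of naming the repr method explicitly. This is not Pyd's `Repr`: `ReprVia` and `Point` are hypothetical, and the method is selected by name with `__traits(getMember, ...)` rather than by alias.

```d
import std.conv : to;

// Hypothetical sketch: the template argument names the member function
// whose result should stand in for __repr__.
struct ReprVia(string methodName) {
    static string of(T)(T obj) {
        return __traits(getMember, obj, methodName)();
    }
}

class Point {
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }

    // A debugging-oriented representation, distinct from toString.
    string debugRepr() { return "Point(" ~ x.to!string ~ ", " ~ y.to!string ~ ")"; }

    override string toString() { return x.to!string ~ "," ~ y.to!string; }
}
```

Either method can be chosen, so toString is used only when it is the right fit.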

Monday, July 2, 2007

Compile-time reflection

This post originally appeared on the digitalmars.D newsgroup.

The subject of compile-time reflection has been an important one to me. I have been musing on it since about the time I started writing Pyd. Here is the current state of my thoughts on the matter.

Functions

When talking about functions, a given symbol may refer to multiple functions:

void foo() {}
void foo(int i) {}
void foo(int i, int j, int k=20) {}

The first thing a compile-time reflection mechanism needs is a way to, given a symbol, derive a tuple of the signatures of the function overloads. There is no immediately obvious syntax for this.

The is() expression has so far been the catch-all location for many of D's reflection capabilities. However, is() operates on types, not arbitrary symbols.

A property is more promising. Re-using the .tupleof property is one idea:

foo.tupleof ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

However, I am not sure how plausible it is to have a property on a symbol like this. Another alternative is to have some keyword act as a function (as typeof and typeid do, for instance). I propose adding "tupleof" as an actual keyword:

tupleof(foo) ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

I will be using this syntax throughout the rest of this post. For the sake of consistency, tupleof(Foo) should do what Foo.tupleof does now.

To unambiguously refer to a specific overload of a function, two pieces of information are required: the function's symbol, and the signature of the overload. When doing compile-time reflection, one typically works with one specific overload at a time. While a function pointer does refer to one specific overload, it is important to note that function pointers are not compile-time entities! Therefore, the following idiom is common:

template UseFunction(alias func, func_t) {}

That is, any template that does something with a function needs both the function's symbol and the signature of the particular overload it operates on.

It should be clear, then, that automatically deriving the overloads of a given function is very important. Another piece of information that is useful is whether a given function has default arguments, and how many. The tupleof() syntax can be re-used for this:

tupleof(foo, void function(int, int, int)) ⇒ Tuple!(void function(int, int))

Here, we pass tupleof() the symbol of a function and the signature of a particular overload of that function. The result is a tuple of the additional signatures the overload can be called with, not counting its own signature. The most useful piece of information here is the number of elements in the tuple, which equals the number of default arguments the overload has.
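Present-day Phobos exposes default arguments through `std.traits.ParameterDefaults`. It yields one entry per parameter: `void` where there is no default, and the default value otherwise. Here is a minimal sketch of counting defaults with it. The standalone functions and the `defaultArgCount` helper are hypothetical.

```d
import std.traits : ParameterDefaults;

void withDefault(int i, int j, int k = 20) {}
void greet(string name, string greeting = "Hello", int times = 1) {}
void noDefaults(int a, int b) {}

// Hypothetical helper: counts the parameters of fn that have default values.
size_t defaultArgCount(alias fn)() {
    size_t n = 0;
    static foreach (d; ParameterDefaults!fn)
        static if (!is(d == void))  // void marks "no default"
            ++n;
    return n;
}
```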

One might be tempted to place these additional function signatures in the original tuple derived by tupleof(foo). However, this is not desirable. Consider: We can say any of the following:

void function() fn1 = &foo;
void function(int) fn2 = &foo;
void function(int, int, int) fn3 = &foo;

But we cannot say this:

void function(int, int) fn4 = &foo; // ERROR!

A given function-symbol therefore has two sets of function signatures associated with it: The actual signatures of the functions, and the additional signatures it may be called with due to default arguments. These two sets are not equal in status, and should not be treated as such.
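The asymmetry can be checked mechanically. This sketch assumes a present-day compiler, and the return types are changed to `int` (an addition for this example) so that the selected overload is observable. `pickFoo` is a hypothetical helper that lets the requested pointer type choose the overload, in the spirit of the `UseFunction(alias func, func_t)` idiom.

```d
int foo() { return 0; }
int foo(int i) { return 1; }
int foo(int i, int j, int k = 20) { return 3; }

// Hypothetical helper: the declared type Fn selects exactly one overload.
Fn pickFoo(Fn)() {
    Fn p = &foo;
    return p;
}

// The actual signatures can all be taken as pointers...
static assert(__traits(compiles, { int function() p = &foo; }));
static assert(__traits(compiles, { int function(int) p = &foo; }));
static assert(__traits(compiles, { int function(int, int, int) p = &foo; }));

// ...but the two-argument form exists only as a call, via the default.
static assert(!__traits(compiles, { int function(int, int) p = &foo; }));
static assert(__traits(compiles, foo(1, 2)));
```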

Member functions

Here is where things get really complicated.

class A {
    void bar() {}
    void bar(int i) {}
    void bar(int i, int j, int k=20) {}

    void baz(real r) {}

    static void foobar() {}
    final void foobaz() {}
}

class B : A {
    void foo() {}
    override void baz(real r) {}
}

D does not really have pointers to member functions. It is possible to fake them with some delegate trickery. In particular, there is no way to directly call an alias of a member function. This is important, as I will get to later.
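The delegate trickery in question looks roughly like the following: a delegate is built by hand from an object reference and the address of the member function. The `Counter` class and `bindAdd` are hypothetical. The sketch assumes present-day D, where a delegate's `.ptr` and `.funcptr` can be assigned in `@system` code (the default).

```d
class Counter {
    int base;
    this(int base) { this.base = base; }
    int add(int x) { return base + x; }
}

// Fakes a "pointer to member function": the unbound address of Counter.add
// is paired with an object reference to form a delegate.
// &Counter.add names this exact implementation, so the call is not
// dispatched virtually; an override in a subclass would be bypassed.
int delegate(int) bindAdd(Counter obj) {
    int delegate(int) dg;
    dg.ptr = cast(void*) obj;
    dg.funcptr = &Counter.add;
    return dg;
}
```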

The first mechanism needed is a way to get all of the member functions of a class. I suggest the addition of a .methodsof class property, which will derive a tuple of aliases of the class's member functions.

A.methodsof ⇒ Tuple!(A.bar, A.baz, A.foobar, A.foobaz)
B.methodsof ⇒ Tuple!(A.bar, A.foobar, A.foobaz, B.foo, B.baz)

The order of the members in this tuple is not important. Inherited member functions are included, as well. Note that these are tuples of symbol aliases! Since these are function symbols, all of the mechanisms suggested earlier for regular function symbols should still work!

tupleof(A.bar) ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

And so forth.

There are three kinds of member functions: virtual, static, and final. The next important mechanism that is needed is a way to distinguish these from each other. An important rule of function overloading works in our favor, here: A given function symbol can only refer to functions which are all virtual, all static, or all final. Therefore, this should be considered a property of the symbol, as opposed to one of the function itself.

The actual syntax for this mechanism needs to be determined. D has 'static' and 'final' keywords, but no 'virtual' keyword. Additionally, the 'static' keyword has been overloaded with many meanings, and I hesitate suggesting we add another. Nonetheless, I do.

static(A.bar == static) == false
static(A.bar == final) == false
static(A.bar == virtual) == true

The syntax is derived from that of the is() expression. The grammar would look something like this:

StaticExpression:
    static ( Symbol == SymbolSpecialization )

SymbolSpecialization:
    static
    final
    virtual

Here, 'virtual' is a context-sensitive keyword, not unlike the 'exit' in 'scope(exit)'. If the Symbol is not a member function, it is an error.
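The distinction this proposal asks for can be approximated in present-day D, which has `__traits(isStaticFunction, ...)`, `__traits(isFinalFunction, ...)` and `__traits(isVirtualMethod, ...)`. In the sketch below, `kindOf` is a hypothetical helper, and it is applied only to the non-overloaded members of the classes above.

```d
class A {
    void baz(real r) {}
    static void foobar() {}
    final void foobaz() {}
}

class B : A {
    override void baz(real r) {}
}

// Hypothetical helper mirroring the proposed static(Symbol == ...) test.
string kindOf(alias fn)() {
    static if (__traits(isStaticFunction, fn))
        return "static";
    else static if (__traits(isFinalFunction, fn))
        return "final";
    else static if (__traits(isVirtualMethod, fn))
        return "virtual";
    else
        static assert(0, "not a member function");
}
```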

A hole presents itself in this scheme. We can get all of the function symbols of a class's member functions. From these, we can get the signatures of their overloads. From these, we can get pointers to the member functions, do some delegate trickery, and actually call them. This is all well and good.

But there is a problem when a method has default arguments. As explained earlier, we can't do this:

// Error! None of the overloads match!
void function(int, int) member_func = &A.bar;

Even though we can say:

A a = new A;
a.bar(1, 2);

The simplest solution is to introduce some way to call an alias of a method directly. There are a few options. My favorite is to take a cue from Python, and allow the following:

alias A.bar fn;
A a = new A;
fn(a, 1, 2);

That is, allow the user to explicitly call the method with the instance as the first parameter. This should be allowed generally, as in:

A.bar(a);
A.baz(a, 5.5);
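In present-day D, this call form can be approximated by naming the method with a string instead of an alias. `callMethod` is a hypothetical helper built on `__traits(getMember, ...)`. It looks the method up on the instance, so ordinary overload resolution applies, including default arguments.

```d
// Hypothetical helper: calls the method called `name` on obj, with the
// instance passed explicitly as in the proposed fn(a, 1, 2) form.
auto callMethod(string name, T, Args...)(T obj, Args args) {
    return __traits(getMember, obj, name)(args);
}

class A {
    int bar() { return 0; }
    int bar(int i) { return i; }
    int bar(int i, int j, int k = 20) { return i + j + k; }
}
```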

Given these mechanisms, combined with the existing mechanisms to derive the return type and parameter type tuple from a function type, D's compile-time reflection capabilities would be vastly more powerful.
