Friday, November 13, 2009

On the Compilation of Go Packages

Man, Go sucks, the creators went and invented their own encoding, which Go source code is required to be in.

Compilation environment

I have been playing with the Go programming language. The gc compilation environment is broadly similar to gcc's, but with enough subtle differences that I am seeing a lot of confusion about how it works.

Traditionally, C is thought of as consisting of .c source files and .h header files. Each .c file compiles to a .o object file. The object files are then linked into some manner of final binary, be it a .a static library, a .so dynamic library, or an executable binary.

In this scenario, the binary depends on the object files which are linked into it, the object files each depend on their respective source file, and each source file depends on the headers which it includes (directly or indirectly).

The gc compiler for Go (the one also known variously as 5g, 6g, and 8g) works slightly differently. Here, we have .go source files. Every .go file belongs to exactly one package, and a package may consist of as many .go files as you desire. Compiled packages have a .5, .6, or .8 extension, depending on the architecture you are compiling for. A compiled package is obtained by compiling all of the .go files which make up the package.

Packages are the result of compiling .go files. They fill two roles. They are roughly the equivalent of object files in C (though, as I will explore in a moment, they are not exactly equivalent). They also fill the role that header files fill in C. Packages contain meta-data describing what the public interface of the package is.

Therefore, in order for one package to use another, you require the other package to be compiled, and you do not require the source files for the other package. Further, when a package is compiled, it statically links in all of its dependencies.

Once you have a "main" package, you may then link it into an executable binary. The only file required to do this is the compiled "main" package.

An example

Here is an example Makefile for a simple executable binary. The binary will be named hello. It depends on two custom packages, foo and bar, which each consist of a number of .go files. The "main" package itself contains two .go files: hello.go and goodbye.go.

For the sake of clarity, all of these source files live in the same directory (in a real application, each package's source files might live in their own directory), and I have hardcoded the architecture to amd64.

MAIN_FILES=hello.go goodbye.go
FOO_FILES=foo1.go foo2.go foo3.go
BAR_FILES=bar1.go bar2.go bar3.go

hello: main.6
        6l -o hello main.6
main.6: $(MAIN_FILES) foo.6 bar.6
        6g -I. -o main.6 $(MAIN_FILES)
foo.6: $(FOO_FILES)
        6g -o foo.6 $(FOO_FILES)
bar.6: $(BAR_FILES)
        6g -o bar.6 $(BAR_FILES)
.PHONY: clean
clean:
        rm -f hello main.6 foo.6 bar.6

I will explain each of these rules one at a time.

hello: main.6
        6l -o hello main.6

The hello executable binary is obtained by linking the main package. This package already (statically) contains the other packages which it depends on, so the only prerequisite is the main package itself.

main.6: $(MAIN_FILES) foo.6 bar.6
        6g -I. -o main.6 $(MAIN_FILES)

The main package is obtained by compiling the source files which comprise the package, and statically linking in the packages which they depend on. Thus, the foo and bar packages are prerequisites of the main package. However, we do not explicitly pass these packages to the compiler. Instead, we pass the compiler the -I option to inform it of where these packages may be found. It is therefore important for a package's filename to match the name it is imported as.

foo.6: $(FOO_FILES)
        6g -o foo.6 $(FOO_FILES)
bar.6: $(BAR_FILES)
        6g -o bar.6 $(BAR_FILES)

The foo and bar packages are simple, and consist only of their own source files.

.PHONY: clean
clean:
        rm -f hello main.6 foo.6 bar.6

The clean rule removes all of the generated binary files: the executable and the compiled packages.

Simplifying matters

It would be relatively straightforward to write a tool which, given a list of source files, could determine which files belong to which package, what the dependencies between these packages are, and automatically rebuild what is required in a make-like fashion. Writing such a tool may be a project for the future.
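A minimal sketch of the first half of such a tool, written in Python. It groups source files by their package clause, collects each package's imports, and orders the local packages so that dependencies are compiled first. The regular expressions are a simplifying assumption (a real tool would use a proper Go parser, and these patterns can be fooled by comments or string literals), and the file contents in SOURCES are invented for illustration. The generated command lines simply mirror the Makefile above.

```python
import re
from collections import defaultdict

# Simplified patterns; assumed good enough for tidy source files only.
PACKAGE_RE = re.compile(r'^\s*package\s+(\w+)', re.MULTILINE)
SINGLE_IMPORT_RE = re.compile(r'^\s*import\s+(?:[\w.]+\s+)?"([^"]+)"', re.MULTILINE)
BLOCK_IMPORT_RE = re.compile(r'^\s*import\s*\((.*?)\)', re.MULTILINE | re.DOTALL)
QUOTED_RE = re.compile(r'"([^"]+)"')


def scan(sources):
    """Map file name -> text into (files per package, imports per package)."""
    files = defaultdict(list)
    imports = defaultdict(set)
    for name, text in sorted(sources.items()):
        match = PACKAGE_RE.search(text)
        if match is None:
            continue  # not a Go source file we understand
        pkg = match.group(1)
        files[pkg].append(name)
        imports[pkg].update(SINGLE_IMPORT_RE.findall(text))
        for block in BLOCK_IMPORT_RE.findall(text):
            imports[pkg].update(QUOTED_RE.findall(block))
    return dict(files), dict(imports)


def build_order(imports, local):
    """Return the local packages in an order that puts dependencies first."""
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in sorted(imports.get(pkg, ())):
            if dep in local:  # ignore packages we do not build ourselves
                visit(dep)
        order.append(pkg)

    for pkg in sorted(local):
        visit(pkg)
    return order


# Hypothetical file contents, loosely following the example above.
SOURCES = {
    "hello.go": 'package main\n\nimport (\n\t"fmt"\n\t"foo"\n)\n',
    "goodbye.go": 'package main\n\nimport "bar"\n',
    "foo1.go": "package foo\n",
    "bar1.go": "package bar\n",
}

files, imports = scan(SOURCES)
order = build_order(imports, set(files))
commands = ["6g -I. -o %s.6 %s" % (pkg, " ".join(files[pkg])) for pkg in order]
```

Deciding which packages are out of date (the make-like part) is left out; comparing file modification times against each compiled package would be one way to do it.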

Saturday, December 29, 2007

PySoy for Windows

Recently, I've been helping the PySoy project get things working on Windows. Today I finally made some visible progress.

PySoy on Windows

Sunday, August 26, 2007

D Conference 2007 Aftermath

That was amazing! Thanks to Brad and Amazon for hosting!

I had a moment of abject terror during my presentation, when I finished running through my slides 15 minutes into my hour-long block of time. Thankfully, people started asking questions, and the remaining 45 minutes went by in what seemed like no time at all.

My "slides" may be found here. The video of my presentation (and the others!) will be online eventually.

I was asked a couple of times for details about Pyd's design, and its history. Watch this space for more than you ever wanted to know on this topic.

More coverage of the conference can of course be found on Planet D.

Friday, July 27, 2007

Inadequacies of __traits

It is good that the __traits syntax is extensible, since it is missing a few important things. Some of these did not occur to me until I actually started writing code with it.

Take this class:

class Foo {
    int i;
    this() { i = 10; }
    this(int i) { this.i = i; }
    void foo() { writefln("Foo.i is %s", this.i); }
    real foo(double) { return 2.0; }
    static int bar(int j) { return j*2; }
    static real bar() { return 5.0; }
}

__traits(allMembers, Foo) gives us this:

[i,_ctor,foo,print,toString,toHash,opCmp,opEquals]

One of these in particular jumps out, which is "_ctor". This obviously has something to do with the constructor. If the class doesn't define a constructor, then it is not present. If the class defines a destructor, then a "_dtor" member will be present. It turns out that "_ctor" is only very narrowly useful.

All of the following give false:

writefln(__traits(isVirtualFunction, __traits(getMember, Foo, "_ctor")));
writefln(__traits(isAbstractFunction, __traits(getMember, Foo, "_ctor")));
writefln(__traits(isFinalFunction, __traits(getMember, Foo, "_ctor")));

The same results are given if we just say Foo._ctor. This can only mean _ctor is a static function... except it isn't:

auto f = new Foo; // okay
auto g = Foo._ctor(50); // Error: need 'this' to access member this
auto h = f._ctor(100); // okay

What the heck? What is _ctor supposed to be, anyway? Interestingly, by calling f._ctor(100), it modifies f, and f.i becomes 100; f and h refer to the same object.

Next, the allMembers trait doesn't include static members of the class. I suggest either adding an isStaticMember trait and including static members in allMembers and derivedMembers; or adding a pair of allStaticMembers and derivedStaticMembers traits.

However, my primary complaint with traits is that you can only determine the signatures of the overloads of virtual functions. This most notably means you cannot determine the overloads of a global function, and being unable to do it for static and final functions is unfortunate as well. Additionally, you cannot determine the signatures of the class's constructor (which I might expect the _ctor member to be used for).

Here are some more minor issues:

  • There is no isStaticFunction trait. Though it is not strictly necessary, since you can determine it by process of elimination using the other isFooFunction traits, it would be convenient.
  • There is no isDynamicArray trait. This isn't much of an issue, since you can determine whether something is a dynamic array type using regular templates, but it would be consistent with the isStaticArray and isAssociativeArray traits.

Sunday, July 22, 2007

__traits and Pyd

D 2.003 adds compile-time reflection, very much in the manner of what my previous post was asking for. These new features are all included under the umbrella of the new __traits keyword, which the grammar refers to as the TraitsExpression.

I've re-written Pyd about three times. The first time was when I wrote my own pseudo-tuple library (now defunct). The second time was when I replaced that with Tomasz Stachowiak's (aka h3r3tic) tuple library (which later evolved into std.bind). The third time was when D received proper language support for tuples. This last rewrite was characterized by a vast reduction in code.

Subsequent updates to Pyd have focused on its class-wrapping features, particularly the various insanities I've introduced to handle polymorphic behavior properly. Despite my best efforts, there are still some rather serious flaws in it:

  • Inherited methods are not automatically placed in the generated shim class. This means that, to get the full power of Pyd's polymorphism wrapping, you need to explicitly wrap all of the methods of a class, including the inherited ones. I don't believe this is documented, either, but, then again, it has never come up.
  • There are some serious issues with symbol identifier lengths, which I am still grappling with.

With __traits, all of this simply goes away. Wrapping a class, any class, could easily be made to look like this:

wrap_class!(Foo);

And it would all Just Work. I hope to have it done before the conference.

Naturally, this means future versions of Pyd will probably require D 2.003 or newer. I do regret having to do this. It means that, in addition to getting all this new support for __traits in there, I will have to get Pyd and the Python/C API bindings compliant with the new const semantics. This could very well end up being harder, but I won't know until I try.

Thursday, July 5, 2007

Pyd revision 113

I've committed a new update to Pyd. It does three things of note:

First, it fixes a horrible bug in how Pyd keeps class references. Whenever a class instance is returned to Python, Pyd keeps a reference to that instance in an AA. In fact, Pyd has a number of these AAs: One dedicated to class instances, and one more for each other type which may be passed to Python. (For instance, delegates, which may hold a reference to GC-controlled data.)

There is an internal Pyd function for replacing what D object a wrapping Python object points to. This function also takes care of tracking this reference keeping, adding and removing references when needed. It is a template function, called void WrapPyObject_SetObj(T)(PyObject* self, T t).

This function is called whenever you instantiate a Python type which wraps a D class, and when that Python type is deallocated. In the deallocation function, it was called like this:

WrapPyObject_SetObj(self, null);

By setting the reference in the Python class to null, WrapPyObject_SetObj will clear the old reference to the object from the AA, without setting a new one. Or, at least, it's supposed to. Can you spot the error?

For the answer, see changeset 113. It was a real "oh, shit" moment when I saw what was going on, I tell you what. (Hint: What's the type of null?)

The second thing this update does is improve Pyd's handling of arrays. It can now convert any iterable Python object to a dynamic array, as long as each element of the iterable can be converted to the value type of the array.

Finally, I've added a Repr struct template, which allows you to specify a member function to use as the Python type's __repr__ overload. I debated for a while having Pyd automatically wrap a toString overload, but toString doesn't have quite the same meaning as __repr__ does. The Repr struct allows you to explicitly use toString if it makes sense.

Monday, July 2, 2007

Compile-time reflection

This post originally appeared on the digitalmars.D newsgroup.


The subject of compile-time reflection has been an important one to me. I have been musing on it since about the time I started writing Pyd. Here is the current state of my thoughts on the matter.

Functions

When talking about functions, a given symbol may refer to multiple functions:

void foo() {}
void foo(int i) {}
void foo(int i, int j, int k=20) {}

The first thing a compile-time reflection mechanism needs is a way to, given a symbol, derive a tuple of the signatures of the function overloads. There is no immediately obvious syntax for this.

The is() expression has so far been the catch-all location for many of D's reflection capabilities. However, is() operates on types, not arbitrary symbols.

A property is more promising. Re-using the .tupleof property is one idea:

foo.tupleof ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

However, I am not sure how plausible it is to have a property on a symbol like this. Another alternative is to have some keyword act as a function (as typeof and typeid do, for instance). I propose adding "tupleof" as an actual keyword:

tupleof(foo) ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

I will be using this syntax throughout the rest of this post. For the sake of consistency, tupleof(Foo) should do what Foo.tupleof does now.

To unambiguously refer to a specific overload of a function, two pieces of information are required: The function's symbol, and the signature of the overload. When doing compile-time reflection, one is typically working with one specific overload at a time. While a function pointer does refer to one specific overload, it is important to note that function pointers are not compile-time entities! Therefore, the following idiom is common:

template UseFunction(alias func, func_t) {}

That is, any given template that does something with a function requires both the function's symbol and the signature of the particular overload to operate on to be useful.

It should be clear, then, that automatically deriving the overloads of a given function is very important. Another piece of information that is useful is whether a given function has default arguments, and how many. The tupleof() syntax can be re-used for this:

tupleof(foo, void function(int, int, int)) ⇒ Tuple!(void function(int, int))

Here, we pass tupleof() the symbol of a function, and the signature of a particular overload of that function. The result is a tuple of the various signatures it is valid to call the overload with, ignoring the actual signature of the function. The most useful piece of information here is the number of elements in the tuple, which will be equal to the number of default arguments supported by the overload.
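As a rough run-time analogue, Python already exposes this information through its standard inspect module. The sketch below is only an illustration of the idea, not of anything D provides: foo is a hypothetical Python stand-in for the three-argument overload, and extra_arities is an invented helper that assumes plain positional parameters.

```python
import inspect


def foo(i, j, k=20):
    """Python stand-in for the D overload void foo(int i, int j, int k=20)."""
    return (i, j, k)


def extra_arities(func):
    """Argument counts func accepts only because of default arguments."""
    params = list(inspect.signature(func).parameters.values())
    total = len(params)
    defaults = sum(
        1 for p in params if p.default is not inspect.Parameter.empty
    )
    # The full signature (total arguments) is deliberately left out,
    # just as tupleof(foo, signature) omits the overload's own signature.
    return list(range(total - defaults, total))


extras = extra_arities(foo)      # the shorter ways to call foo
num_defaults = len(extras)       # equals the number of default arguments
```

Here extras holds a single entry, corresponding to the two-argument call, so its length matches the one default argument of the overload.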

One might be tempted to place these additional function signatures in the original tuple derived by tupleof(foo). However, this is not desirable. Consider: We can say any of the following:

void function() fn1 = &foo;
void function(int) fn2 = &foo;
void function(int, int, int) fn3 = &foo;

But we cannot say this:

void function(int, int) fn4 = &foo; // ERROR!

A given function-symbol therefore has two sets of function signatures associated with it: The actual signatures of the functions, and the additional signatures it may be called with due to default arguments. These two sets are not equal in status, and should not be treated as such.

Member functions

Here is where things get really complicated.

class A {
    void bar() {}
    void bar(int i) {}
    void bar(int i, int j, int k=20) {}

    void baz(real r) {}

    static void foobar() {}
    final void foobaz() {}
}

class B : A {
    void foo() {}
    override void baz(real r) {}
}

D does not really have pointers to member functions. It is possible to fake them with some delegate trickery. In particular, there is no way to directly call an alias of a member function. This is important, as I will get to later.

The first mechanism needed is a way to get all of the member functions of a class. I suggest the addition of a .methodsof class property, which will derive a tuple of aliases of the class's member functions.

A.methodsof ⇒ Tuple!(A.bar, A.baz, A.foobar, A.foobaz)
B.methodsof ⇒ Tuple!(A.bar, A.foobar, A.foobaz, B.foo, B.baz)

The order of the members in this tuple is not important. Inherited member functions are included, as well. Note that these are tuples of symbol aliases! Since these are function symbols, all of the mechanisms suggested earlier for regular function symbols should still work!

tupleof(A.bar) ⇒ Tuple!(void function(), void function(int), void function(int, int, int))

And so forth.

There are three kinds of member functions: virtual, static, and final. The next important mechanism that is needed is a way to distinguish these from each other. An important rule of function overloading works in our favor, here: A given function symbol can only refer to functions which are all virtual, all static, or all final. Therefore, this should be considered a property of the symbol, as opposed to one of the function itself.

The actual syntax for this mechanism needs to be determined. D has 'static' and 'final' keywords, but no 'virtual' keyword. Additionally, the 'static' keyword has been overloaded with many meanings, and I hesitate suggesting we add another. Nonetheless, I do.

static(A.bar == static) == false
static(A.bar == final) == false
static(A.bar == virtual) == true

The syntax is derived from that of the is() expression. The grammar would look something like this:

StaticExpression:
    static ( Symbol == SymbolSpecialization )

SymbolSpecialization:
    static
    final
    virtual

Here, 'virtual' is a context-sensitive keyword, not unlike the 'exit' in 'scope(exit)'. If the Symbol is not a member function, it is an error.

A hole presents itself in this scheme. We can get all of the function symbols of a class's member functions. From these, we can get the signatures of their overloads. From these, we can get pointers to the member functions, do some delegate trickery, and actually call them. This is all well and good.

But there is a problem when a method has default arguments. As explained earlier, we can't do this:

// Error! None of the overloads match!
void function(int, int) member_func = &A.bar;

Even though we can say:

A a = new A;
a.bar(1, 2);

The simplest solution is to introduce some way to call an alias of a method directly. There are a few options. My favorite is to take a cue from Python, and allow the following:

alias A.bar fn;
A a = new A;
fn(a, 1, 2);

That is, allow the user to explicitly call the method with the instance as the first parameter. This should be allowed generally, as in:

A.bar(a);
A.baz(a, 5.5);
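For reference, this is the Python behavior being borrowed. The class below is a hypothetical Python stand-in for the D class A; since Python has no overloading, the default values on bar are invented so that it accepts zero to three arguments.

```python
class A:
    def bar(self, i=0, j=0, k=20):
        # Stand-in for the three D overloads of bar.
        return i + j + k

    def baz(self, r):
        return r * 2


a = A()

# Looking the method up on the class gives something that takes the
# instance explicitly as its first argument.
fn = A.bar
explicit = fn(a, 1, 2)      # same call as a.bar(1, 2)
implicit = a.bar(1, 2)

doubled = A.baz(a, 5.5)     # same call as a.baz(5.5)
```

Because the default for k is filled in either way, both spellings of the call to bar produce the same result, which is exactly what the function-pointer approach cannot express.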

Given these mechanisms, combined with the existing mechanisms to derive the return type and parameter type tuple from a function type, D's compile-time reflection capabilities would be vastly more powerful.