An exciting week.

This week was exciting, because I spent it learning about the go runtime. As insightful as that was, it also confused me a little. Before going any further, I should state that this is a partial report on my research and findings. My aims for this week were the following: to investigate the behavior of go programs under the Hurd, to study the go runtime, and possibly to modify it to see whether the goroutine issues are caused by libpthread or by the go runtime.

Presenting my findings.

Most of my time was spent studying the gcc go frontend, libgo and the go runtime. Fortunately, I can gladly say that it was time well spent. What I got from it were some nice pieces of insight, but also some slight confusion and doubts.

The first interesting thing in my findings was this:

struct	G
{
	Defer*	defer;
	Panic*	panic;
	void*	exception;	// current exception being thrown
	bool	is_foreign;	// whether current exception from other language
	void	*gcstack;	// if status==Gsyscall, gcstack = stackbase to use during gc
	uintptr	gcstack_size;
	void*	gcnext_segment;
	void*	gcnext_sp;
	void*	gcinitial_sp;
	ucontext_t gcregs;
	byte*	entry;		// initial function
	G*	alllink;	// on allg
	void*	param;		// passed parameter on wakeup
	bool	fromgogo;	// reached from gogo
	int16	status;
	int64	goid;
	uint32	selgen;		// valid sudog pointer
	const char*	waitreason;	// if status==Gwaiting
	G*	schedlink;
	bool	readyonstop;
	bool	ispanic;
	bool	issystem;
	int8	raceignore; // ignore race detection events
	M*	m;		// for debuggers, but offset not hard-coded
	M*	lockedm;
	M*	idlem;
	int32	sig;
	int32	writenbuf;
	byte*	writebuf;
	// DeferChunk	*dchunk;
	// DeferChunk	*dchunknext;
	uintptr	sigcode0;
	uintptr	sigcode1;
	// uintptr	sigpc;
	uintptr	gopc;	// pc of go statement that created this goroutine

	int32	ncgo;
	CgoMal*	cgomal;

	Traceback* traceback;

	ucontext_t	context;
	void*		stack_context[10];
};

Yep. This is the code that represents (yeah, you guessed it) a goroutine. I was pretty surprised at first to see that a thread is represented as a struct. But then again, taking a closer look at it, it makes perfect sense. The next one though was a lot trickier:

struct	M
{
	G*	g0;		// goroutine with scheduling stack
	G*	gsignal;	// signal-handling G
	G*	curg;		// current running goroutine
	int32	id;
	int32	mallocing;
	int32	throwing;
	int32	gcing;
	int32	locks;
	int32	nomemprof;
	int32	waitnextg;
	int32	dying;
	int32	profilehz;
	int32	helpgc;
	uint32	fastrand;
	uint64	ncgocall;	// number of cgo calls in total
	Note	havenextg;
	G*	nextg;
	M*	alllink;	// on allm
	M*	schedlink;
	MCache	*mcache;
	G*	lockedg;
	G*	idleg;
	Location createstack[32];	// Stack that created this thread.
	M*	nextwaitm;	// next M waiting for lock
	uintptr	waitsema;	// semaphore for parking on locks
	uint32	waitsemacount;
	uint32	waitsemalock;
	GCStats	gcstats;
	bool	racecall;
	void*	racepc;

	uintptr	settype_buf[1024];
	uintptr	settype_bufsize;

	uintptr	end[];
};

This was a source of endless confusion at the beginning. It does have some hints confirming that G’s are indeed goroutines, but nothing that really helps to describe what an M is. Its structure is similar to that of the G however, which means that it might have something to do with a thread. And indeed it does. Further study of the source code made me speculate that M’s must be the real operating system scheduled (kernel) threads, while G’s (goroutines) must be the lightweight threads managed by the go runtime.

I was more than happy to find comments that confirmed that position of mine.

// The go scheduler's job is to match ready-to-run goroutines (`g's)
// with waiting-for-work schedulers (`m's)

Another cool finding was the go (runtime) scheduler - from which the above comment originates:

struct Sched {
	Lock;

	G *gfree;	// available g's (status == Gdead)
	int64 goidgen;

	G *ghead;	// g's waiting to run
	G *gtail;
	int32 gwait;	// number of g's waiting to run
	int32 gcount;	// number of g's that are alive
	int32 grunning;	// number of g's running on cpu or in syscall

	M *mhead;	// m's waiting for work
	int32 mwait;	// number of m's waiting for work
	int32 mcount;	// number of m's that have been created

	volatile uint32 atomic;	// atomic scheduling word (see below)

	int32 profilehz;	// cpu profiling rate

	bool init;  // running initialization
	bool lockmain;  // init called runtime.LockOSThread

	Note	stopped;	// one g can set waitstop and wait here for m's to stop
};

From that particular piece of code, without a doubt the most interesting line is: G *gfree. That is a pool of the goroutines that are available to be reused. There are also helper scheduling functions, of which the most interesting (for my purposes) was static void gfput(G*), which releases a goroutine (puts it on the gfree list):

// Put on gfree list.  Sched must be locked.
static void
gfput(G *gp)
{
	gp->schedlink = runtime_sched.gfree;
	runtime_sched.gfree = gp;
}
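The free list idea can be sketched outside the runtime in a few lines of Go. This is only a model: the G type is trimmed down to two fields, the lock that the real code requires is left out, and gfget is written here as the obvious mirror image of gfput, which is an assumption about its shape rather than a copy of the runtime's code.

```go
package main

import "fmt"

// G is a trimmed-down stand-in for the runtime's goroutine descriptor.
type G struct {
	goid      int64
	schedlink *G // next G on whatever list this G is currently on
}

// Sched keeps only the free list from the real scheduler struct.
type Sched struct {
	gfree *G // head of the list of reusable G's
}

// gfput pushes gp onto the free list, like the C function of the same name.
func (s *Sched) gfput(gp *G) {
	gp.schedlink = s.gfree
	s.gfree = gp
}

// gfget pops a G from the free list, or returns nil if the list is empty.
// (Assumed counterpart of gfput; the real function may differ.)
func (s *Sched) gfget() *G {
	gp := s.gfree
	if gp != nil {
		s.gfree = gp.schedlink
		gp.schedlink = nil
	}
	return gp
}

func main() {
	var s Sched
	s.gfput(&G{goid: 1})
	s.gfput(&G{goid: 2})
	a := s.gfget()
	b := s.gfget()
	c := s.gfget()
	fmt.Println(a.goid, b.goid, c == nil) // prints "2 1 true"
}
```

The list behaves as a stack: the most recently released G is the first one to be handed out again.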

There are loads of other extremely interesting functions there, but for the sake of space I will not expand on them here. I will, however, expand on what it is that is confusing me:

The source of confusion

My plan at this point is to test whether removing thread destruction from the go runtime results in a difference in behavior. There are however (as far as go is concerned) two kinds of threads in the go runtime: goroutines (G’s) and the kernel schedulable threads (M’s).

Neither of them seems to really be destroyed. From my understanding so far, G’s are never totally destroyed (I may be wrong here, I am still researching this bit). Whenever they are about to be “destroyed”, they are added to the scheduler’s list of free G’s to allow for reuse, as evidenced by the gfput and gfget functions. M’s on the other hand (the kernel threads) also seem not to be destroyed. A comment in go’s scheduler seems to support this (// For now, m's never go away.) and as a matter of fact I could not find any code that destroyed M’s (I am still researching this bit).

Since neither of the two actually gets destroyed, and seeing as thread creation alone should not be buggy, how come we are facing the specific bugs we are facing? I will try to provide an interpretation: either I am fairly wrong and M’s (or G’s or both) actually do get destroyed somewhere (possible and very much probable), or I am looking for clues regarding the issue in the wrong place (might be possible but I don’t see it being very probable).

First of all, I would like to apologize for this report being late. But unfortunately this happened: I Accidentally 93 MB

Only that, in my case, it was not exactly 93 MB, rather it was about 1.5 GB. Yeah, I accidentally obliterated my GCC repository on the Hurd, so I had to reclone and rebuild everything, something that took a considerable amount of time. How this happened is a long story that involved me wanting to rebuild my gcc, cd-ing two directories above the build folder, and ending up running rm -rf * in my gcc folder (which included the source and the build folder) rather than in my gcc_build folder. Thank god, that was only a minor setback, and the (small scale) crisis was soon averted.

Further research

This week was mostly spent reading source code, primarily looking for clues about the previous situation, and secondarily to get a better understanding of the systems I am working on. This proved to be fruitful, as I got a firmer grip of libpthread and the GNU Mach system. However, while this week was mostly spent reading code and documentation, that doesn’t mean that I didn’t do anything practical. I also used my time to do some further research into what specifically triggered the assertion failure. That required playing a little bit with our newly built compiler on the Hurd to see what we can do with go on the Hurd.

Testing gccgo under the Hurd

If you recall, the last time I reported I had found out that an assertion in libpthread’s code was failing, and that this was the root cause of the failures in both the gccgo tests and the libgo tests. That assertion was failing at two different places in the code, the first being __pthread_create_internal, which is a libpthread function located in libpthread/pthread/pt-create.c and is invoked when an application wants to create a new POSIX thread. That function of course is not called directly, rather it is invoked by pthread_create, which is the function that user space applications use to create a new thread. (For reference you can find the code here)

The second place where that assertion was failing was in __sem_timedwait_internal, in the file libpthread/sysdeps/generic/sem-timedwait.c, where it gets inlined in the place of self = _pthread_self ();. (For more information, check out last week’s report).

So I was curious to test out the execution of some sample programs under the compiler we built on the Hurd. Beginning with some very simple hello world like programs, we could see that they were compiling successfully, and also ran successfully without any issues at all. Seeing as the assertion failure is generated when we attempt to create a new thread, I figured I might want to start playing with go routines under the Hurd.

So we started playing with a simple hello world like goroutine example (the one available under the tour of go on the golang.org website.)

package main

import (
    "fmt"
    "time"
)

func say(s string) {
    for i := 0; i < 5; i++ {
        time.Sleep(100 * time.Millisecond)
        fmt.Println(s)
    }
}

func main() {
    go say("world")
    say("hello")
}

This gets compiled without any issues at all, but when we try to run it…

a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted


goroutine 1 [sleep]:
time.Sleep
	../../../gcc_source/libgo/runtime/time.goc:26

goroutine 3 [sleep]:
time.Sleep
	../../../gcc_source/libgo/runtime/time.goc:26

Bam! It exploded right in front of our face. Let’s see if this might become friendlier if we alter it a little bit. To do this we removed the go from say to avoid running it as a goroutine, and we also removed time.Sleep (along with the time import), whose job is to pause a goroutine.

With those changes, the code is a hello world like for loop sample that prints:

root@debian:~/Software/Experiments/go# ./a.out
world
world
world
world
world
hello
hello
hello
hello
hello

Hmm. Let’s play with it some more. Changing our code a little bit to make say("world") run as a goroutine gives us the following code:

package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    go say("world")
    say("hello")
}

Which, when executed results in this:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

So we can see that the simplest go programs that run with goroutines do not run. Let’s still try some programs that invoke goroutines to see if our assumptions are correct. Below is the code of a very simple web server in go (found in the golang website).

package main

import (
    "fmt"
    "net/http"
)

type Hello struct{}

func (h Hello) ServeHTTP(
    w http.ResponseWriter,
    r *http.Request) {
    fmt.Fprint(w, "Hello!")
}

func main() {
    var h Hello
    http.ListenAndServe("localhost:4000", h)
}

The (unsurprising) result is the following:

a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted

goroutine 1 [syscall]:
no stack trace available

Hmm. This failure was last caused by time.Sleep. So let’s take a closer look into the code of the ListenAndServe function. The code for this function in the net/http package of the go library is this:

// ListenAndServe listens on the TCP network address srv.Addr and then
// calls Serve to handle requests on incoming connections.  If
// srv.Addr is blank, ":http" is used.
func (srv *Server) ListenAndServe() error {
	addr := srv.Addr
	if addr == "" {
		addr = ":http"
	}
	l, e := net.Listen("tcp", addr)
	if e != nil {
		return e
	}
	return srv.Serve(l)
}

This calls the function Serve. The interesting part in this one is line 1271:


 time.Sleep(tempDelay)

It calls time.Sleep on accept failure. time.Sleep is known to pause goroutines, which makes it the likely ultimate cause of the result we are seeing.

Final thoughts - Work for next week

So pretty much everything that has anything to do with a goroutine is failing. Richard Braun on #hurd suggested that since creation and destruction of threads is buggy in libpthread, maybe we should try a workaround until a proper fix is in place. Apart from that, my mentor Thomas Schwinge suggested making thread destruction in go’s runtime a no-op to see if that makes any difference. If it does, that should mean that there is nothing wrong in the go runtime itself, and that the offending code is in libpthread. This is also my very next course of action, which I shall report on very soon.

A clue!

So last week we were left with the compiler test logs and the build result logs that we had to go through to find out what the root cause was of all these failures in the gccgo test results, and more importantly in the libgo tests. So I went through the gccgo logs in search of a clue about why this may have happened. Here is the list of all the failures I compiled from the logs:


spawn [open ...]^M
doubleselect.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g

==========================================================

spawn [open ...]^M
nonblock.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g

==========================================================

Executing on host: /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../  -fno-diagnostics-show-caret -fdiagnostics-color=never  -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo  -fsplit-stack -c  -o split_stack376.o split_stack376.c    (timeout = 300)
spawn /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ -fno-diagnostics-show-caret -fdiagnostics-color=never -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -fsplit-stack -c -o split_stack376.o split_stack376.c^M
cc1: error: '-fsplit-stack' currently only supported on GNU/Linux^M
cc1: error: '-fsplit-stack' is not supported by this compiler configuration^M
compiler exited with status 1
output is:
 cc1: error: '-fsplit-stack' currently only supported on GNU/Linux^M
 cc1: error: '-fsplit-stack' is not supported by this compiler configuration^M 

UNTESTED: go.test/test/chan/select2.go

==========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
select3.x: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted
 
FAIL: go.test/test/chan/select3.go execution,  -O2 -g

==========================================================

Executing on host: /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ /root/gcc_new/gcc/gcc/testsuite/go.test/test/chan/select5.go  -fno-diagnostics-show-caret -fdiagnostics-color=never  -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo  -O  -w  -pedantic-errors  -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs  -lm   -o select5.exe    (timeout = 300)
spawn /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ /root/gcc_new/gcc/gcc/testsuite/go.test/test/chan/select5.go -fno-diagnostics-show-caret -fdiagnostics-color=never -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -O -w -pedantic-errors -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs -lm -o select5.exe^M
PASS: go.test/test/chan/select5.go -O (test for excess errors)
FAIL: go.test/test/chan/select5.go execution

==========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
bug147.x: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted
 
FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g

=========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
BUG: bug347: cannot find caller
Aborted
 
 
FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g

========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
BUG: bug348: cannot find caller
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x2 addr=0x0]
 
goroutine 1 [running]:
FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g

========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
mallocfin.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
FAIL: go.test/test/mallocfin.go execution,  -O2 -g

=======================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
Aborted
 
 
FAIL: go.test/test/nil.go execution,  -O2 -g

======================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
Aborted
 
 
FAIL: go.test/test/recover3.go execution,  -O2 -g

See a pattern there? Well certainly I do. On several occasions, the root cause of the failure is this:

Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.

Hmm… That’s interesting. Let us go through the libgo results too.


Test Run By root on Fri Jul 12 17:56:44 UTC 2013
Native configuration is i686-unknown-gnu0.3

		=== libgo tests ===

a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 10005 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: bufio
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10005) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 10637 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: bytes
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10637) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 10757 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: errors
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10757) - No such process
a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted


goroutine 1 [syscall]:
no stack trace available
FAIL: expvar
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10886) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11058 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: flag
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11058) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11475 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: fmt
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11475) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11584 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: html
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11584) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11747 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: image
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11747) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11999 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: io
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11999) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 12116 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: log
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (12116) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 13107 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: math
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (13107) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 13271 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: mime
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (13271) - No such process
a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted


goroutine 1 [chan receive]:
a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
panic during panic
testing.RunTestsFAIL: net
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (14234) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 14699 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: os
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (14699) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 14860 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: path
timed out in gotest

...


runtest completed at Fri Jul 12 18:09:07 UTC 2013

That’s certainly even more interesting. In case you haven’t noticed, it’s the same assertion that caused the failures in the gccgo test suite. Let us find the offending code, shall we?

/* Set the new thread's signal mask and set the pending signals to
     empty.  POSIX says: "The signal mask shall be inherited from the
     creating thread.  The set of signals pending for the new thread
     shall be empty."  If the currnet thread is not a pthread then we
     just inherit the process' sigmask.  */
  if (__pthread_num_threads == 1)
    err = sigprocmask (0, 0, &sigset);
  else
    err = __pthread_sigstate (_pthread_self (), 0, 0, &sigset, 0);
  assert_perror (err);

This seems to be the code that the logs point to, but there is no sign of the assertion. After discussing this issue with my peers in #hurd, I was told that the code I was looking for (the failing assertion) gets inlined via _pthread_self () and is actually located in libpthread/sysdeps/mach/hurd/pt-sysdep.h.

extern __thread struct __pthread *___pthread_self;
#define _pthread_self()                                            \
	({                                                         \
	  struct __pthread *thread;                                \
	                                                           \
	  assert (__pthread_threads);                              \
	  thread = ___pthread_self;                                \
	                                                           \
	  assert (thread);                                         \
	  assert (({ mach_port_t ktid = __mach_thread_self ();     \
                     int ok = thread->kernel_thread == ktid;       \
                     __mach_port_deallocate (__mach_task_self (), ktid);\
                     ok; }));                                      \
          thread;                                                  \
         })
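To spell out what the macro checks, here is a small model of it in Go. It is only a sketch: the pthread type keeps a single field, the thread-local descriptor and the current kernel thread id are passed in as plain arguments, and errors are returned where the C code would abort. All names are made up for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// pthread is a hypothetical stand-in for struct __pthread, keeping only
// the field that the failing assertion looks at.
type pthread struct {
	kernelThread uint32 // models thread->kernel_thread
}

// pthreadSelf models the three checks of the _pthread_self macro.
// tls models the thread-local ___pthread_self pointer, and ktid models
// the value returned by __mach_thread_self ().
func pthreadSelf(threadsInitialised bool, tls *pthread, ktid uint32) (*pthread, error) {
	if !threadsInitialised {
		return nil, errors.New("assert (__pthread_threads) failed")
	}
	if tls == nil {
		return nil, errors.New("assert (thread) failed")
	}
	if tls.kernelThread != ktid {
		// This is the check that shows up in the test logs.
		return nil, errors.New("assert (thread->kernel_thread == ktid) failed")
	}
	return tls, nil
}

func main() {
	self := &pthread{kernelThread: 42}

	_, err := pthreadSelf(true, self, 42)
	fmt.Println(err) // prints "<nil>"

	_, err = pthreadSelf(true, self, 43)
	fmt.Println(err) // descriptor and running kernel thread disagree
}
```

In other words, the assertion from the logs fires when the descriptor found through thread-local storage records a different kernel thread than the one that is actually executing.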

So this macro is what I was looking for. While discussing it further in the weekly IRC meeting, braunr provided me with some more clues:

08:38:15 braunr> nlightnfotis: did i answer that ?
08:38:24 nlightnfotis> braunr: which one?
08:38:30 nlightnfotis> hello btw :)
08:38:33 braunr> the problems you’re seeing are the pthread resources leaks i’ve been trying to fix lately
08:38:58 braunr> they’re not only leaks
08:39:08 braunr> creation and destruction are buggy
08:39:37 nlightnfotis> I have read so in http://www.gnu.org/software/hurd/libpthread.html. I believe it’s under Thread’s Death right?
08:40:15 braunr> nlightnfotis: yes but it’s buggy
08:40:22 braunr> and the description doesn’t describe the bugs
08:41:02 nlightnfotis> so we will either have to find a temporary workaround, or better yet work on a fix, right?
08:41:12 braunr> nlightnfotis: i also told you the work around
08:41:16 braunr> nlightnfotis: create a thread pool

Work for next week

This leaves us with next week’s work, which is to hack on libpthread’s code and try to create a thread pool, so that we sidestep the thread creation and destruction bugs present in the current Hurd libpthread implementation.
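To make the thread pool idea a bit more concrete, here is a minimal sketch in plain POSIX threads. It is not libpthread or libgo code, and every name in it (pool_t, pool_start, pool_submit, pool_stop, pool_demo_bump) is made up for illustration. The point is only that a fixed set of workers is created once and reused, so threads are not created and destroyed over and over.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical names throughout; a sketch, not the libpthread fix. */
#define POOL_QUEUE_CAP 64

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    task_t queue[POOL_QUEUE_CAP];   /* ring buffer of pending tasks */
    size_t head, count;
    int stopping;
    size_t nworkers;
    pthread_t *workers;
} pool_t;

/* Each worker loops forever, pulling tasks until the pool is stopped and drained. */
static void *pool_worker(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->stopping)
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        if (pool->count == 0) {     /* stopping, and nothing left to run */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t t = pool->queue[pool->head];
        pool->head = (pool->head + 1) % POOL_QUEUE_CAP;
        pool->count--;
        pthread_cond_signal(&pool->not_full);
        pthread_mutex_unlock(&pool->lock);
        t.fn(t.arg);                /* run the task outside the lock */
    }
}

/* Create the workers once. Returns 0 on success, -1 otherwise
   (the caller should still call pool_stop to clean up). */
int pool_start(pool_t *pool, size_t nworkers)
{
    assert(pool != NULL && nworkers > 0);
    pool->head = pool->count = 0;
    pool->stopping = 0;
    pool->nworkers = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->not_empty, NULL);
    pthread_cond_init(&pool->not_full, NULL);
    pool->workers = malloc(nworkers * sizeof *pool->workers);
    if (pool->workers == NULL)
        return -1;
    for (size_t i = 0; i < nworkers; i++) {
        if (pthread_create(&pool->workers[i], NULL, pool_worker, pool) != 0)
            break;
        pool->nworkers++;
    }
    return pool->nworkers == nworkers ? 0 : -1;
}

/* Queue a task; blocks while the queue is full. */
void pool_submit(pool_t *pool, task_fn fn, void *arg)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->count == POOL_QUEUE_CAP)
        pthread_cond_wait(&pool->not_full, &pool->lock);
    size_t tail = (pool->head + pool->count) % POOL_QUEUE_CAP;
    pool->queue[tail].fn = fn;
    pool->queue[tail].arg = arg;
    pool->count++;
    pthread_cond_signal(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
}

/* Let the workers drain the queue, then join them. */
void pool_stop(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->stopping = 1;
    pthread_cond_broadcast(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
    for (size_t i = 0; i < pool->nworkers; i++)
        pthread_join(pool->workers[i], NULL);
    free(pool->workers);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->not_empty);
    pthread_cond_destroy(&pool->not_full);
}

/* Example task: atomically bump the counter passed as arg. */
void pool_demo_bump(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

Usage would be along the lines of pool_start(&pool, 4), a series of pool_submit(&pool, pool_demo_bump, &counter) calls, and a final pool_stop(&pool). Where exactly such a pool would sit (inside libpthread, or in the Go runtime's thread handling) is still an open question for me.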

It was also suggested by Samuel Thibault (youpi) that I should run the libgo tests by hand and see if I get some more clues, like stack traces. It sounds like a good idea to me, so that’s something that I will look into too.

Yeah baby! It builds!

The highlight of this week’s progress was managing to successfully build gccgo under the Hurd. Not only did it compile successfully, it also ran its tests, with results matching the ones provided by my mentor Thomas Schwinge. This was a checkpoint in my Summer of Code project. Building the compiler successfully means that I am (happily) in a position to carry on with the next (and main) part of my project: making sure that the Go library (libgo) also passes all its tests and works without any major issues.

So where are we now?

gccgo

Compiling gccgo on the Hurd was big. But we also had to see how it compares to the successful build on Linux. The most effective way to compare the two builds is to check their test results.

Taking a look at the gccgo results on the Hurd, I was delighted to find that it passed most of its tests. There were a few that were failing, but for the most part, it did well. Below are the test results of gccgo on the Hurd:

     === go Summary ===

# of expected passes        5069
# of unexpected failures    11
# of expected failures      1
# of untested testcases     6
/root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo  version 4.9.0 20130606 (experimental) (GCC)

So it’s passing over 99% of its tests (5069 of the 5087 results listed). That’s cool. But it helps to take a look at the tests that are failing, to get an idea of what the failures are, how critical they are, etc.
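For the record, the percentage can be worked out from the four counters in the summary. The small sketch below does just that; the struct and function names are made up, and counting the untested cases as part of the total is my own assumption (leaving them out nudges the number slightly higher).

```c
#include <assert.h>

/* Counters as they appear in a DejaGnu summary block (hypothetical struct). */
typedef struct {
    unsigned passes;
    unsigned unexpected_failures;
    unsigned expected_failures;
    unsigned untested;
} dg_summary;

/* Percentage of expected passes over all reported results.
   Assumption: untested cases count towards the total. */
double pass_rate_percent(const dg_summary *s)
{
    unsigned total = s->passes + s->unexpected_failures
                   + s->expected_failures + s->untested;
    assert(total > 0);
    return 100.0 * s->passes / total;
}
```

With the Hurd numbers above (5069, 11, 1 and 6) this comes out at roughly 99.6%.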

nlightnfotis@earth:~/HurdVM/HurdFiles$ grep -v ^PASS: < go.sum
Test Run By root on Thu Jul 11 10:33:34 2013
Native configuration is i686-unknown-gnu0.3

        === go tests ===

        Schedule of variations:
            unix

            Running target unix
            Running /root/gcc_new/gcc/gcc/testsuite/go.dg/dg.exp ...
            Running /root/gcc_new/gcc/gcc/testsuite/go.go-torture/execute/execute.exp ...
            Running /root/gcc_new/gcc/gcc/testsuite/go.test/go-test.exp ...
            FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g 
            FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g 
            UNTESTED: go.test/test/chan/select2.go
            FAIL: go.test/test/chan/select3.go execution,  -O2 -g 
            FAIL: go.test/test/chan/select5.go execution
            UNTESTED: go.test/test/closure.go
            FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g 
            FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g 
            FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g 
            XFAIL: bug429.go  -O2 -g  execution test
            FAIL: go.test/test/goprint.go execution
            UNTESTED: go.test/test/goprint.go compare
            UNTESTED: go.test/test/init1.go
            FAIL: go.test/test/mallocfin.go execution,  -O2 -g 
            FAIL: go.test/test/nil.go execution,  -O2 -g 
            FAIL: go.test/test/recover3.go execution,  -O2 -g 
            UNTESTED: go.test/test/rotate.go
            UNTESTED: go.test/test/stack.go

                    === go Summary ===

# of expected passes        5069
# of unexpected failures    11
# of expected failures      1
# of untested testcases     6
/root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo  version 4.9.0 20130606 (experimental) (GCC) 

Hmm. So these are the failing tests. Before we go through them, it might be a good idea to check the status of the gccgo tests on the Linux build too. Let’s see.

nlightnfotis@earth:~$ grep -v ^PASS: < linux_go.sum 
Test Run By fotis on Mon Jul 15 10:28:38 2013
Native configuration is i686-pc-linux-gnu

        === go tests ===

        Schedule of variations:
            unix

            Running target unix
            Running /home/fotis/Software/gcc/gcc/testsuite/go.dg/dg.exp ...
            Running /home/fotis/Software/gcc/gcc/testsuite/go.go-torture/execute/execute.exp ...
            Running /home/fotis/Software/gcc/gcc/testsuite/go.test/go-test.exp ...
            UNTESTED: go.test/test/closure.go
            XFAIL: bug429.go  -O2 -g  execution test
            UNTESTED: go.test/test/init1.go
            UNTESTED: go.test/test/rotate.go

                    === go Summary ===

# of expected passes        5183
# of expected failures      1
# of untested testcases     3
/home/fotis/Software/gcc_build/gcc/testsuite/go/../../gccgo  version 4.9.0 20130702 (experimental) (GCC) 

So, there are far fewer entries here, and none of them is an actual failure. But wait a minute. The four entries that do show up (three UNTESTED and one XFAIL) are exactly the same as in the Hurd build. So I can assume that we are left with 4 fewer tests to check (Go on Linux works without any issues, so I guess it is safe to skip those tests for the moment). That leaves us with these tests to check:

FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g
FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g
UNTESTED: go.test/test/chan/select2.go
FAIL: go.test/test/chan/select3.go execution,  -O2 -g
FAIL: go.test/test/chan/select5.go execution
FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g
FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g
FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g
FAIL: go.test/test/goprint.go execution
UNTESTED: go.test/test/goprint.go compare
FAIL: go.test/test/mallocfin.go execution,  -O2 -g
FAIL: go.test/test/nil.go execution,  -O2 -g
FAIL: go.test/test/recover3.go execution,  -O2 -g
UNTESTED: go.test/test/stack.go

I discussed this with my mentor Thomas Schwinge on IRC (#hurd), and this was his advice:

For now, please ignore any failing tests that have »select« in their name -- that is, do file them, but do not spend a lot of time figuring out what might be wrong there. The Hurd's select implementation is a bit of a beast, and I don't want you -- at this time -- spend a lot of time on that. We already know there are some deficiencies, so we should postpone that to later.

So that leaves us with even fewer tests to check:

FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g
FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g
FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g
FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g
FAIL: go.test/test/goprint.go execution
UNTESTED: go.test/test/goprint.go compare
FAIL: go.test/test/mallocfin.go execution,  -O2 -g
FAIL: go.test/test/nil.go execution,  -O2 -g
FAIL: go.test/test/recover3.go execution,  -O2 -g
UNTESTED: go.test/test/stack.go

Nice. This narrowed down the list of errors that I have to go through to make sure that gccgo works as well on the Hurd as it does on Linux.
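The narrowing down above was done by hand, but it is mechanical enough to sketch in code: keep the FAIL and UNTESTED lines, and set aside anything with "select" in it. The function names below are hypothetical, and matching "select" anywhere in the line (rather than only in the file name) is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if a .sum line is a FAIL/UNTESTED result that should be
   looked at now, 0 if it is something else or a postponed select test. */
int needs_attention(const char *line)
{
    assert(line != NULL);
    while (*line == ' ' || *line == '\t')   /* .sum lines may be indented */
        line++;
    int is_problem = strncmp(line, "FAIL: ", 6) == 0
                  || strncmp(line, "UNTESTED: ", 10) == 0;
    if (!is_problem)
        return 0;
    return strstr(line, "select") == NULL;  /* postpone select-related tests */
}

/* Copies pointers to the lines that need attention into out (which must
   have room for n entries) and returns how many were kept. */
size_t filter_results(const char *const *lines, size_t n, const char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (needs_attention(lines[i]))
            out[kept++] = lines[i];
    return kept;
}
```

Fed the 14 lines from the longer list above, this keeps the same 10 entries as the shorter one.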

libgo

So, we talked about gccgo, but what about the runtime libraries (libgo)? They also get tested when we run make check-go, and seeing as they are a vital part of enabling programs written in Go to run on the Hurd, we ought to take a look. (This was also the original goal of my project proposal.)

So let us see what we have in libgo.sum:

Test Run By root on Fri Jul 12 17:56:44 UTC 2013
Native configuration is i686-unknown-gnu0.3

        === libgo tests ===

        Schedule of variations:
            unix

            Running target unix
            Running ../../../gcc/libgo/libgo.exp ...
            FAIL: bufio
            FAIL: bytes
            FAIL: errors
            FAIL: expvar
            FAIL: flag
            FAIL: fmt
            FAIL: html
            FAIL: image
            FAIL: io
            FAIL: log
            FAIL: math
            FAIL: mime
            FAIL: net
            FAIL: os
            FAIL: path
            FAIL: reflect
            FAIL: regexp
            FAIL: runtime
            FAIL: sort
            FAIL: strconv
            FAIL: strings
            FAIL: sync
            FAIL: syscall
            FAIL: time
            FAIL: unicode
            FAIL: archive/tar
            FAIL: archive/zip
            FAIL: compress/bzip2
            FAIL: compress/flate
            FAIL: compress/gzip
            FAIL: compress/lzw
            FAIL: compress/zlib
            FAIL: container/heap
            FAIL: container/list
            FAIL: container/ring
            FAIL: crypto/aes
            FAIL: crypto/cipher
            FAIL: crypto/des
            FAIL: crypto/dsa
            FAIL: crypto/ecdsa
            FAIL: crypto/elliptic
            FAIL: crypto/hmac
            FAIL: crypto/md5
            FAIL: crypto/rand
            FAIL: crypto/rc4
            FAIL: crypto/rsa
            FAIL: crypto/sha1
            FAIL: crypto/sha256
            FAIL: crypto/sha512
            FAIL: crypto/subtle
            FAIL: crypto/tls
            FAIL: crypto/x509
            FAIL: database/sql
            FAIL: database/sql/driver
            FAIL: debug/dwarf
            FAIL: debug/elf
            FAIL: debug/macho
            FAIL: debug/pe
            FAIL: encoding/ascii85
            FAIL: encoding/asn1
            FAIL: encoding/base32
            FAIL: encoding/base64
            FAIL: encoding/binary
            FAIL: encoding/csv
            FAIL: encoding/gob
            FAIL: encoding/hex
            FAIL: encoding/json
            FAIL: encoding/pem
            PASS: encoding/xml
            FAIL: exp/cookiejar
            FAIL: exp/ebnf
            FAIL: exp/html
            FAIL: exp/html/atom
            FAIL: exp/locale/collate
            FAIL: exp/locale/collate/build
            FAIL: exp/norm
            FAIL: exp/proxy
            FAIL: exp/terminal
            FAIL: exp/utf8string
            FAIL: html/template
            FAIL: go/ast
            FAIL: go/doc
            FAIL: go/format
            FAIL: go/parser
            FAIL: go/printer
            FAIL: go/scanner
            FAIL: go/token
            FAIL: go/types
            FAIL: hash/adler32
            FAIL: hash/crc32
            FAIL: hash/crc64
            FAIL: hash/fnv
            FAIL: image/color
            FAIL: image/draw
            FAIL: image/jpeg
            FAIL: image/png
            FAIL: index/suffixarray
            FAIL: io/ioutil
            FAIL: log/syslog
            FAIL: math/big
            FAIL: math/cmplx
            FAIL: math/rand
            FAIL: mime/multipart
            FAIL: net/http
            FAIL: net/http/cgi
            FAIL: net/http/fcgi
            FAIL: net/http/httptest
            FAIL: net/http/httputil
            FAIL: net/mail
            FAIL: net/rpc
            FAIL: net/smtp
            FAIL: net/textproto
            FAIL: net/url
            FAIL: net/rpc/jsonrpc
            FAIL: old/netchan
            FAIL: old/regexp
            FAIL: old/template
            FAIL: os/exec
            FAIL: os/signal
            FAIL: os/user
            FAIL: path/filepath
            FAIL: regexp/syntax
            FAIL: runtime/pprof
            FAIL: sync/atomic
            FAIL: text/scanner
            FAIL: text/tabwriter
            FAIL: text/template
            FAIL: text/template/parse
            FAIL: testing/quick
            FAIL: unicode/utf16
            FAIL: unicode/utf8

                    === libgo Summary ===

# of expected passes        1
# of unexpected failures    130
/root/gcc_new/gccbuild/./gcc/gccgo version 4.9.0 20130606 (experimental) (GCC)

Oh boy! Oh boy! Well, on second thought, this was not unexpected. This is the core of my GSoC work. This is how it starts :)

Before this goes any further, maybe we should visit the Linux test results too.


Test Run By fotis on Tue 02 Jul 2013 09:20:20 PM EEST
Native configuration is i686-pc-linux-gnu

        === libgo tests ===

        Schedule of variations:
            unix

            Running target unix
            Running ../../../gcc/libgo/libgo.exp ...
            PASS: bufio
            PASS: bytes
            ...

                    === libgo Summary ===

# of expected passes        131
/home/fotis/Software/gcc_build/./gcc/gccgo version 4.9.0 20130702 (experimental) (GCC)

Wow. Next to the Linux numbers the Hurd results look dramatic, but they really are not unexpected. Remember that getcontext, makecontext, setcontext and swapcontext are not working as expected on the Hurd.

And recall this from an email by Ian Lance Taylor (the gccgo maintainer, and a member of the Go team) early in the summer:

Go does require switching stacks. A port of Go that doesn’t support goroutines would be useless–nothing in the standard library would work
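To show what "switching stacks" boils down to, here is a minimal coroutine sketch built on getcontext, makecontext and swapcontext. It assumes a platform where these functions work (GNU/Linux with glibc, for instance), all the co_* names are made up, and it is in no way how libgo schedules goroutines. It only demonstrates the primitive that libgo depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024)

/* Hypothetical coroutine type: one saved context per side plus a private stack. */
typedef struct {
    ucontext_t caller;            /* where to go back to on yield */
    ucontext_t self;              /* the coroutine's own saved state */
    char stack[CO_STACK_SIZE];    /* the separate stack it runs on */
    int yielded;                  /* last value handed to the caller */
    int done;                     /* set once the body has finished */
} coroutine;

/* The running coroutine; kept global because makecontext only passes
   int arguments portably. */
static coroutine *co_current;

/* Hand a value to the caller and switch back to the caller's stack. */
static void co_yield(int value)
{
    coroutine *co = co_current;
    co->yielded = value;
    swapcontext(&co->self, &co->caller);
}

/* Body that runs on the private stack: yields 1, 2, 3 and then finishes. */
static void co_body(void)
{
    for (int i = 1; i <= 3; i++)
        co_yield(i);
    co_current->done = 1;
    /* Returning from here resumes the context stored in uc_link. */
}

/* Prepare the coroutine. Returns 0 on success, -1 if getcontext fails. */
int co_init(coroutine *co)
{
    assert(co != NULL);
    if (getcontext(&co->self) != 0)
        return -1;
    co->self.uc_stack.ss_sp = co->stack;
    co->self.uc_stack.ss_size = sizeof co->stack;
    co->self.uc_link = &co->caller;
    co->yielded = 0;
    co->done = 0;
    makecontext(&co->self, co_body, 0);
    return 0;
}

/* Switch to the coroutine until it yields or finishes. Returns the last
   yielded value, or -1 on error; check co->done to tell whether the body
   has finished. */
int co_resume(coroutine *co)
{
    assert(co != NULL && !co->done);
    co_current = co;
    if (swapcontext(&co->caller, &co->self) != 0)
        return -1;
    return co->yielded;
}
```

After co_init, three calls to co_resume hand back 1, 2 and 3, each time hopping onto the private stack and back. A fourth call lets the body run to completion and sets done. If any of the four context functions misbehaves, even a toy like this falls over, which goes a long way towards explaining the libgo results above.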

Conclusion / Work for next week.

So now it comes down to implementing the context switching functions correctly. Apart from that, going through the failing gccgo tests is also on the to-do list, though I am not sure it should be the first priority. I also have to go through go.log to see if there are any clues as to why the gccgo tests fail.

Having finally built gccgo on the Hurd and, more importantly, still being on schedule (the original one, from my proposal), I can now concentrate on the core part of my project (and the most exciting one too): fixing whatever is blocking effective context switching. That in turn is what blocks goroutines, and without goroutines the Go library will not work properly.

A new beginning

Oh boy! A new start. Isn’t that exciting? You bet it is. It’s not, however, my first introduction to blogging. I used to have a blog on Wordpress.com, but after a while I was turned off by how limited it felt. So I decided I wanted a new place to host my online presence, one that wasn’t as limited as Wordpress was. Initially I was thinking about renting a VPS and self-hosting Wordpress there. But after doing a little research, I came across Github Pages. I looked into Github Pages some more, and found Jekyll to be very interesting. After a while, I also came across Octopress. That was it. I was sold :)

Free hosting of a blog, no need to maintain a server, and a platform written in ruby?

Without hesitation, I immediately started working on it. I went through the Octopress documentation (which, needless to say, was fantastic), found a wonderful theme online at opthemes (kudos to Alex Garibay for that), and got started.

And here we are. On a platform that you can hack and customize to your liking - at least more so than the locked-down version of Wordpress. On a platform that is written in a language that I don’t hate with a passion (like, cough, PHP, cough), and may actually learn in the future, just for the sake of being able to customize every bit of it (gotta love the hacker’s way) - even though I am not really interested in web development per se.

Hope it starts out nice. I guess that remains to be seen.