Before we get any further, it might be good if we provided some context.

Hello world. Again.

#include <stdio.h>

int
main (int argc, char **argv)
{
    printf ("Hello world!\n");

    return 0;
}

Every user space (read: application) programmer has written a hello world program. Only god knows how many times this program has been written. Yet most programmers’ knowledge of the program is limited to something along the lines of:

  • It sends the string passed as a parameter to the system to print.
  • It takes the printf function from stdio.h and prints the string.

and various other things, which are anywhere between plain wrong and partially correct.

** So why not demystify the process? **

Enter the C preprocessor.

You may have heard of the C preprocessor. It’s the first stage in compiling a C or C++ file, and it’s responsible for things such as:

  • inclusion of header files (it does so by replacing #include <header.h> with the contents of that file, and of the files it includes, recursively),
  • macro expansion, such as the famous comparison of two numbers (a greater than b). In essence, if you define the macro #define gt(a, b) ((a > b) ? 1 : 0), then in a statement such as this:

 if (gt (5, 3)) printf ("The first parameter is greater than the second.\n");

gt (5, 3) gets expanded to the macro definition, so after the preprocessor has run you end up with something like this:

 if (((5 > 3) ? 1 : 0)) printf ("The first parameter is greater than the second.\n");

  • conditional compilation, with things such as:

#ifdef WIN32
    printf ("We are on windows\n");
#endif

amongst others. You can see it for yourself: write the hello world program and pass it to cpp, like so: cpp hello_world.c

So now that we know what it does, it’s time to dispel a common myth about it: some people believe that header files contain the functions to be called. That’s wrong. What they typically contain is function prototypes (and some type definitions, macros, etc.) only. They don’t contain the bodies of the functions to be called.

Some people find that fact quite surprising, but it isn’t once you understand what the compiler does with it.

Say hello to the compiler.

Here we are gonna unmask another pile of misconceptions. First of all, some people think that when they call gcc on the command line they are actually calling the compiler. They are not. In fact they are calling what is commonly called the compilation driver, whose job is to run all the software needed to fully turn source into a binary: the preprocessor, the actual compiler, an assembler and finally the linker.

Having said that, the actual compiler that gets run when you call gcc is called cc1. You may have seen its name sometimes, when the driver reports errors. Wanna take a look at it, to make sure I’m not lying to you? (Hint: I’m not!) Fair enough. Why don’t you type this in the command line: gcc -print-prog-name=cc1. It should tell you where the actual compiler is located on your system.

So now that we have this (misconception) out of our minds, we can continue with our analysis. Last time we talked about it, we said that the header files include prototypes and not the whole function.

You may know that in C, you usually declare a function, before you use it. The primary reason for doing this is to provide the compiler with the ability to perform type checking, that is to check that the arguments passed are correct, both in number, and in type, and to verify that the returned value (assuming there is one) is being used correctly. Below is a program that demonstrates the function prototype:

#include <stdio.h>

int add_nums (int first, int second);

int
main (void)
{
    printf ("5 + 5 results in %d\n", add_nums (5, 5));

    return 0;
}

int
add_nums (int first, int second)
{
    return first + second;
}

In this particular example, the prototype gives the compiler a wide variety of information. It tells it that function add_nums takes two int arguments and returns an integer to the calling function. Now the compiler can verify that I am passing correct arguments to it when I call it inside printf. If I don’t include the function prototype, and do something slightly evil such as calling add_nums with float arguments then this might happen:

5 + 4 results in 2054324224

Now that you know that the compiler (the real one) only needs the prototype and not the actual function code, you may be wondering how the compiler actually compiles the call if it doesn’t know the function’s code.

Now is the time to bring down another misconception. The word compiler is just a fancy name for software otherwise known as a translator. A translator’s job is to take input in one language (the source language) and turn it into a second language (the target language), whatever that may be. Most of the time, when you compile software, you compile it to run on your own computer, which runs on a processor from the x86 architecture family. A processor is typically associated with an assembly language for its architecture (which is just human-friendly mnemonics for common processor tasks), so your x86 computer runs x86 assembly (OK, that’s not 100% true, but for simplicity’s sake it should serve for the moment. We will see why it’s not true later). So the compiler (in a typical translation) translates (compiles) your C source code to x86 assembly. You can see this by compiling your hello world example and passing the compiler the -S parameter (which asks it to stop after the x86 assembly is produced), like so: gcc -S hello.c.

Conclusion

In this part, we saw how the compiler and the preprocessor work with our code, in an attempt to demystify so-called library calls. In the next part, we are going to study the assembler and the linker, and in the final part the loader and the kernel.

Introduction

This week was spent investigating the runtime and debugging executables with gdb, which turned up some interesting pieces of information. Without any further ado, let’s present our findings:

My findings

Before starting to play with libpthread and glibc, I wanted to make sure that the Go runtime behaved the way I believed it did, and to verify some further assumptions about it. These had to do with the total number of goroutines and the total number of machine threads at various checkpoints in the language runtime.

  • The first thread in the program is initialised during runtime_schedinit.
  • The number of m’s (kernel threads) depends on the number of goroutines: the runtime basically attempts to create as many m’s as there are goroutines to run. We can observe that every time a new goroutine is created, there are a number of calls to start a new kernel thread.
  • There are at least two kernel threads. One that supports the runtime (mainly the garbage collector) and one that executes the code of the go program.

There is only one small piece of code in the Go runtime that confuses me, and that is the code for initialising a new m. Let me first present it:

M*
runtime_newm(void)
{

    ...
	mp = runtime_mal(sizeof *mp);

    ...
	mcommoninit(mp);
	mp->g0 = runtime_malg(-1, nil, nil);

    ...
	if(pthread_attr_init(&attr) != 0)
		runtime_throw("pthread_attr_init");
	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
		runtime_throw("pthread_attr_setdetachstate");

    ...
}

I purposely compacted the function for brevity, as it only serves to demonstrate a point. My confusion lies in the line mp->g0 = runtime_malg(-1, nil, nil), which allocates memory for a new goroutine. I am OK with that, but what I do not understand is this: new kernel threads (m’s) are supposed to pick and run a goroutine from the global goroutine pool, that is, run an existing one rather than create a new one. The parameters given to runtime_malg don’t initialise a new goroutine properly, but memory for a new goroutine is still allocated and returned from runtime_malg to mp->g0.

Assuming I have not misunderstood something (and it is quite likely that I have), this behavior raises a number of questions and possibly problems. For instance, what happens to the goroutine created by runtime_malg? Is it killed after the m is assigned a new goroutine to execute? Is it parked on the global goroutine list? Is it just ignored? Does it affect the runtime scheduler’s goroutine count? This is the last thing I want to clear up regarding gccgo’s runtime.

gdb

This week, I also ran the executables created by gccgo through gdb. It was a fruitful attempt that, most of the time, confirmed my findings in the Go runtime. It also provided us with some other nice pieces of information regarding the crashing of goroutines, but it left me with a question too.

The code in question that I ran through gdb is this:

package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    fmt.Println("[!!] right before a go statement")
    go say("world")
    say("hello")
}

Your very typical hello-world-like goroutine program. Note that the main where I set the breakpoint is not the program’s main (that’s main.main); as far as the runtime is concerned, main is the runtime entry point, in go-main.c. Setting a breakpoint there and running the program through gdb yields the following results:

Breakpoint 1, main () at ../../../gcc_source/libgo/runtime/go-main.c:52
52 runtime_check ();
2:  __pthread_total = 1
1: runtime_sched.mcount = 0
(gdb) next
53 runtime_args (argc, (byte **) argv);
2: __pthread_total = 1
1: runtime_sched.mcount = 0
54 runtime_osinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 0
63: runtime_schedinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 1

Up until now, nothing unexpected. The kernel thread is registered with the runtime scheduler during its initialisation in runtime_schedinit, and that’s why runtime_sched.mcount is reported to be zero several times before runtime_schedinit has run.

68 __go_go (mainstart, NULL);
2: __pthread_total = 1
1: runtime_sched.mcount = 1
(gdb) display runtime_sched.gcount
3: runtime_sched.gcount = 0

That too is ok, because a new goroutine is registered with the scheduler during the call to __go_go. Now I am gonna fast forward a bit, to a more interesting point.

...
[DEBUG] (in runtime_gogo) new goroutine's status is 2
[DEBUG] (in runtime_gogo) number of goroutines now is 2
[New Thread 629.30]

Program received SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) info threads
 Id   Target  Id       Frame
 6    Thread  629.30   0x08048eb7 in main.main () at goroutine.go:12
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

This is getting weird. I mean, libpthread is reporting that 2 threads are active, but gdb reports that 3 are active. Anyway, let's continue:

[DEBUG] (in runtime_stoptheworld) stopped the garbage collector
[DEBUG] (in runtime_starttheworld) starting the garbage collector
[DEBUG] (in runtime_starttheworld) number of m's now is: 2
[DEBUG] (in runtime_starttheworld) [note] there is already one gc thread
[!!] right before a go statement

Program received signal SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) continue
... (output omitted by me for brevity)

[DEBUG] (in runtime_newm) Right before the call to pthread_create.
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[New Thread 629.31]

Program received signal SIGABRT, Aborted.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3

Oh my goodness. At first glance, this seems to be a very serious inconsistency between libpthread and the Go runtime. At this point, the Go scheduler reports 3 threads (3 registered threads, which means that the flow of execution has passed mcommoninit, the kernel thread initialisation function that also registers the kernel thread with the runtime scheduler), whereas libpthread reports 2 threads.

But WAIT! Where are you going? Things are about to get even more interesting!

(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

GDB reports 4 threads. Yes, 4 threads, ladies and gentlemen. Now take a close look. 3 threads are in the same frame, with the one with id 4 being the one currently executing. And there is also a pattern: 0x01da48ec is the value of the eip register for all 3 of them.

That’s one thing that is for certain. Now I already have an idea. Why not change the current thread to the one with id 7? I’m sold on the idea, let’s do this:

(gdb) thread 7
[Switching to thread 7 (Thread 629.31)]
#0  0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
(gdb) continue
Continuing.

Program received signal SIGABRT, Aborted.
[Switching to Thread 629.28]
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3
(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01dc08b0 in ?? () from /lib/i386-gnu/libc.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

Damn. But I am curious. What’s the next instruction to be executed?

(gdb) x/i $eip
=> 0x1da48ec: ret

And what is the next instruction to be executed for the thread with id 7?

(gdb) x/i $eip
=> 0x1dc08b0: call *%edx

Conclusion

Apparently, there is still much debugging left to work out what is really happening. But we have got some leads in the right direction, which will hopefully lead us to finding out where the problem lies and correcting it.

Most importantly, my immediate plan, before I start playing around with libpthread, is to attempt the same debugging run on the same code under Linux (x86). Seeing as Go runs cleanly on Linux, this would provide some clues as to what the expected results should be and where the execution diverges substantially, a clue that might be vital to finding the problem.

Introduction

This week was spent attempting to debug the gccgo runtime via print statements. I gained many things from this endeavour, the most significant of which is a great deal of information regarding the bootstrapping of a Go process. Let’s proceed to this week’s findings, shall we?

Findings

The process bootstrapping sequence

The code that begins a new go-process is conveniently located in a file called go-main.c, the most significant part of which is the following:

int
main (int argc, char **argv)
{
  runtime_check ();
  runtime_args (argc, (byte **) argv);
  runtime_osinit ();
  runtime_schedinit ();
  __go_go (mainstart, NULL);
  runtime_mstart (runtime_m ());
  abort ();
}

static void
mainstart (void *arg __attribute__ ((unused)))
{
  runtime_main ();
}

The process is as follows:

  • First, runtime_check runs and registers os_Args and syscall_Envs as runtime_roots with the garbage collector. I am still investigating what exactly this function does, but it seems to be some early initialisation of the garbage collector.
  • Secondly, runtime_args is run. Its job is to call a specific argument handler for the arguments passed to main.
  • Thirdly, runtime_osinit is run, whose job is to call the low-level _CPU_COUNT function to get the number of CPUs (from a specific data structure that represents a set of CPUs).
  • After that, runtime_schedinit is run. Its job is to create the very first goroutine (g) and system thread (m); it then parses the command line arguments and the environment variables, sets the maximum number of CPUs to be used (via GOMAXPROCS), runs the first goroutine, and does some last pieces of the scheduler’s initialisation.
  • Following runtime_schedinit, __go_go is run, a function whose purpose is to create a new goroutine, tell it to execute the function passed as the first parameter, and then queue the goroutine in the global ready-to-run goroutine pool.
  • Last but not least, runtime_mstart runs, which seems to start the execution of the kernel thread created during runtime_schedinit.

The very last piece of code that is run (and probably the most important) is runtime_main. Remember that it is passed (via mainstart) to the goroutine created during the __go_go call. Its job is to mark the goroutine that called it as the main OS thread, to initialise the scheduler, and to create a goroutine whose job is to release unused memory (from the heap) back to the OS. It then starts executing the process’s user-defined instructions (the code the programmer wrote) via a call to a macro that directs it to __go_init_main in the assembly generated by the compiler.

runtime_main is also the function that terminates the execution of a Go process, with a call to runtime_exit, which seems to be a macro for the exit function.

Other findings

During our debugging sessions we found out that the total count of kernel threads running in a simple program is at least two. The first one is the bootstrap M (the one initialised during the program’s initialisation, inside runtime_schedinit), and there is at least one more, created to be used by the garbage collector (I am still investigating the validity of this last claim).

A simple Go program, such as one doing arithmetic or printing a hello-world-like message, evidently has no issue running. The issues arise when we use a go statement. With all our debugging messages activated, this is how a simple Go program flows:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] (in main) before runtime_mcheck is run
[DEBUG] (in main) before runtime_args is run
[DEBUG] (in main) before runtime_osinit is run
[DEBUG] (in main) before runtime_schedinit is run
[DEBUG] (in main) before runtime_mstart is run
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (in mainstart) right before the call to runtime_main
[DEBUG] (in runtime_main) Beginning of runtime_main
[DEBUG] (start of runtime_newm) Total number of m's is 1
[DEBUG] (in runtime_newm) Preparing to create a new thread
[DEBUG] (in runtime_newm) Right before the call to pthread_create
[DEBUG] (in runtime_newm) pthread_create returned 0
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (end of runtime_newm) Total number of m's is 2
Hello, fotis
[DEBUG] (in runtime_main) Right before runtime_exit

And this is how a goroutine powered program fails:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] (in main) before runtime_mcheck is run
[DEBUG] (in main) before runtime_args is run
[DEBUG] (in main) before runtime_osinit is run
[DEBUG] (in main) before runtime_schedinit is run
[DEBUG] (in main) before runtime_mstart is run
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (in mainstart) right before the call to runtime_main
[DEBUG] (in runtime_main) Beginning of runtime_main
[DEBUG] (start of runtime_newm) Total number of m's is 1
[DEBUG] (in runtime_newm) Preparing to create a new thread
[DEBUG] (in runtime_newm) Right before the call to pthread_create
[DEBUG] (in runtime_newm) pthread_create returned 0
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (end of runtime_newm) Total number of m's is 2
[DEBUG] (start of runtime_new) Total number of m's is 2
[DEBUG] (in runtime_newm) Preparing to create a new thread.
[DEBUG] (in runtime_newm) Right before the call to pthread_create
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

Work for the next week

I will of course continue print debugging until I know the exact flow of execution in the Go runtime. Right now I have very good knowledge of the flow, but there are some things that I need to sort out. For instance, it is not exactly clear to me why we call certain functions, or what they are supposed to be doing at certain points. After I sort this out, I also plan to start debugging libpthread, to see what its status is during a hello-world-like program and during a goroutine-powered program, and to see if we find something interesting there (like how many threads libpthread reports against how many the Go runtime reports).

This week revolved around print debugging in the gccgo runtime, in search of clues regarding the creation of new threads under the Go runtime, so as to see whether there is something wrong with the runtime itself or with the way the runtime interacts with libpthread.

(partial presentation of) findings

While print debugging the gccgo runtime, I haven’t noticed anything abnormal or unusual so far. For example, the code that triggers the assertion failure seems to work at least once, since pthread_create() returns 0 at least once.

This is expected behavior, since we already have stated that there is at least one M (kernel thread) created at the initialisation of the program’s runtime.

If, however, we use a go statement in our program to make use of a goroutine, the runtime still fails with the usual assertion failure, and the output of the program is this:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] pthread_create returned 0
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

The above output can give us some pieces of information:

  • pthread_create() is called at least once.
  • It executes successfully and without errors; the libpthread code suggests that 0 is returned upon successful creation of a thread.
  • However, the assertion is still triggered, and we know it is triggered during thread creation.

The second bullet point is also supported by the fact that even if you execute something as simple as hello world in Go, a new M is created, so you get something along the lines of this as output:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] pthread_create returned 0
Hello World!
root@debian:~/Software/Experiments/go#

There is, however, something that the above output doesn’t tell us, but that would be useful to know: how many times did we try to create a new thread? So we modify our gcc’s source code to see how many times the runtime attempts to create a new kernel thread (M). This is what we get out of it:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] Preparing to create a new thread.
[DEBUG] pthread_create returned 0
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[DEBUG] Preparing to create a new thread.
aborted.

The code at this point in the runtime is this:

// Create a new m.  It will start off with a call to runtime_mstart.
M*
runtime_newm(void)
{
	M *mp;
	pthread_attr_t attr;
	pthread_t tid;
	size_t stacksize;
	sigset_t clear;
	sigset_t old;
	int ret;

#if 0
	static const Type *mtype;  // The Go type M
	if(mtype == nil) {
		Eface e;
		runtime_gc_m_ptr(&e);
		mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
	}
#endif

	// XXX: Added by fotis for print debugging.
	printf("[DEBUG] Preparing to create a new thread.\n");

	mp = runtime_mal(sizeof *mp);
	mcommoninit(mp);
	mp->g0 = runtime_malg(-1, nil, nil);

	if(pthread_attr_init(&attr) != 0)
		runtime_throw("pthread_attr_init");
	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
		runtime_throw("pthread_attr_setdetachstate");

	// <http://www.gnu.org/software/hurd/open_issues/libpthread_set_stack_size.html>
#ifdef __GNU__
	stacksize = StackMin;
#else
	stacksize = PTHREAD_STACK_MIN;

	// With glibc before version 2.16 the static TLS size is taken
	// out of the stack size, and we get an error or a crash if
	// there is not enough stack space left.  Add it back in if we
	// can, in case the program uses a lot of TLS space.  FIXME:
	// This can be disabled in glibc 2.16 and later, if the bug is
	// indeed fixed then.
	stacksize += tlssize;
#endif

	if(pthread_attr_setstacksize(&attr, stacksize) != 0)
		runtime_throw("pthread_attr_setstacksize");

	// Block signals during pthread_create so that the new thread
	// starts with signals disabled.  It will enable them in minit.
	sigfillset(&clear);

#ifdef SIGTRAP
	// Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
	sigdelset(&clear, SIGTRAP);
#endif

	sigemptyset(&old);
	sigprocmask(SIG_BLOCK, &clear, &old);
	ret = pthread_create(&tid, &attr, runtime_mstart, mp);

	/* XXX: added for debug printing */
	printf("[DEBUG] pthread_create() returned %d\n", ret);

	sigprocmask(SIG_SETMASK, &old, nil);

	if (ret != 0)
		runtime_throw("pthread_create");

	return mp;
}

We can deduce two things about our situation right now:

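  • There is at least one thread successfully created, and there is an attempt to create another one.
  • The second time, the failure occurs before pthread_create returns: the assertion that fires is in __pthread_create_internal, and we never see a second "pthread_create returned" message.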

Continuation of work.

I have been following this path for the last week. I presented some of my findings, and hope to soon be able to write an exhaustive report on what exactly causes the bug.

This week was spent studying the go language’s runtime and studying the behaviour of various go programs when executed under the Hurd. I learnt a variety of new things, and got some new clues about the problem.

The new libgo clues

I already know that M’s are the “real” kernel schedulable threads and G’s are the ones managed by the Go runtime (goroutines). The last time I went through the Go runtime’s code I noticed that neither of them gets created, so there must be an issue with thread creation. But since at least one of each is created during the program’s initialization, how come most programs are able to run, and issues only present themselves when we manually attempt to run a goroutine?

I will admit that the situation looks strange, so I decided to look into it more. Before we go any further, I have to show the error I get when I run goroutine-powered programs under the Hurd.

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

__pthread_create_internal is a libpthread function that gets called when a new POSIX thread is instantiated. So we know that when we start a goroutine, apart from the goroutine itself, there is at least one kernel thread created. Otherwise, if a new goroutine was created but not a new kernel thread (M), why wasn’t it matched with an existing kernel thread (remember, there is at least one)?

That made me look into the go runtime some more. I found a lot of things, that I can not enumerate here, but amongst the most interesting ones, was the following piece of code:

// Create a new m.  It will start off with a call to runtime_mstart.
M*
runtime_newm(void)
{
	M *mp;
	pthread_attr_t attr;
	pthread_t tid;
	size_t stacksize;
	sigset_t clear;
	sigset_t old;
	int ret;

#if 0
	static const Type *mtype;  // The Go type M
	if(mtype == nil) {
		Eface e;
		runtime_gc_m_ptr(&e);
		mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
	}
#endif

	mp = runtime_mal(sizeof *mp);
	mcommoninit(mp);
	mp->g0 = runtime_malg(-1, nil, nil);

	if(pthread_attr_init(&attr) != 0)
		runtime_throw("pthread_attr_init");
	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
		runtime_throw("pthread_attr_setdetachstate");

	stacksize = PTHREAD_STACK_MIN;

	// With glibc before version 2.16 the static TLS size is taken
	// out of the stack size, and we get an error or a crash if
	// there is not enough stack space left.  Add it back in if we
	// can, in case the program uses a lot of TLS space.  FIXME:
	// This can be disabled in glibc 2.16 and later, if the bug is
	// indeed fixed then.
	stacksize += tlssize;

	if(pthread_attr_setstacksize(&attr, stacksize) != 0)
		runtime_throw("pthread_attr_setstacksize");

	// Block signals during pthread_create so that the new thread
	// starts with signals disabled.  It will enable them in minit.
	sigfillset(&clear);

#ifdef SIGTRAP
	// Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
	sigdelset(&clear, SIGTRAP);
#endif

	sigemptyset(&old);
	sigprocmask(SIG_BLOCK, &clear, &old);
	ret = pthread_create(&tid, &attr, runtime_mstart, mp);
	sigprocmask(SIG_SETMASK, &old, nil);

	if (ret != 0)
		runtime_throw("pthread_create");

	return mp;
}

This is the code that creates a new kernel thread. Notice the line ret = pthread_create(&tid, &attr, runtime_mstart, mp);. It obviously creates a new kernel thread, which explains why we get this specific error. What is not explained is why, given that at least one kernel thread is created at program startup, this error is only triggered when we manually create a goroutine.

Go programs under the Hurd

Apart from studying Go’s runtime source code, I also ran some experiments under the Hurd. I got some very weird results that I am still investigating, but I would like to share them nonetheless. Consider the following piece of code:

package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    say("world")
    say("hello")
}

A very basic example that can be used to demonstrate goroutines. Now, if we change one of the say calls inside main to a goroutine, this happens:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

BUT if we change BOTH of these functions to goroutines (go say("world"), go say("hello")), this happens:

root@debian:~/Software/Experiments/go# ./a.out
root@debian:~/Software/Experiments/go#

Wait a minute. It can’t be! Did it execute correctly? Where is the output?

root@debian:~/Software/Experiments/go# echo $?
0
root@debian:~/Software/Experiments/go#

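It reports that it has executed correctly, but there is no output. One likely explanation is that a Go program exits as soon as main.main returns, without waiting for other goroutines: with both calls turned into go statements, main returns immediately, before either goroutine has had a chance to print.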

What I am doing next

I will continue reading through the Go runtime for some clues. On the more active side, I am writing a custom test case for goroutine testing under the Hurd, while also doing some analysis on the programs that run there (currently studying the assembly generated for these programs) to see how they differ and why we get this particular behavior.