Hi. I’m Fotis Koutoulakis.

I spend my time pretending I know about computers…

My Linux from Scratch Experience

For the past two to three days, I have been busy creating my very own Linux distribution using the well-known Linux From Scratch. This post is an account of my experience with the process: what I liked, what I learned, what surprised me, and more.

Linux from Scratch: An introduction

If you are here, then you most likely already know what Linux From Scratch is, but for the sake of completeness (or in case you don’t know what it is, but are keen on learning) I will provide an introduction here.

Linux From Scratch (from now on, lfs) is a book providing a series of steps that guide you through the creation of a fully functional GNU/Linux distribution. Although the original book creates a “barebones” distribution, with only fundamental tools in it, the resulting system provides a fine environment for further experimentation or customization.

Apart from the basic book, the lfs project also has 3-4 other books to read if you want to extend the basic system (such as blfs, Beyond Linux From Scratch), automate the process, create a distribution that is more secure, or cross-compile an lfs system for different machines.

My experience with building LFS

A small introduction about my background

I have been a (full-time) user of UNIX-like systems for about 2.5 years now. During that time I have gone from being what you would call a Linux newbie, not knowing how to use a system without a GUI installed (have I mentioned that Ubuntu was my favourite distribution?), to being an arguably experienced UNIX programmer, trying to learn more about the various aspects of UNIX systems and delving deeper into them every day (while also feeling pain when using anything other than a UNIX-like system).

During that time, I learned about the Unix way of working with the system, using the shell and the system’s toolchain to write software and otherwise manipulate the system. I ditched my old knowledge of IDEs and GUIs, and set out to master the command line and the associated tools. (Anecdote: I remember, when I first came to Unix from Windows, searching the net for a C/C++ IDE to do development.) I read about how people worked another way in Unix land, using an editor and the shell, and I decided to force myself to learn to work that way. I still remember trying out vim and gcc, and ending up liking this way better: it seemed a more natural way to interact with the software development process than using an IDE and pressing the equivalent of a “play” button, so that magic ensues for the next few seconds until I have a result.

Time has passed since then, and after hours and hours of reading and working with the system, I have learned quite a lot about it. My Google Summer of Code experience in 2013 expanded my system knowledge even further (that’s what you get when you have to work with the system kernel, the C library and a compiler).

But in all that time of using Unix-like systems, I never had the chance to create one myself. And although my background allowed me to know quite a few things about the inner workings of such a system, I had never actually seen all these software components combine in front of my very eyes to create that beauty we know as a GNU/Linux distribution. That left a bad taste, because I knew what was happening, but I wanted to see it happen right in front of me.

Knowing about the existence of lfs, and not actually going through it, made matters worse, as I knew that I could “patch” that knowledge gap of mine, but I never really tried to. I felt that I was missing out on a lot, and that lfs would be instrumental to my understanding of a Linux system. Having attempted it some years ago and gotten stuck at the very beginning had also created an innate fear in me that it was something above my own powers.

Until two days ago, when I said to myself: “You know what? I have seen and done a lot of things in a UNIX system. I am now much more experienced than I was when I last tried it. And I know I want to at least try, even if it gives me nothing but infinite confusion. Because if I do manage to get it, I will learn so many more things, or at least be assured that my preexisting knowledge was correct.” And that thought was the greatest motivation I have had to do this in a fairly long time.

So, I sat at my desk, grabbed a cup of coffee and off I went!

The process

Preparation and the temporary toolchain

The book itself is several chapters long, each of which performs another “big step” in the creation of the distribution.

The first few chapters are preparatory: you ensure the integrity of the building environment, download any build dependencies you may be lacking, create a new partition that will host the lfs system, and create the user account that will do the building of the temporary toolchain.

The temporary toolchain building is a more exciting process. In essence you compile and collect several pieces of software that will later be used to compile the distribution’s toolchain and other software.

You start off by building binutils, to get a working assembler and linker. After having a working assembler and linker, you proceed with compiling gcc. Next up is unpacking the Linux kernel headers, so that glibc can be compiled against them.

Having the basic parts of the toolchain compiled, you then proceed with installing other software that is needed in the temporary toolchain, like gawk, file, patch, perl etc.

Building the main system

After getting done with the temporary toolchain, you chroot into the lfs partition. You start off by creating the needed directories (like /bin, /boot, /etc, /home) and then continue with building the distribution software, utilising the temporary toolchain. For instance, you construct a new gcc, and you compile sed, grep, bzip2, the shadow utility that manages the handling of passwords, and so on, all while making sure that things don’t break, and running countless tests (that sometimes take longer than the package took to compile) to ensure that what you build is functional and reliable.
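That directory-creation step really is nothing more than a few mkdir calls. Here is a small sketch of the idea (my own illustration, not the book’s exact commands; a scratch directory stands in for the real mounted lfs partition, and the real tree has more entries):

```shell
# Illustrative only: the skeleton of the new system is just mkdir.
# A temporary directory stands in for the mounted lfs partition.
LFS=$(mktemp -d)
for dir in bin boot etc home lib sbin usr usr/bin usr/lib usr/sbin var; do
    mkdir -pv "$LFS/$dir"
done
ls "$LFS"
```

The -v flag makes mkdir narrate what it creates, which is about as much “magic” as this step involves.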

Final configuration

Next on the list are the various configuration files that reside in /etc, and the setup of sysvinit, the distribution’s init system.

Last, but not least, you compile the Linux kernel and set up grub so that the system is bootable.

At this point, if all has gone well, and you reboot, you should boot into your new lfs system.

What did I gain from that?

Building lfs was a very time-consuming process for me. It must have taken about 7-8 hours at the very least. Not so much because of the compilation and testing (I was compiling with MAKEFLAGS='-j 4' on a Core i5), but because I didn’t complete some steps correctly and later needed to go back and redo them, along with everything that followed, plus the time it took to research issues, programs, or various other things before I issued a command at the shell.

Now if I were to answer the question “What did I gain from that”, my answer would be along the lines of “Infinite confusion, and some great insight at some points”.

To elaborate on that,

  • lfs mostly served as a reassurance that indeed, what I did know about the system was mostly correct.
  • I did have the chance to see the distribution get built right before my eyes, which was something I longed for a great amount of time.
  • It made me somewhat more familiar with the configure && make && make install cycle.
  • It made me realise that the directories in the system are the simple result of a mkdir command, and that the configuration files in the /etc folder are handwritten plain text files. (Yeah, I feel stupid about that one – I don’t know what I was expecting. It was probably the result of the “magic” that the distro-making process entailed for me.)
  • I got to see the specific software that is needed to create a distribution, and how I can build it, customize that build, or even change that software to my liking.
  • And last but not least, something that nearly every lfs user says after a successful try: I knew that package managers did a great many things in order to maintain the system, and that much of the work I would normally have to do was done nearly automatically, but boy, was I underestimating them. After lfs, I developed a new appreciation for a good package manager.

Epilogue

Lfs was, for the most part, a great experience. As a knowledge expander, it works great. As a system that you keep and continue to maintain? I don’t know. I know that people have done that in the past, but I decided against maintaining my build, as I figured it would be very time consuming, and that if I ever wanted to gain the experience of maintaining a distro, I would probably fork something like Crux.

In the end, if you ask me whether I can recommend it to you, I will say that I’m not so sure. It will provide you with some insight into the internals of a GNU/Linux distribution, but it won’t make you a better programmer as some people claim (most of the process revolves around the configure && make && make install cycle, and some handwriting of conf files).

Ultimately, it is yourself you should ask. Do you want that knowledge? Is it worth the hassle for you? Do you want the bragging rights? Are you crazy enough to want to maintain it? These are all questions to which you will get as many answers as there are people you ask.


How the compiler, the Library and the kernel work - Part 1

Before we get any further, it might be good if we provided some context.

Hello world. Again.

helloworld.c
#include <stdio.h>

int
main (int argc, char **argv)
{
    printf ("Hello world!\n");

    return 0;
}

Every user space programmer has written a hello world program. Only god knows how many times this program has been written. Yet most programmers’ knowledge of it is limited to something along the lines of:

  • It sends the string passed as a parameter to the system to print.
  • It takes the printf function from stdio.h and prints the string.

and various other things, which range anywhere from plain wrong to partially correct.

So why not demystify the process?

Enter the C preprocessor.

You may have heard of the C preprocessor. It’s the first stage of compilation of a C or C++ file, and it’s responsible for things such as the inclusion of header files (it replaces #include <header.h> with the contents of that file, recursively processing the files it includes in turn), macro expansion (such as the famous comparison of two numbers, #define gt(a, b) ((a >= b) ? 1 : 0), which expands to 1 when a is greater than or equal to b), and conditional compilation (things such as:

#ifdef WIN32 
    printf ("We are on windows\n");
#endif

amongst others). You can see it for yourself: write the hello world program and pass it to cpp with cpp hello_world.c

So now that we know what it does, it’s time to dispel a common myth regarding it: some people believe that the header files include the functions to be called. That’s wrong. What a header includes is function prototypes (and some type definitions, etc.) only. It doesn’t include the body of the function to be called.

Some people are quite surprised by that fact. But it isn’t surprising, once you understand what the compiler does with it.

Say hello to the compiler.

Here we are gonna unmask another pile of misconceptions. First of all, some people think that when they invoke gcc on the command line they are calling the compiler. They are not. In fact, they are calling software commonly known as the compilation driver, whose job is to run all the programs needed to fully turn source into binary, including the preprocessor, the actual compiler, an assembler and finally the linker.

Having said that, the actual compiler that gets called when you invoke gcc is called cc1. You may have seen it sometimes when the driver reports errors. Wanna take a look at it, to make sure I’m not lying to you? (Hint: I’m not!) Fair enough. Type this at the command line: gcc -print-prog-name=cc1. It should tell you where the actual compiler is located on your system.

So now that we have this misconception out of the way, we can continue with our analysis. As we said above, the header files include prototypes, and not the whole function.

You may know that in C, you usually declare a function before you use it. The primary reason for doing this is to give the compiler the ability to perform type checking: to check that the arguments passed are correct, both in number and in type, and to verify that the returned value (assuming there is one) is being used correctly. Below is a program that demonstrates a function prototype:

prototype.c
#include <stdio.h>

int add_nums (int first, int second);

int
main (void)
{
    printf ("5 + 5 results in %d\n", add_nums (5, 5));

    return 0;
}

int
add_nums (int first, int second)
{
    return first + second;
}

In this particular example, the prototype gives the compiler a wide variety of information. It tells it that the function add_nums takes two int arguments and returns an int to the calling function. Now the compiler can verify that I am passing correct arguments to it when I call it inside printf. If I don’t include the function prototype, and do something slightly evil such as calling add_nums with float arguments, then this might happen:

output
5 + 4 results in 2054324224

Now that you know that the compiler (the real one) only needs the prototype and not the actual function code, you may be wondering how the compiler compiles the call if it doesn’t know the function’s code.

Now is the time to bring down another misconception. The word compiler is just a fancy name for software otherwise known as a translator. A translator’s job is to take input in one language (the source language) and turn it into a second language (the target language), whatever that may be. Most of the time, when you compile software, you compile it to run on your computer, which runs on a processor from the x86 architecture family. A processor is typically associated with an assembly language for its architecture (which is just human-friendly mnemonics for common processor tasks), so your x86 computer runs x86 assembly (ok, that’s not 100% true, but for simplicity’s sake it should serve for the moment. We will see why it’s not true later.) So the compiler (in a typical translation) translates (compiles) your C source code to x86 assembly. You can see this by compiling your hello world example and passing the compiler the -S flag (which asks it to stop after the assembly is produced), like so: gcc -S hello.c.

Conclusion

In this part, we saw how the compiler and the preprocessor work with our code, in an attempt to demystify the so-called library calls. In the next part, we are going to study the assembler and the linker, and in the final part, the loader and the kernel.


GSOC Week 11 report

Introduction

This week was spent investigating the runtime and debugging executables with gdb. It was productive, in the sense that it provided me with some interesting pieces of information. Without further ado, let’s present the findings:

My findings

Before starting to play with libpthread and glibc, I wanted to make sure that the goruntime behaved the way I believed it did, and to make some further assurances about it. These assurances had to do with the total number of goroutines and the total number of kernel threads at various checkpoints in the language runtime.

  • The first thread in the program is initialised during runtime_schedinit.
  • The number of m’s (kernel threads) depends on the number of goroutines. The runtime basically attempts to create an equal number of m’s to run the goroutines. We can observe that every time a new goroutine is created, there is a series of calls to initiate a new kernel thread.
  • There are at least two kernel threads. One that supports the runtime (mainly the garbage collector) and one that executes the code of the go program.

There is only one small piece of code in the goruntime that creates some confusion for me, and that is the code for initialising a new m. Let me first present the code that confuses me:

gcc/libgo/runtime/proc.c
M*
runtime_newm(void)
{

    ...
  mp = runtime_mal(sizeof *mp);

    ...
  mcommoninit(mp);
  mp->g0 = runtime_malg(-1, nil, nil);

    ...
  if(pthread_attr_init(&attr) != 0)
      runtime_throw("pthread_attr_init");
  if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
      runtime_throw("pthread_attr_setdetachstate");

    ...
}

I purposely compacted the function for brevity, as it only serves to demonstrate a point. Now, my confusion lies in the line mp->g0 = runtime_malg(-1, nil, nil). It is a piece of code that allocates memory for a new goroutine. I am ok with that, but what I do not understand is that new kernel threads (m’s) are supposed to pick and run a goroutine from the global goroutine pool – that is, run an existing one, not create a new one. Now, runtime_malg is given parameters that don’t initialise a new goroutine properly, but still, new memory is allocated for a goroutine, and it is returned to mp->g0 from runtime_malg.

Assuming I have not misunderstood something, and I am not mistaken (which is kind of likely), this is behaviour that could lead to a number of questions and/or problems. For instance, what happens to the goroutine created by runtime_malg? Is it killed after the m is assigned a new goroutine to execute? Is it parked on the global goroutine list? Is it just ignored? Does it affect the runtime scheduler’s goroutine count? This is the last thing I feel I want to clear up regarding gccgo’s runtime.

gdb

For this week, I also ran the executables created by gccgo through gdb. It was a fertile attempt that, most of the time, confirmed my findings in the goruntime. It also provided some other nice pieces of information regarding the crashing of goroutines, but also left me with a question.

The code in question that I ran through gdb is this:

goroutine.go
package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    fmt.Println("[!!] right before a go statement")
    go say("world")
    say ("hello")
}

Your very typical hello-world-like goroutine program. Now, setting a breakpoint in main (not the program’s main, which is main.main; main, as far as the runtime is concerned, is the runtime entry point, in go-main.c) and running it through gdb yields the following results:

gdb session
Breakpoint 1, main () at ../../../gcc_source/libgo/runtime/go-main.c:52
52 runtime_check ();
2:  __pthread_total = 1
1: runtime_sched.mcount = 0
(gdb) next
53 runtime_args (argc, (byte **) argv);
2: __pthread_total = 1
1: runtime_sched.mcount = 0
54 runtime_osinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 0
63 runtime_schedinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 1

Up until now, nothing unexpected. The kernel thread is registered with the runtime scheduler during its initialisation process in runtime_schedinit, and that’s why runtime_sched.mcount is reported as zero until schedinit has run.

gdb session
68 __go_go (mainstart, NULL);
2: __pthread_total = 1
1: runtime_sched.mcount = 1
(gdb) display runtime_sched.gcount
3: runtime_sched.gcount = 0

That too is ok, because a new goroutine is registered with the scheduler during the call to __go_go. Now I am gonna fast forward a bit, to a more interesting point.

gdb session
...
[DEBUG] (in runtime_gogo) new goroutine's status is 2
[DEBUG] (in runtime_gogo) number of goroutines now is 2
[New Thread 629.30]

Program received SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) info threads
 Id   Target  Id       Frame
 6    Thread  629.30   0x08048eb7 in main.main () at goroutine.go:12
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

This is getting weird. I mean, libpthread is reporting that 2 threads are active, but gdb reports that 3 are active. Anyway, let’s continue:

gdb session
[DEBUG] (in runtime_stoptheworld) stopped the garbage collector
[DEBUG] (in runtime_starttheworld) starting the garbage collector
[DEBUG] (in runtime_starttheworld) number of m's now is: 2
[DEBUG] (in runtime_starttheworld) [note] there is already one gc thread
[!!] right before a go statement

Program received signal SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) continue
... (output omitted by me for brevity)

[DEBUG] (in runtime_newm) Right before the call to pthread_create.
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[New Thread 629.31]

Program received signal SIGABRT, Aborted.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3

Oh my goodness. At first glance, this seems to be a very serious inconsistency between libpthread and the goruntime. At this point, the go scheduler reports 3 threads (3 registered threads, which means that the flow of execution has passed mcommoninit, the kernel thread initialisation function that also registers the kernel thread with the runtime scheduler) whereas libpthread reports 2 threads.

But WAIT! Where are you going? Things are about to get even more interesting!

gdb session
(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

GDB reports 4 threads. Yes, 4 threads, ladies and gentlemen. Now look closely: 3 threads are in the same frame, with the one with id 4 being the one currently executing. And there is also a pattern: 0x01da48ec is the value of the eip register for all 3 of them.

That much is for certain. Now I have an idea: why not switch the current thread to the one with id 7? I’m sold on the idea, let’s do this:

gdb session
(gdb) thread 7
[Switching to thread 7 (Thread 629.31)]
#0  0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
(gdb) continue
Continuing.

Program received signal SIGABRT, Aborted.
[Switching to Thread 629.28]
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3
(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01dc08b0 in ?? () from /lib/i386-gnu/libc.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

Damn. But I am curious. What’s the next instruction to be executed?

gdb session
(gdb) x/i $eip
=> 0x1da48ec: ret

And what is the next instruction to be executed by the thread with id 7?

gdb session
(gdb) x/i $eip
=> 0x1dc08b0: call *%edx

Conclusion

Apparently, there is still much debugging left to find out what is really happening. But we have some leads in the right direction, which will hopefully lead us to finally discovering where the problem lies, and correcting it.

Most importantly, my immediate plan, before I start playing around with libpthread, is to attempt the same debugging run on the same code under linux (x86). Seeing as go is clean on linux, it would provide some clues as to what the expected results should be, and where the execution differs substantially, a clue that might be vital to finding the problem.


GSOC week 10 report

Introduction

This week was spent attempting to debug the gccgo runtime via print statements. There were many things that I gained from this endeavour, the most significant being a great deal of information regarding the bootstrapping of a go process. Let’s proceed to presenting this week’s findings, shall we?

Findings

The process bootstrapping sequence

The code that begins a new go-process is conveniently located in a file called go-main.c, the most significant part of which is the following:

go-main.c
int
main (int argc, char **argv)
{
  runtime_check ();
  runtime_args (argc, (byte **) argv);
  runtime_osinit ();
  runtime_schedinit ();
  __go_go (mainstart, NULL);
  runtime_mstart (runtime_m ());
  abort ();
}

static void
mainstart (void *arg __attribute__ ((unused)))
{
  runtime_main ();
}

The process is as follows:

  • First, runtime_check runs and registers os_Args and syscall_Envs as runtime roots with the garbage collector. I am still investigating what exactly this function does, but it seems like some early initialisation of the garbage collector.
  • Secondly, runtime_args is run. Its job is to call a specific argument handler for the arguments passed to main.
  • Thirdly, runtime_osinit is run, whose job is to call the low-level _CPU_COUNT function to get the number of CPUs (in a specific data structure that represents a set of CPUs).
  • After that, runtime_schedinit is run, whose job is to create the very first goroutine (g) and system thread (m). It continues with parsing the command line arguments and the environment variables, then sets the maximum number of CPUs that are to be used (via GOMAXPROCS), runs the first goroutine, and does the last pieces of the scheduler’s initialisation.
  • Following runtime_schedinit, __go_go is run, a function whose purpose is to create a new goroutine, set it to execute the function passed as its first parameter, and then queue that goroutine in the global ready-to-run goroutine pool.
  • Last but not least, runtime_mstart runs, which seems to start the execution of the kernel thread created during runtime_schedinit.

The very last piece of code that runs (and most probably the most important) is runtime_main. Remember that this is passed as a parameter to the goroutine created during the __go_go call; its job is to mark the goroutine that called it as the main os thread, to initialise the scheduler, and to create a goroutine whose job is to release unused memory from the heap back to the OS. It then starts executing the user-defined instructions (the code the programmer wrote) via a call to a macro that directs it to __go_init_main in the assembly generated by the compiler.

Runtime_main is also the function that terminates the execution of a go process, with a call to runtime_exit, which seems to be a macro for the exit function.

Other findings

During our debugging sessions we found out that the total count of kernel threads running in a simple program is at least two. The first one is the bootstrap M (the one initialised during the program’s initialisation, inside runtime_schedinit), and there is at least one more, created (I am still investigating the validity of this claim) to be used by the garbage collector.

A simple go program, such as one doing arithmetic or printing a hello-world-like message, evidently has no issue running. The issues arise when we use a go statement. With all our debugging messages activated, this is how a simple go program flows:

output
root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] (in main) before runtime_mcheck is run
[DEBUG] (in main) before runtime_args is run
[DEBUG] (in main) before runtime_osinit is run
[DEBUG] (in main) before runtime_schedinit is run
[DEBUG] (in main) before runtime_mstart is run
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (in mainstart) right before the call to runtime_main
[DEBUG] (in runtime_main) Beginning of runtime_main
[DEBUG] (start of runtime_newm) Total number of m's is 1
[DEBUG] (in runtime_newm) Preparing to create a new thread
[DEBUG] (in runtime_newm) Right before the call to pthread_create
[DEBUG] (in runtime_newm) pthread_create returned 0
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (end of runtime_newm) Total number of m's is 2
Hello, fotis
[DEBUG] (in runtime_main) Right before runtime_exit

And this is how a goroutine powered program fails:

output
root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] (in main) before runtime_mcheck is run
[DEBUG] (in main) before runtime_args is run
[DEBUG] (in main) before runtime_osinit is run
[DEBUG] (in main) before runtime_schedinit is run
[DEBUG] (in main) before runtime_mstart is run
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (in mainstart) right before the call to runtime_main
[DEBUG] (in runtime_main) Beginning of runtime_main
[DEBUG] (start of runtime_newm) Total number of m's is 1
[DEBUG] (in runtime_newm) Preparing to create a new thread
[DEBUG] (in runtime_newm) Right before the call to pthread_create
[DEBUG] (in runtime_newm) pthread_create returned 0
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (end of runtime_newm) Total number of m's is 2
[DEBUG] (start of runtime_new) Total number of m's is 2
[DEBUG] (in runtime_newm) Preparing to create a new thread.
[DEBUG] (in runtime_newm) Right before the call to pthread_create
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

Work for the next week

I will of course continue print-debugging until I have knowledge of the exact flow of execution in the go runtime. Right now I have very good knowledge of the flow, but there are some things that I need to sort out. For instance, it is not exactly clear to me why we call certain functions, or what they are supposed to be doing at certain parts. After I sort this out, I also plan to start debugging libpthread, to see what its status is during a hello-world-like program and during a goroutine-powered program, and to see whether we find something interesting in libpthread (like how many threads libpthread reports against how many the goruntime reports).


GSOC Week 9 (Partial) report

This week revolved around print debugging the gccgo runtime, in search of clues regarding the creation of new threads under the goruntime, so as to see whether there is something wrong with the runtime itself, or with the way the runtime interacts with libpthread.

(partial presentation of) findings

During print debugging of the gccgo runtime, I haven’t noticed anything abnormal or unusual so far. For example, the code that triggers the assertion failure seems to work at least once, since pthread_create() returns 0 at least once.

This is expected behavior, since we have already stated that at least one M (kernel thread) is created at the initialisation of the program's runtime.

If, however, we use a go statement in our program to spawn a goroutine, the runtime still fails with the usual assertion failure, and the output of the program is this:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] pthread_create returned 0
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

The above output can give us some pieces of information:

  • pthread_create() is called at least once.
  • It executes successfully and without errors – the libpthread code suggests that 0 is returned upon successful execution and creation of a thread.
  • However, the assertion is still triggered, and we know it is triggered during thread creation.

The second bullet point is also supported by the fact that even if you execute something as simple as a hello world in go, a new M is created, so you get something along the lines of this as output:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] pthread_create returned 0
Hello World!
root@debian:~/Software/Experiments/go#

There is, however, something that the above output doesn't tell us, but that would be useful to know: how many times did we create a new thread? So we modify our gcc's source code to see how many times the runtime attempts to create a new kernel thread (M). This is what we get out of it:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] Preparing to create a new thread.
[DEBUG] pthread_create returned 0
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[DEBUG] Preparing to create a new thread.
aborted.

The code at this point in the runtime is this:

proc.c
// Create a new m.  It will start off with a call to runtime_mstart.
M*
runtime_newm(void)
{
  M *mp;
  pthread_attr_t attr;
  pthread_t tid;
  size_t stacksize;
  sigset_t clear;
  sigset_t old;
  int ret;

#if 0
  static const Type *mtype;  // The Go type M
  if(mtype == nil) {
      Eface e;
      runtime_gc_m_ptr(&e);
      mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
  }
#endif

  // XXX: Added by fotis for print debugging.
  printf("[DEBUG] Preparing to create a new thread.\n");

  mp = runtime_mal(sizeof *mp);
  mcommoninit(mp);
  mp->g0 = runtime_malg(-1, nil, nil);

  if(pthread_attr_init(&attr) != 0)
      runtime_throw("pthread_attr_init");
  if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
      runtime_throw("pthread_attr_setdetachstate");

  // <http://www.gnu.org/software/hurd/open_issues/libpthread_set_stack_size.html>
#ifdef __GNU__
  stacksize = StackMin;
#else
  stacksize = PTHREAD_STACK_MIN;

  // With glibc before version 2.16 the static TLS size is taken
  // out of the stack size, and we get an error or a crash if
  // there is not enough stack space left.  Add it back in if we
  // can, in case the program uses a lot of TLS space.  FIXME:
  // This can be disabled in glibc 2.16 and later, if the bug is
  // indeed fixed then.
  stacksize += tlssize;
#endif

  if(pthread_attr_setstacksize(&attr, stacksize) != 0)
      runtime_throw("pthread_attr_setstacksize");

  // Block signals during pthread_create so that the new thread
  // starts with signals disabled.  It will enable them in minit.
  sigfillset(&clear);

#ifdef SIGTRAP
  // Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
  sigdelset(&clear, SIGTRAP);
#endif

  sigemptyset(&old);
  sigprocmask(SIG_BLOCK, &clear, &old);
  ret = pthread_create(&tid, &attr, runtime_mstart, mp);

  /* XXX: added for debug printing */
  printf("[DEBUG] pthread_create() returned %d\n", ret);

  sigprocmask(SIG_SETMASK, &old, nil);

  if (ret != 0)
      runtime_throw("pthread_create");

  return mp;
}

We can deduce two things about our situation right now:

  • There is at least one thread successfully created, and there is an attempt to create another one.
  • The second time, there is a failure before pthread_create is called.

Continuation of work.

I have been following this course of action for the last week. I have presented some of my findings, and hope to soon be able to write an exhaustive report on what exactly it is that causes the bug.


GSOC Week 8 (Partial) report

This week was spent studying the go language's runtime and the behaviour of various go programs when executed under the Hurd. I learnt a variety of new things, and got some new clues about the problem.

The new libgo clues

I already know that M's are the "real" kernel-schedulable threads and G's are the ones managed by the go runtime (goroutines). Last time I went through the go runtime's code I had noticed that neither of them gets created, so there must be an issue with thread creation. But since at least one of each is created during the program's initialization, how come most programs are able to run, and issues only present themselves when we manually attempt to run a goroutine?

I will admit that the situation looks strange, so I decided to look more into it. Before we go any further, here are the errors I got when I ran goroutine-powered programs under the Hurd:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

__pthread_create_internal is a libpthread function that gets called when a new POSIX thread is instantiated. So we know that when we call a goroutine, at least one kernel thread is created in addition to the goroutine; otherwise, if only a new goroutine had been created and not a new kernel thread (M), why wasn't it matched with an existing kernel thread (remember, there is at least one)?

That made me look into the go runtime some more. I found many things that I cannot enumerate here, but amongst the most interesting was the following piece of code:

proc.c
// Create a new m.  It will start off with a call to runtime_mstart.
M*
runtime_newm(void)
{
  M *mp;
  pthread_attr_t attr;
  pthread_t tid;
  size_t stacksize;
  sigset_t clear;
  sigset_t old;
  int ret;

#if 0
  static const Type *mtype;  // The Go type M
  if(mtype == nil) {
      Eface e;
      runtime_gc_m_ptr(&e);
      mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
  }
#endif

  mp = runtime_mal(sizeof *mp);
  mcommoninit(mp);
  mp->g0 = runtime_malg(-1, nil, nil);

  if(pthread_attr_init(&attr) != 0)
      runtime_throw("pthread_attr_init");
  if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
      runtime_throw("pthread_attr_setdetachstate");

  stacksize = PTHREAD_STACK_MIN;

  // With glibc before version 2.16 the static TLS size is taken
  // out of the stack size, and we get an error or a crash if
  // there is not enough stack space left.  Add it back in if we
  // can, in case the program uses a lot of TLS space.  FIXME:
  // This can be disabled in glibc 2.16 and later, if the bug is
  // indeed fixed then.
  stacksize += tlssize;

  if(pthread_attr_setstacksize(&attr, stacksize) != 0)
      runtime_throw("pthread_attr_setstacksize");

  // Block signals during pthread_create so that the new thread
  // starts with signals disabled.  It will enable them in minit.
  sigfillset(&clear);

#ifdef SIGTRAP
  // Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
  sigdelset(&clear, SIGTRAP);
#endif

  sigemptyset(&old);
  sigprocmask(SIG_BLOCK, &clear, &old);
  ret = pthread_create(&tid, &attr, runtime_mstart, mp);
  sigprocmask(SIG_SETMASK, &old, nil);

  if (ret != 0)
      runtime_throw("pthread_create");

  return mp;
}

This is the code that creates a new kernel thread. Notice the line ret = pthread_create(&tid, &attr, runtime_mstart, mp);. It obviously creates a new kernel thread, which explains why we get this specific error. But what remains unexplained is this: since we do have at least one kernel thread at program startup, why is this specific error only triggered when we manually create a goroutine?

Go programs under the Hurd

Apart from studying Go's runtime source code, I also ran some experiments under the Hurd. I got some very weird results that I am still investigating, but I would like to share them nonetheless. Consider the following piece of code:

hellogoroutines.go
package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    say("world")
    say("hello")
}

A very basic example that demonstrates goroutines. Now, if we change one of the say calls inside main to a goroutine, this happens:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

BUT if we change BOTH of these functions to goroutines (go say("world"), go say("hello")), this happens:

root@debian:~/Software/Experiments/go# ./a.out
root@debian:~/Software/Experiments/go#

Wait a minute. It can’t be! Did it execute correctly? Where is the output?

root@debian:~/Software/Experiments/go# echo $?
0
root@debian:~/Software/Experiments/go#

It reports that it has executed correctly. But there is no output.

What I am doing next

I will continue reading through the go runtime for clues. On the more active side, I am writing a custom test case for goroutine testing under the Hurd, while also doing some analysis on the programs that run there (currently studying the assembly generated for these programs) to see how they differ and why we get this particular behavior.


GSOC (Partial) Week 7 report

An exciting week.

This week was exciting, and spending it learning about the go runtime was the reason. As insightful as it was, however, it also confused me a little bit. Before this goes any further, I should state that this is a partial report on my research and my findings. My aims for this week were the following: to investigate the behavior of go programs under the Hurd, to study the go runtime, and possibly to modify it to see whether the goroutine issues are libpthread's fault or the go runtime's.

Presenting my findings.

Most of my time was spent studying the gcc go frontend, libgo and the go runtime. Fortunately, I can gladly say that it was time well spent. What I got from it were some nice pieces of insight, but also some slight confusion and doubts.

The first interesting thing in my findings was this:

runtime.h
struct G
{
  Defer*   defer;
  Panic*   panic;
  void*    exception;   // current exception being thrown
  bool    is_foreign;  // whether current exception from other language
  void    *gcstack; // if status==Gsyscall, gcstack = stackbase to use during gc
  uintptr gcstack_size;
  void*    gcnext_segment;
  void*    gcnext_sp;
  void*    gcinitial_sp;
  ucontext_t gcregs;
  byte*    entry;       // initial function
  G*   alllink; // on allg
  void*    param;       // passed parameter on wakeup
  bool    fromgogo;    // reached from gogo
  int16   status;
  int64   goid;
  uint32  selgen;      // valid sudog pointer
  const char*  waitreason;  // if status==Gwaiting
  G*   schedlink;
  bool    readyonstop;
  bool    ispanic;
  bool    issystem;
  int8    raceignore; // ignore race detection events
  M*   m;       // for debuggers, but offset not hard-coded
  M*   lockedm;
  M*   idlem;
  int32   sig;
  int32   writenbuf;
  byte*    writebuf;
  // DeferChunk  *dchunk;
  // DeferChunk  *dchunknext;
  uintptr sigcode0;
  uintptr sigcode1;
  // uintptr sigpc;
  uintptr gopc;    // pc of go statement that created this goroutine

  int32   ncgo;
  CgoMal*  cgomal;

  Traceback* traceback;

  ucontext_t  context;
  void*        stack_context[10];
};

Yep. This is the code that represents a G (yeah, you guessed it, a goroutine). I was pretty surprised at first to see that a thread is represented as a struct, but then again, taking a closer look at it, it makes perfect sense. The next one though was a lot trickier:

runtime.h
struct M
{
  G*   g0;      // goroutine with scheduling stack
  G*   gsignal; // signal-handling G
  G*   curg;        // current running goroutine
  int32   id;
  int32   mallocing;
  int32   throwing;
  int32   gcing;
  int32   locks;
  int32   nomemprof;
  int32   waitnextg;
  int32   dying;
  int32   profilehz;
  int32   helpgc;
  uint32  fastrand;
  uint64  ncgocall;    // number of cgo calls in total
  Note    havenextg;
  G*   nextg;
  M*   alllink; // on allm
  M*   schedlink;
  MCache  *mcache;
  G*   lockedg;
  G*   idleg;
  Location createstack[32]; // Stack that created this thread.
  M*   nextwaitm;   // next M waiting for lock
  uintptr waitsema;    // semaphore for parking on locks
  uint32  waitsemacount;
  uint32  waitsemalock;
  GCStats gcstats;
  bool    racecall;
  void*    racepc;

  uintptr settype_buf[1024];
  uintptr settype_bufsize;

  uintptr end[];
};

This was a source of endless confusion at the beginning. It does have some hints reassuring the fact that G's are indeed goroutines, but nothing that really helps to describe what an M is. Its structure is similar to that of the G, however, which suggests it might have something to do with a thread. And indeed it does. Further study of the source code made me speculate that M's must be the real operating-system-scheduled (kernel) threads, while G's (goroutines) must be the lightweight threads managed by the go runtime.

I was more than happy to find comments that confirmed that position of mine.

runtime.h
// The go scheduler's job is to match ready-to-run goroutines (`g's)
// with waiting-for-work schedulers (`m's)

Another cool finding was the go (runtime) scheduler – from which the above comment originates:

proc.c
struct Sched {
  Lock;

  G *gfree; // available g's (status == Gdead)
  int64 goidgen;

  G *ghead; // g's waiting to run
  G *gtail;
  int32 gwait; // number of g's waiting to run
  int32 gcount;    // number of g's that are alive
  int32 grunning;  // number of g's running on cpu or in syscall

  M *mhead; // m's waiting for work
  int32 mwait; // number of m's waiting for work
  int32 mcount;    // number of m's that have been created

  volatile uint32 atomic;  // atomic scheduling word (see below)

  int32 profilehz; // cpu profiling rate

  bool init;  // running initialization
  bool lockmain;  // init called runtime.LockOSThread

  Note    stopped; // one g can set waitstop and wait here for m's to stop
};

From that particular piece of code, without a doubt the most interesting line is G *gfree. That is a pool of the goroutines that are available for reuse. There are also helper scheduling functions, of which the most interesting (for my purposes) was static void gfput(G*);, which releases a goroutine (puts it on the gfree list):

proc.c
// Put on gfree list.  Sched must be locked.
static void
gfput(G *gp)
{
  gp->schedlink = runtime_sched.gfree;
  runtime_sched.gfree = gp;
}

There are loads of other extremely interesting functions there, but for the sake of space I will not expand on them further. I will, however, expand on what it is that is confusing me:

The source of confusion

My tests at this point are to include testing whether removing thread destruction from the go runtime results in a difference in behavior. There are, however (as far as go is concerned), two kinds of threads in the go runtime: goroutines (G's) and the kernel-schedulable threads (M's).

Neither of them seems to really be destroyed. From my understanding so far, G's are never totally destroyed (I may be wrong here, I am still researching this bit). Whenever they are about to be "destroyed", they are instead added to the scheduler's list of free G's to allow for reuse, as evidenced by the gfput and gfget functions. M's (the kernel threads), on the other hand, also seem not to be destroyed. A comment in go's scheduler supports this (// For now, m's never go away.) and as a matter of fact I could not find any code that destroys M's (I am still researching this bit too).

Since neither of the two actually gets destroyed, and seeing as thread creation alone should not be buggy, how come we are facing these specific bugs? I will try to provide an interpretation: either I am fairly wrong and M's (or G's, or both) actually do get destroyed somewhere (possible and very probable), or I am looking for clues regarding the issue in the wrong place (possible, but I don't see it as very probable).


GSOC: Week 6 report

First of all, I would like to apologize for this report being late. But unfortunately this happened: I Accidentally 93 MB

Only that, in my case, it was not exactly 93 MB; rather, it was about 1.5 GB. Yeah, I accidentally obliterated my GCC repository on the Hurd, so I had to reclone and rebuild everything, which took a considerable amount of time. How this happened is a long story that involved me wanting to rebuild my gcc, cd-ing two directories above the build folder, and ending up running rm -rf * in my gcc folder (which included the source and the build folder) rather than my gcc_build folder. Thank god, that was only a minor setback, and the (small scale) crisis was soon averted.

Further research

This week was mostly spent reading source code, primarily looking for clues about the previous situation, and secondarily to get a better understanding of the systems I am working on. This proved fruitful, as I got a firmer grip on libpthread and the GNU Mach system. However, while this week was mostly spent reading, that doesn't mean I didn't do anything practical: I also used my time to do some further research into what specifically triggered the assertion failure. That required playing a little bit with our newly built compiler on the Hurd to see what we can do with go there.

Testing gccgo under the Hurd

If you recall, the last time I reported I had found out that an assertion in libpthread's code was failing, and that was the root cause of both the gccgo and the libgo test failures. That assertion was failing in two different places in the code. The first is __pthread_create_internal, a libpthread function located in libpthread/pthread/pt-create.c that is invoked when an application wants to create a new POSIX thread. That function is of course not called directly; rather, it is invoked by pthread_create, which is the function that user space applications use to create a new thread. (For reference you can find the code here.)

The second place where the assertion was failing was __sem_timedwait_internal in the file libpthread/sysdeps/generic/sem-timedwait.c, where it gets inlined in place of self = _pthread_self ();. (For more information, check out last week's report.)

So I was curious to test the execution of some sample programs under the compiler we built on the Hurd. Beginning with some very simple hello-world-like programs, I could see that they compiled successfully and also ran without any issues at all. Seeing as the assertion failure is generated when we attempt to create a new thread, I figured I should start playing with goroutines under the Hurd.

So I started playing with a simple hello-world-like goroutine example (the one available in the tour of go on the golang.org website):

package main

import (
    "fmt"
    "time"
)

func say(s string) {
    for i := 0; i < 5; i++ {
        time.Sleep(100 * time.Millisecond)
        fmt.Println(s)
    }
}

func main() {
    go say("world")
    say("hello")
}

This gets compiled without any issues at all, but when we try to run it…

a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted


goroutine 1 [sleep]:
time.Sleep
  ../../../gcc_source/libgo/runtime/time.goc:26

goroutine 3 [sleep]:
time.Sleep
  ../../../gcc_source/libgo/runtime/time.goc:26

Bam! It exploded right in front of our faces. Let's see if this becomes friendlier if we alter it a little bit. To do this we removed the go from say to avoid running it as a goroutine, and we also removed time.Sleep (along with the time import), whose job is to pause a goroutine.

When you do this, the code becomes a hello-world-like for-loop sample, which prints:

root@debian:~/Software/Experiments/go# ./a.out
world
world
world
world
world
hello
hello
hello
hello
hello

Hmm. Let’s play with it some more. Changing our code a little bit to make say("world") run as a goroutine gives us the following code:

package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    go say("world")
    say("hello")
}

Which, when executed results in this:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

So we can see that even the simplest go programs that use goroutines do not run. Let's still try some more programs that invoke goroutines to see if our assumptions are correct. Below is the code of a very simple web server in go (found on the golang website):

webserver.go
package main

import (
    "fmt"
    "net/http"
)

type Hello struct{}

func (h Hello) ServeHTTP(
    w http.ResponseWriter,
    r *http.Request) {
    fmt.Fprint(w, "Hello!")
}

func main() {
    var h Hello
    http.ListenAndServe("localhost:4000", h)
}

The (unsurprising) result is the following:

a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted

goroutine 1 [syscall]:
no stack trace available

Hmm. This is the failure that was last caused by time.Sleep. So let's take a closer look at the code of the ListenAndServe function. The code for this function in the go runtime is this:

gcc/libgo/go/net/http/server.go
// ListenAndServe listens on the TCP network address srv.Addr and then
// calls Serve to handle requests on incoming connections.  If
// srv.Addr is blank, ":http" is used.
func (srv *Server) ListenAndServe() error {
  addr := srv.Addr
  if addr == "" {
      addr = ":http"
  }
  l, e := net.Listen("tcp", addr)
  if e != nil {
      return e
  }
  return srv.Serve(l)
}

This calls the function Serve. The interesting part in this one is line 1271:

 time.Sleep(tempDelay)

It calls time.Sleep on accept failure, which is known to pause goroutines, and is as a result the ultimate cause of the behavior we are seeing.

Final thoughts – Work for next week

So pretty much everything that has anything to do with a goroutine is failing. Richard Braun on #hurd suggested that since creation and destruction of threads is buggy in libpthread, maybe we should try a workaround until a proper fix is in place. Apart from that, my mentor Thomas Schwinge suggested making thread destruction in go's runtime a no-op to see if that makes any difference. If it does, that should mean that there is nothing wrong in the go runtime itself; rather, the offending code is in libpthread. This is also my very next course of action, which I shall report on very soon.


GSOC: Week 5 report

A clue!

So last week we were left with the compiler test logs and the build results logs that we had to go through to check out the root cause of all these failures in the gccgo test results, and more importantly in the libgo tests. So I went through the gccgo logs in search of a clue about why this may have happened. Here is the list of all the failures I compiled from the logs:


spawn [open ...]^M
doubleselect.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g

==========================================================

spawn [open ...]^M
nonblock.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g

==========================================================

Executing on host: /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../  -fno-diagnostics-show-caret -fdiagnostics-color=never  -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo  -fsplit-stack -c  -o split_stack376.o split_stack376.c    (timeout = 300)
spawn /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ -fno-diagnostics-show-caret -fdiagnostics-color=never -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -fsplit-stack -c -o split_stack376.o split_stack376.c^M
cc1: error: '-fsplit-stack' currently only supported on GNU/Linux^M
cc1: error: '-fsplit-stack' is not supported by this compiler configuration^M
compiler exited with status 1
output is:
 cc1: error: '-fsplit-stack' currently only supported on GNU/Linux^M
 cc1: error: '-fsplit-stack' is not supported by this compiler configuration^M 

UNTESTED: go.test/test/chan/select2.go

==========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
select3.x: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted
 
FAIL: go.test/test/chan/select3.go execution,  -O2 -g

==========================================================

Executing on host: /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ /root/gcc_new/gcc/gcc/testsuite/go.test/test/chan/select5.go  -fno-diagnostics-show-caret -fdiagnostics-color=never  -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo  -O  -w  -pedantic-errors  -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs  -lm   -o select5.exe    (timeout = 300)
spawn /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ /root/gcc_new/gcc/gcc/testsuite/go.test/test/chan/select5.go -fno-diagnostics-show-caret -fdiagnostics-color=never -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -O -w -pedantic-errors -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs -lm -o select5.exe^M
PASS: go.test/test/chan/select5.go -O (test for excess errors)
FAIL: go.test/test/chan/select5.go execution

==========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
bug147.x: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted
 
FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g

=========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
BUG: bug347: cannot find caller
Aborted
 
 
FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g

========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
BUG: bug348: cannot find caller
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x2 addr=0x0]
 
goroutine 1 [running]:
FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g

========================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
mallocfin.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
FAIL: go.test/test/mallocfin.go execution,  -O2 -g

=======================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
Aborted
 
 
FAIL: go.test/test/nil.go execution,  -O2 -g

======================================================

Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
spawn [open ...]^M
Aborted
 
 
FAIL: go.test/test/recover3.go execution,  -O2 -g

See a pattern there? I certainly do. On several occasions, the root cause of the failure is this:

Assertion fail
Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.

Hmm… That’s interesting. Let us go through the libgo results too.

Test Run By root on Fri Jul 12 17:56:44 UTC 2013
Native configuration is i686-unknown-gnu0.3

      === libgo tests ===

a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 10005 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: bufio
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10005) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 10637 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: bytes
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10637) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 10757 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: errors
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10757) - No such process
a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted


goroutine 1 [syscall]:
no stack trace available
FAIL: expvar
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (10886) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11058 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: flag
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11058) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11475 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: fmt
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11475) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11584 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: html
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11584) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11747 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: image
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11747) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 11999 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: io
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (11999) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 12116 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: log
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (12116) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 13107 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: math
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (13107) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 13271 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: mime
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (13271) - No such process
a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
Aborted


goroutine 1 [chan receive]:
a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
panic during panic
testing.RunTestsFAIL: net
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (14234) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 14699 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: os
timed out in gotest
../../../gcc/libgo/testsuite/gotest: line 484: kill: (14699) - No such process
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
../../../gcc/libgo/testsuite/gotest: line 486: 14860 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
FAIL: path
timed out in gotest

...


runtest completed at Fri Jul 12 18:09:07 UTC 2013

That’s certainly even more interesting. In case you haven’t noticed, it’s the same assertion that caused the failures in the gccgo test suite. Let us find the offending code, shall we?

libpthread/pthread/pt-create.c
/* Set the new thread's signal mask and set the pending signals to
     empty.  POSIX says: "The signal mask shall be inherited from the
     creating thread.  The set of signals pending for the new thread
     shall be empty."  If the currnet thread is not a pthread then we
     just inherit the process' sigmask.  */
  if (__pthread_num_threads == 1)
    err = sigprocmask (0, 0, &sigset);
  else
    err = __pthread_sigstate (_pthread_self (), 0, 0, &sigset, 0);
  assert_perror (err);

This seems to be the code that the logs point to, but there is no sign of the assertion. After discussing this issue with my peers in #hurd, I was told that the code I was looking for (the failing assertion) is inlined via _pthread_self (), and is actually located in libpthread/sysdeps/mach/hurd/pt-sysdep.h.

libpthread/sysdeps/mach/hurd/pt-sysdep.h
extern __thread struct __pthread *___pthread_self;
#define _pthread_self()                                            \
 ({                                                         \
   struct __pthread *thread;                                \
                                                            \
   assert (__pthread_threads);                              \
   thread = ___pthread_self;                                \
                                                            \
   assert (thread);                                         \
   assert (({ mach_port_t ktid = __mach_thread_self ();     \
                     int ok = thread->kernel_thread == ktid;       \
                     __mach_port_deallocate (__mach_task_self (), ktid);\
                     ok; }));                                      \
          thread;                                                  \
         })

So this is what I was looking for. Further discussing it in the weekly IRC meeting, braunr provided me with some more clues:

08:38:15 braunr> nlightnfotis: did i answer that ?
08:38:24 nlightnfotis> braunr: which one?
08:38:30 nlightnfotis> hello btw :)
08:38:33 braunr> the problems you’re seeing are the pthread resources leaks i’ve been trying to fix lately
08:38:58 braunr> they’re not only leaks
08:39:08 braunr> creation and destruction are buggy
08:39:37 nlightnfotis> I have read so in http://www.gnu.org/software/hurd/libpthread.html. I believe it’s under Thread’s Death right?
08:40:15 braunr> nlightnfotis: yes but it’s buggy
08:40:22 braunr> and the description doesn’t describe the bugs
08:41:02 nlightnfotis> so we will either have to find a temporary workaround, or better yet work on a fix, right?
08:41:12 braunr> nlightnfotis: i also told you the work around
08:41:16 braunr> nlightnfotis: create a thread pool

Work for next week

This leaves us with next week’s work, which is to hack on libpthread and attempt to create a thread pool, so that we avoid some of the issues present in the current implementation of the Hurd’s libpthread.

It was also suggested by Samuel Thibault (youpi) that I should run the libgo tests by hand and see if I get some more clues, like stack traces. It sounds like a good idea to me, so that’s something that I will look into too.


GSOC: Week 4 report

Yeah baby! It builds!

The highlight of this week’s progress was managing to successfully build gccgo under the Hurd. Not only did it compile successfully, it also ran its tests, with the results matching the ones provided by my mentor Thomas Schwinge. This was a checkpoint in my Summer of Code project: successfully building the compiler meant that I was (happily) in a position to carry on with the next (and main) part of my project, that is, making sure that the Go library (libgo) also passes all its tests and works without any major issues.

So where are we now?

gccgo

Compiling gccgo on the Hurd was a big step, but we also had to see how it compared to the successful build on Linux. The most effective way to compare the two builds is to compare their test results.

Taking a look at the gccgo results on the Hurd, I was delighted to find that it passed most of its tests. There were a few failures, but for the most part it did well. Below are the test results of gccgo on the Hurd:

 === go Summary ===

# of expected passes        5069
# of unexpected failures    11
# of expected failures      1
# of untested testcases     6
/root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo  version 4.9.0 20130606 (experimental) (GCC)

So it’s passing over 99% of its tests. That’s cool, but it helps to take a look at the failing tests, to get an idea of what the failures are, how critical they are, and so on.

nlightnfotis@earth:~/HurdVM/HurdFiles$ grep -v ^PASS: < go.sum
Test Run By root on Thu Jul 11 10:33:34 2013
Native configuration is i686-unknown-gnu0.3

        === go tests ===

        Schedule of variations:
            unix

            Running target unix
            Running /root/gcc_new/gcc/gcc/testsuite/go.dg/dg.exp ...
            Running /root/gcc_new/gcc/gcc/testsuite/go.go-torture/execute/execute.exp ...
            Running /root/gcc_new/gcc/gcc/testsuite/go.test/go-test.exp ...
            FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g 
            FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g 
            UNTESTED: go.test/test/chan/select2.go
            FAIL: go.test/test/chan/select3.go execution,  -O2 -g 
            FAIL: go.test/test/chan/select5.go execution
            UNTESTED: go.test/test/closure.go
            FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g 
            FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g 
            FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g 
            XFAIL: bug429.go  -O2 -g  execution test
            FAIL: go.test/test/goprint.go execution
            UNTESTED: go.test/test/goprint.go compare
            UNTESTED: go.test/test/init1.go
            FAIL: go.test/test/mallocfin.go execution,  -O2 -g 
            FAIL: go.test/test/nil.go execution,  -O2 -g 
            FAIL: go.test/test/recover3.go execution,  -O2 -g 
            UNTESTED: go.test/test/rotate.go
            UNTESTED: go.test/test/stack.go

                    === go Summary ===

# of expected passes        5069
# of unexpected failures    11
# of expected failures      1
# of untested testcases     6
/root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo  version 4.9.0 20130606 (experimental) (GCC) 

Hmm. So these are the failing tests. Before we go through them, it might be a good idea to check the status of the gccgo tests on the Linux build too. Let’s see.

nlightnfotis@earth:~$ grep -v ^PASS: < linux_go.sum 
Test Run By fotis on Mon Jul 15 10:28:38 2013
Native configuration is i686-pc-linux-gnu

        === go tests ===

        Schedule of variations:
            unix

            Running target unix
            Running /home/fotis/Software/gcc/gcc/testsuite/go.dg/dg.exp ...
            Running /home/fotis/Software/gcc/gcc/testsuite/go.go-torture/execute/execute.exp ...
            Running /home/fotis/Software/gcc/gcc/testsuite/go.test/go-test.exp ...
            UNTESTED: go.test/test/closure.go
            XFAIL: bug429.go  -O2 -g  execution test
            UNTESTED: go.test/test/init1.go
            UNTESTED: go.test/test/rotate.go

                    === go Summary ===

# of expected passes        5183
# of expected failures      1
# of untested testcases     3
/home/fotis/Software/gcc_build/gcc/testsuite/go/../../gccgo  version 4.9.0 20130702 (experimental) (GCC) 

So, it seems there are fewer tests failing here. But wait a minute: the tests that fail (or go untested) here are the same ones as in the Hurd build. So I can assume we are left with four fewer tests to check (Go on Linux works without any issues, so I guess it is safe to skip those for the moment). That leaves us with these tests to check:

FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g
FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g
UNTESTED: go.test/test/chan/select2.go
FAIL: go.test/test/chan/select3.go execution,  -O2 -g
FAIL: go.test/test/chan/select5.go execution
FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g
FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g
FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g
FAIL: go.test/test/goprint.go execution
UNTESTED: go.test/test/goprint.go compare
FAIL: go.test/test/mallocfin.go execution,  -O2 -g
FAIL: go.test/test/nil.go execution,  -O2 -g
FAIL: go.test/test/recover3.go execution,  -O2 -g
UNTESTED: go.test/test/stack.go

Discussing this with my mentor Thomas Schwinge on IRC (#hurd):

<tschwinge> For now, please ignore any failing tests that have »select« in their name -- that is, do file them, but do not spend a lot of time figuring out what might be wrong there.
<tschwinge> The Hurd's select implementation is a bit of a beast, and I don't want you -- at this time -- spend a lot of time on that.  We already know there are some deficiencies, so we should postpone that to later.

So that leaves us with even less tests to check:

FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g
FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g
FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g
FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g
FAIL: go.test/test/goprint.go execution
UNTESTED: go.test/test/goprint.go compare
FAIL: go.test/test/mallocfin.go execution,  -O2 -g
FAIL: go.test/test/nil.go execution,  -O2 -g
FAIL: go.test/test/recover3.go execution,  -O2 -g
UNTESTED: go.test/test/stack.go

Nice. This narrowed down the list of errors that I have to go through to make sure that gccgo works as well on the Hurd as it does on Linux.

libgo

So, we talked about gccgo, but what about the runtime library (libgo)? It also gets tested when we run make check-go, and seeing as it is a vital part of enabling programs written in Go to run on the Hurd, we ought to take a look. (This was also the original goal of my project proposal.)

So let us see what we have in libgo.sum:

Test Run By root on Fri Jul 12 17:56:44 UTC 2013
Native configuration is i686-unknown-gnu0.3

        === libgo tests ===

        Schedule of variations:
            unix

            Running target unix
            Running ../../../gcc/libgo/libgo.exp ...
            FAIL: bufio
            FAIL: bytes
            FAIL: errors
            FAIL: expvar
            FAIL: flag
            FAIL: fmt
            FAIL: html
            FAIL: image
            FAIL: io
            FAIL: log
            FAIL: math
            FAIL: mime
            FAIL: net
            FAIL: os
            FAIL: path
            FAIL: reflect
            FAIL: regexp
            FAIL: runtime
            FAIL: sort
            FAIL: strconv
            FAIL: strings
            FAIL: sync
            FAIL: syscall
            FAIL: time
            FAIL: unicode
            FAIL: archive/tar
            FAIL: archive/zip
            FAIL: compress/bzip2
            FAIL: compress/flate
            FAIL: compress/gzip
            FAIL: compress/lzw
            FAIL: compress/zlib
            FAIL: container/heap
            FAIL: container/list
            FAIL: container/ring
            FAIL: crypto/aes
            FAIL: crypto/cipher
            FAIL: crypto/des
            FAIL: crypto/dsa
            FAIL: crypto/ecdsa
            FAIL: crypto/elliptic
            FAIL: crypto/hmac
            FAIL: crypto/md5
            FAIL: crypto/rand
            FAIL: crypto/rc4
            FAIL: crypto/rsa
            FAIL: crypto/sha1
            FAIL: crypto/sha256
            FAIL: crypto/sha512
            FAIL: crypto/subtle
            FAIL: crypto/tls
            FAIL: crypto/x509
            FAIL: database/sql
            FAIL: database/sql/driver
            FAIL: debug/dwarf
            FAIL: debug/elf
            FAIL: debug/macho
            FAIL: debug/pe
            FAIL: encoding/ascii85
            FAIL: encoding/asn1
            FAIL: encoding/base32
            FAIL: encoding/base64
            FAIL: encoding/binary
            FAIL: encoding/csv
            FAIL: encoding/gob
            FAIL: encoding/hex
            FAIL: encoding/json
            FAIL: encoding/pem
            PASS: encoding/xml
            FAIL: exp/cookiejar
            FAIL: exp/ebnf
            FAIL: exp/html
            FAIL: exp/html/atom
            FAIL: exp/locale/collate
            FAIL: exp/locale/collate/build
            FAIL: exp/norm
            FAIL: exp/proxy
            FAIL: exp/terminal
            FAIL: exp/utf8string
            FAIL: html/template
            FAIL: go/ast
            FAIL: go/doc
            FAIL: go/format
            FAIL: go/parser
            FAIL: go/printer
            FAIL: go/scanner
            FAIL: go/token
            FAIL: go/types
            FAIL: hash/adler32
            FAIL: hash/crc32
            FAIL: hash/crc64
            FAIL: hash/fnv
            FAIL: image/color
            FAIL: image/draw
            FAIL: image/jpeg
            FAIL: image/png
            FAIL: index/suffixarray
            FAIL: io/ioutil
            FAIL: log/syslog
            FAIL: math/big
            FAIL: math/cmplx
            FAIL: math/rand
            FAIL: mime/multipart
            FAIL: net/http
            FAIL: net/http/cgi
            FAIL: net/http/fcgi
            FAIL: net/http/httptest
            FAIL: net/http/httputil
            FAIL: net/mail
            FAIL: net/rpc
            FAIL: net/smtp
            FAIL: net/textproto
            FAIL: net/url
            FAIL: net/rpc/jsonrpc
            FAIL: old/netchan
            FAIL: old/regexp
            FAIL: old/template
            FAIL: os/exec
            FAIL: os/signal
            FAIL: os/user
            FAIL: path/filepath
            FAIL: regexp/syntax
            FAIL: runtime/pprof
            FAIL: sync/atomic
            FAIL: text/scanner
            FAIL: text/tabwriter
            FAIL: text/template
            FAIL: text/template/parse
            FAIL: testing/quick
            FAIL: unicode/utf16
            FAIL: unicode/utf8

                    === libgo Summary ===

# of expected passes        1
# of unexpected failures    130
/root/gcc_new/gccbuild/./gcc/gccgo version 4.9.0 20130606 (experimental) (GCC)

Oh boy! Oh boy! Well, on second thought, this was not unexpected. This was the core of my GSOC work. This is how it starts :)

Before this goes any further, maybe we should visit the Linux test results too.


Test Run By fotis on Tue 02 Jul 2013 09:20:20 pm EEST
Native configuration is i686-pc-linux-gnu

        === libgo tests ===

        Schedule of variations:
            unix

            Running target unix
            Running ../../../gcc/libgo/libgo.exp ...
            PASS: bufio
            PASS: bytes
            ...

                    === libgo Summary ===

# of expected passes        131
/home/fotis/Software/gcc_build/./gcc/gccgo version 4.9.0 20130702 (experimental) (GCC)

Wow. Then again, the results from the Hurd are really not unexpected: remember that getcontext, makecontext, setcontext and swapcontext are not working as expected there.

And recalling from an email from Ian Lance Taylor (the GCCgo maintainer, and a member of the Go team) early in the summer:

Go does require switching stacks. A port of Go that doesn’t support goroutines would be useless—nothing in the standard library would work

Conclusion / Work for next week.

So now it comes down to working on a correct implementation of the context switching functions. Apart from that, going through the failing gccgo test results is also something to be done, though I am not sure it should be a first priority. I also have to go through go.log to see if there are any clues as to why the gccgo tests fail.

Having finally built gccgo on the Hurd, and more importantly still being on schedule (the original one, from my proposal), means that I can now concentrate on the core part of my project proposal (and the most exciting one too): a proper implementation of what is blocking effective context switching, which in turn is blocking goroutines, without which the Go library will not work properly.