Distro Forking 101: How Do You Fork a Linux Distro?

Defining the GNU/Linux distribution

If you are here, we can safely assume that you already know what a GNU/Linux software distribution is, but for completeness' sake, let's define it anyway, so we all have the same context.

A GNU/Linux distribution is a collection of system and application software, packaged together by the distribution's developers so that it is distributed as a nicely integrated bundle, ready to be used by users and developers alike. Software typically included in such a distribution ranges from the compiler toolchain and the C library to filesystem utilities and text editors.

As you can imagine from the existence of several different GNU/Linux distributions, there are multiple ways in which you could combine all these different applications and their respective configurations, not to mention that you could include even more specialised software, depending on the target audience of the distribution (such as multimedia software for a distribution like Ubuntu Studio, or penetration testing tools for a distribution such as Kali Linux).

The “f” word

But even with such a great number of different software collections and their respective configurations, there still may not be one that appeals to your specific needs. That's OK though, as you can still customize each and every distribution to your liking. Extensive customization creates a differentiation point, and with it, a potential forking point.

Forking is a term that has been known to carry negative connotations. As wikipedia puts it,

the term often implies not merely a development branch, but a split in the developer community, a form of schism.

Historically, it has also been used as leverage to coerce a project's developers into merging code into their master branches that they didn't originally want to merge, or otherwise into taking a decision that they wouldn't have taken if not under the pressure of a "fork". But why is that?

You see, traditionally, forking a project meant a couple of things. For starters, there were now two identical projects competing in the same solution space. Those two projects had different development hours and different features or bug fixes going into them, and eventually one of the two ended up obsolete. Apart from that, forking also created an atmosphere of intense competition between the two projects.

However, in 2014, with the advent of distributed version control systems such as Git and Mercurial and of social coding websites such as GitHub and Bitbucket, the term is finally taking on a more relaxed meaning: just another code repository that may or may not enjoy major (or even minor, for that matter) development.

Forking a GNU/Linux distribution

So, up until now we have discussed what a GNU/Linux distribution is, and what a fork is. However, we haven’t discussed yet what it means to fork a GNU/Linux distribution.

You see, what differentiates each distro from the others, apart from the software collection it contains, is the way in which it provides (and deploys) that software. Yes, we are talking about software packages and their respective package managers. Distributions from the Debian (.deb) family use dpkg along with apt, synaptic, aptitude or some other higher-level tool. RPM (.rpm) based distributions may use rpm with yum, dnf, zypper or another higher-level tool. Other distributions, not based on the aforementioned formats, may choose to roll their own combination of package format and package manager: Arch Linux uses its own pacman, Sabayon uses its entropy package manager, and so on.

Now, naturally, if you want to customize an application to your liking, you have many ways in which you could do that. One of them is downloading the tarball from the upstream's website or FTP server, ./configure it, and then make and make install it. But if you do start customizing lots of applications this way, it can become tedious and unwieldy all too soon. After all, what did that make install install, exactly? Will the next update replace those files? What were your configuration options? Did they replace the files the package manager installed?

In this case, it really pays off to learn how to package software for your distribution of choice. What this means is learning the format of packages your distribution's package manager accepts, as well as how to produce them. This way, instead of the ./configure && make && make install cycle, you just have beautiful software packages that you can control more tightly, update more easily, and also distribute to your friends if you so desire. As an added bonus, the package manager now also knows about those files, so you can install, remove or update them much more easily. What's not to like?

After you have created some custom packages, you may also wish to create a repository to contain them and update straight from it. Congratulations, you have created your custom distribution, and a potential fork. While you are at it, if you really want to fork the distribution, you could just as well take the distribution's base packages, customize them, rebuild them, and then distribute them. Congratulations again: now you have a true GNU/Linux distribution fork.

That seems easy. More specifically?

Yes, of course. Let's take a look at how you might go about forking some well-known GNU/Linux distributions.

Debian

In Debian, the usual procedure if you wish to customize a package is as follows:

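What follows is a minimal sketch of the classic source-package workflow, assuming a package named $package_name and that the deb-src lines are enabled in your /etc/apt/sources.list:

sudo apt-get build-dep $package_name   # pull in the build dependencies
apt-get source $package_name           # fetch and unpack the package source
cd $package_name-*/
# make your changes, then record them in the changelog
dch -i "Rebuilt with custom modifications."
dpkg-buildpackage -us -uc              # build unsigned binary packages
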
Assuming all went fine, you should now have a $package_name.deb file in your current directory, ready to be installed with dpkg -i $package_name.deb.

Please note that the above is not an extensive treatise on Debian packaging by any means. If you want to build custom Debian packages, here are some links:

Now that you have your custom packages, it's time to build a repository to contain them. There are many tools you can use to do that, including the official Debian package archiving tool known as dak, but if you want a personal repository without too much hassle, you are better off with reprepro. I won't go to full length on that here; instead, I will point you to a very good guide, if you want to go down that path.

Fedora

Building packages for Fedora is a procedure similar to the Debian one. Fedora, however, is more convenient in one aspect: it lets you download a DVD image with all the sources in .rpm form, ready for you to customize and rebuild to your tastes.

Apart from that, the usual procedure is the following:

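As a rough sketch, assuming the rpmdevtools and yum-utils packages are installed and $package_name is the package you want to rebuild:

rpmdev-setuptree                        # create the ~/rpmbuild directory layout
yumdownloader --source $package_name    # fetch the source rpm
rpm -ivh $package_name-*.src.rpm        # unpack it into ~/rpmbuild
cd ~/rpmbuild/SPECS
# edit the spec file and/or drop patches into ../SOURCES, then:
yum-builddep $package_name.spec         # install the build dependencies
rpmbuild -ba $package_name.spec         # build the binary and source rpms
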
Again, the above-mentioned steps may not be 100% correct. If you want to go down this route, see the following links for more information:

Next up is the repository creation step.

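A minimal sketch using the createrepo tool, assuming your custom rpms live under /srv/myrepo:

mkdir -p /srv/myrepo
cp ~/rpmbuild/RPMS/x86_64/*.rpm /srv/myrepo/
createrepo /srv/myrepo                  # generate the repodata/ metadata

# /etc/yum.repos.d/myrepo.repo
[myrepo]
name=My custom packages
baseurl=file:///srv/myrepo
enabled=1
gpgcheck=0
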
Again, not definitive, and for more information, take a look at these links:

Arch Linux

Arch Linux, at least in comparison to the .deb and .rpm package distribution families, is very easy to customize to your liking. That's to be expected, though, as Arch Linux is a distribution that sells itself on the customization capabilities it offers to its users.

In essence, if you want to customize a package, the general process is this:

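A rough sketch, assuming you already have the package's build files (for example from the ABS tree described below, under /var/abs):

cp -r /var/abs/core/$package_name ~/build
cd ~/build/$package_name
# edit the PKGBUILD (and any patches) to taste, then:
makepkg -s                              # -s resolves build dependencies via pacman
pacman -U $package_name-*.pkg.tar.xz    # install the resulting package (as root)
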
In order to download the build files for the official {core | extra | community} packages, you need to run abs as root. This will create a directory tree that contains the files required for building any package in the official repositories.

Next up, you can create a custom local repository with the repo-add tool, and then proceed to edit /etc/pacman.conf and add an entry for your repository there.
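
For instance, a minimal local repository, assuming your built packages live under /srv/customrepo, might look like this:

repo-add /srv/customrepo/customrepo.db.tar.gz /srv/customrepo/*.pkg.tar.xz

# and the matching entry in /etc/pacman.conf:
[customrepo]
SigLevel = Optional TrustAll
Server = file:///srv/customrepo

For more information: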

To fork or not to fork?

Well, that's not an easy question to answer. My opinion is that it's extremely educational to do a soft fork: clone the distribution's core repository and maintain your own distribution based on it for some time, that is, update and customize all the repositories. Do that for some months, then go back to using your distribution of choice, now enlightened about how it works under the hood. The reason this is so educational is that it will teach you the ins and outs of your distribution: all the software in it, how it integrates, and what each piece's role is. It will teach you packaging, which is a tremendously undervalued skill that lets you customize your experience to your liking, and it will make you appreciate the effort that goes into maintaining the distribution.

As for doing a hard fork, that is, creating your own distribution that you commit to maintaining for a long time, my opinion is that it's simply not worth it. Maintaining a distribution, be it by yourself or with your friends, is a tremendous amount of work, and it's not worth it unless you have other goals you want to achieve by it. If all you want is to customize your distribution of choice to your liking, then go ahead: learn packaging for it, custom-package the applications you want, then create your own repo - but always track the upstream. Diverging too much from the upstream is not worth the hassle, as you will end up spending more time maintaining the distribution than using it.

tl;dr:

If you want to do a small-scale, private fork in order to see what's under the hood of your Linux distro, by all means go ahead.

If you want to do a large-scale, public fork, then take the time to calculate the effort, decide whether it's worth it, and consider whether you could instead help the upstream distribution implement the features you want.

How the Compiler, the Library and the Kernel Work - Part 3

In the last part of this series, we talked about the compiler's composition, including the assembler and the linker. We showed what happens when the compiler runs, and what the output of translation software such as cc1 or as looks like. In this final part of the series, we are going to talk about the C library, how our programs interface with it, and how it interfaces with the kernel.

The C Standard Library

The C Standard Library is pretty much a part of every UNIX-like operating system. It's basically a collection of code, including functions, macros, type definitions etc., that provides facilities such as string handling (string.h), mathematical computations (math.h), input and output (stdio.h), and so on.

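For instance, here is a trivial program that leans on three of those facilities (on most systems, remember to link with -lm for the math library):

facilities.c
/* a small, hypothetical example exercising three libc facilities */
#include <stdio.h>   /* input and output: printf */
#include <string.h>  /* string handling: strlen */
#include <math.h>    /* mathematical computations: sqrt */

int
main (void)
{
    const char *s = "Hello world!";

    printf ("\"%s\" is %zu chars long; sqrt(2) is %f\n",
            s, strlen (s), sqrt (2.0));

    return 0;
}
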
GNU/Linux operating systems generally use the GNU C Library implementation (glibc), but it's common to find other C libraries in use (especially in embedded systems), such as uClibc, newlib, or, in the case of Android/Linux systems, Bionic. BSD-style operating systems usually have their own implementation of a C library.

So, how does one “use” the C Standard Library?

So, now that we are acquainted with the C Library, how do you make use of it, you ask? The answer is: automagically :). Hold on right there; that's not much of an exaggeration. You see, when you write a basic C program, you usually #include <some_header.h> and then go on to use the code declared in that header. We explained in the previous part of this series that when we use a function, say printf(), in reality it's the linker that does the hard work and allows us to use this function, by linking our program against libc's .so (shared object). So in essence, when you need to use the C Standard Library, you just #include the headers that belong to it, and the linker will resolve the references to the code they declare.
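
You can see that linkage for yourself; on a typical glibc-based system, compiling a hello world and inspecting the result with ldd shows something like the following (names, paths and load addresses will vary from system to system):

gcc -o hello hello.c
ldd hello
        linux-vdso.so.1 (0x00007fff...)
        libc.so.6 => /lib64/libc.so.6 (0x00007f...)
        /lib64/ld-linux-x86-64.so.2 (0x00007f...)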

Apart from the functions that are defined in the Standards, however, a C Library might also implement further functionality. For example, the Standards don't say anything about networking. As a matter of fact, most libraries today implement not only what's in the C Standards, but also choose to comply with the requirements of the POSIX C library, which is a superset of the C Standard library.

Ok, and how does the C Library manage to provide these services?

The answer to this question is simple: some of the services the library provides, it provides without needing any sort of special privileges, being normal, userspace C code; for the others, it needs to ask the Operating System's kernel to provide the facilities on its behalf.

How does it do so? By invoking entry points that the kernel exports to provide certain functionality, named system calls. System calls are the fundamental interface between a userspace application and the Operating System kernel. For example, consider this:

You might have a program that at one point contains code like fd = open("log.txt", O_RDWR | O_CREAT, 0644); (with the flags coming from fcntl.h). That open function is provided by the C Library, but the C Library itself cannot execute all of the functionality that's required to open a file, so it may issue the open system call (sys_open() inside the kernel), asking the kernel to do what's required to open the file. In this case, we say that the library's open call acts as a wrapper function around the system call.
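
You can watch this wrapper-to-system-call transition with strace. A hedged example (the exact flags, paths and return values will differ per program and system):

strace -e trace=open ./a.out
open("log.txt", O_RDWR|O_CREAT, 0666) = 3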

Epilogue

In this final part of our series, we saw how our applications interface with the C Standard Library available on our system, and how the Library itself interfaces with the Operating System kernel to provide the services needed by userspace applications.

Further Reading:

If you want to take a look at the system call interface of the Linux kernel, you can always read the man page for the Linux system calls: man 2 syscalls.

Introduction to Xv6: Adding a New System Call.

xv6: An introduction

If you are, like me, a low-level PC programmer, it's hard not to have heard of xv6. xv6, for those who haven't heard of it, is a UNIX Version 6 clone, designed at MIT to help teach operating systems.

The reasoning behind creating it was fairly simple: up until that point, MIT had used John Lions' famous commentary on the Sixth Edition of UNIX. But V6 was challenging for a number of reasons. To begin with, it was written in a near-ancient version of C (pre-K&R), and apart from that, it contained PDP-11 assembly (a legendary machine for us UNIX lovers, but ancient nonetheless), which didn't really help the students, who had to study both the PDP-11 and the (more common) x86 architecture, on which they developed another (exokernel) operating system.

So, to make things much simpler, the professors there decided to roll with a clone of UNIX Version 6 that was x86-specific, written in ANSI C, and supported multiprocessor machines.

For a student (or a programmer interested in operating systems), xv6 is a unique opportunity to get introduced to kernel hacking and to the architecture of UNIX-like systems. At about 15k lines of code (iirc), including the (primitive) libraries, the userland and the kernel, it's very easy (or well, at least easier than production-scale UNIX-like systems) to grok, and it's also very easy to expand on. It also helps tremendously that xv6 as a whole has magnificent documentation, not only from MIT, but from other universities that have adopted xv6 for use in their operating systems syllabus.

An introduction to Ensidia: my very personal xv6 fork

When I first discovered xv6 I was ecstatic. For the reasons mentioned above, I couldn't pass up the opportunity to fork xv6 and use it as a personal testbed for anything I felt like exploring or testing out.

As a matter of fact, when I first discovered xv6, I had just finished implementing (the base of) my own UNIX-like operating system, named fotix, and the timing of my discovery was great. xv6 had done what I had done, and had also implemented most of what I was planning to work on in fotix (for example, ELF file loading), so it was a solid base for further development. It also had a userland, which fotix at the time didn't have.

After I forked xv6, I spent some time familiarizing myself with the code. I also cleaned up the source code quite a bit, structuring it in a BSD-like folder layout instead of having all of the code in the same folder, and made various small-scale changes.

After that, I left Ensidia alone for quite some time and didn't touch it much. However, I always felt like I wanted to develop it a bit more and play with its code in interesting ways. I was trying to think of a good way to get started with kernel hacking on it, in a simple way, to get more acquainted with the kernel, and found a pdf with interesting project ideas for it. One of them was to add a system call. I figured that would be an interesting and quick hack, so hey, why not?

Getting started with kernel hacking on xv6: Adding the system call.

The system call I decided to introduce was the suggested one. It sounded fairly simple too: you have to introduce a new system call that returns the total number of system calls that have taken place so far. So let's see how I went about implementing it.

An introduction to system calls in xv6

First of all, we should provide some context about what system calls are, how they are used, and how they are implemented in xv6.

A system call is a function that a userspace application uses to ask for a specific service to be provided by the operating system. For instance, with the sbrk(n) system call, a process can ask the kernel to grow its heap space by n bytes. Another example is the well-known fork() system call of the UNIX world, which is used to create a new process by cloning the calling process.

The way applications signal the kernel that they need a service is by issuing a software interrupt. An interrupt is a signal that notifies the processor that it needs to stop what it's currently doing and handle the interrupt. The same mechanism is also used to notify the processor that information it was seeking from the disks is in some buffer, ready to be extracted and processed, or that a key was pressed on the keyboard; those are called hardware interrupts.

Before the processor stops to handle the interrupt generated, it needs to save the current state, so that it can resume the execution in this context after the interrupt has been handled.

The code that calls a system call in xv6 looks like this:

initcode.S
# exec(init, argv)
 .globl start
 start:
   pushl $argv
   pushl $init
   pushl $0  // where caller pc would be
   movl $SYS_exec, %eax
   int $T_SYSCALL

In essence, it pushes the arguments of the call onto the stack, and puts the system call number (in the above code, that's $SYS_exec) into %eax. The number is used to match the entry in an array that holds pointers to all the system calls. After that, it generates a software interrupt, with a code (in this case $T_SYSCALL) that's used to index the interrupt descriptor table and find the appropriate interrupt handler.

The code whose job is to find the appropriate interrupt handler is called trap() and lives in the file trap.c. If trap() finds the trap number in the generated trapframe (a structure that represents the processor's state at the time the trap happened) to be equal to T_SYSCALL, it calls syscall() (the software interrupt handler), which lives in syscall.c.

trap.c
// This is the part of trap that
// calls syscall()
void
trap(struct trapframe *tf)
{
  if(tf->trapno == T_SYSCALL){
    if(proc->killed)
      exit();
    proc->tf = tf;
    syscall();
    if(proc->killed)
      exit();
    return;
  }

syscall() is finally the function that reads %eax to get the number of the system call (to index the array of system call pointers), and executes the code corresponding to that system call.

The implementation of system calls in xv6 lives in two files: sysproc.c, which contains the implementation of the system calls that deal with processes, and sysfile.c, which contains the implementation of the system calls that deal with the file system.

The specific implementation of the numcalls() system call

Implementing the system call itself is simple. I did so with a global variable in syscall.c called syscallnum, which is incremented every time syscall() dispatches a system call function, that is, every time the system call is valid.

Next, we just need a function, the system call implementation, that returns that number to the userspace program asking for it. Below is the function itself, and syscall() after our change.

sysproc.c
// return the number of system calls that have taken place in
// the system
int
sys_numcalls(void)
{
    return syscallnum;
}
syscall.c
// The syscall() implementation after
// our change
void
syscall(void)
{
  int num;

  num = proc->tf->eax;
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    syscallnum++; // increment the syscall counter
    proc->tf->eax = syscalls[num]();
  } else {
    cprintf("%d %s: unknown sys call %d\n",
            proc->pid, proc->name, num);
    proc->tf->eax = -1;
  }
}

After that was done, the next few things that needed to be done were fairly straightforward. We had to add an index number for the new system call in syscall.h, expose it to user processes via user.h, add a new macro invocation to usys.S that defines an asm routine which invokes that specific system call, and change the makefile to accommodate our change, as sketched below. After doing so, we had to write a userspace program to test our changes.
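
For reference, here is a sketch of those remaining changes. The file layout follows stock xv6 conventions, but the exact syscall index (22 here) is an assumption; use the next free one in your tree:

syscall.h
// index for the new system call (assumed to be the next free slot)
#define SYS_numcalls 22

syscall.c
// declare the handler and add it to the dispatch table
extern int sys_numcalls(void);
static int (*syscalls[])(void) = {
  // ... existing entries ...
  [SYS_numcalls]  sys_numcalls,
};

user.h
int numcalls(void);

usys.S
SYSCALL(numcalls)

And a minimal userspace test program (what the syscallnum entry in the ls output below corresponds to):

syscallnum.c
#include "types.h"
#include "stat.h"
#include "user.h"

int
main(void)
{
  printf(1, "The total number of syscalls so far is %d\n", numcalls());
  exit();
}

Remember to add _syscallnum to the UPROGS list in the Makefile so it gets built into the filesystem image.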

The result after doing all this is below :)

cpu1: starting
cpu0: starting
init: starting sh
$ ls
.              1 1 512
..             1 1 512
README         2 2 2209
cat            2 3 9725
echo           2 4 9254
forktest       2 5 5986
grep           2 6 10873
init           2 7 9579
kill           2 8 9246
ln             2 9 9240
ls             2 10 10832
mkdir          2 11 9315
rm             2 12 9308
sh             2 13 16600
stressfs       2 14 9790
usertests      2 15 37633
wc             2 16 10207
zombie         2 17 9028
syscallnum     2 18 9144
console        3 19 0
$ syscallnum
The total number of syscalls so far is 643
$ syscallnum
The total number of syscalls so far is 705
$ syscallnum
The total number of syscalls so far is 767
$ syscallnum
The total number of syscalls so far is 829

Epilogue

I usually end my blog posts with an epilogue. Although this post doesn't necessarily need one, I wanted to write one just to tell you that you should try kernel hacking (that is, programming jargon for programming an operating system kernel), because it's an experience that will undoubtedly teach you a great deal about how your computer actually works.

Last but not least, check out the ongoing work on Ensidia, my fork of xv6. For this particular work, take a look at the syscall branch.

How the Compiler, the Library and the Kernel Work - Part 2

In the previous part of this little series, we talked about the compiler and what it does with the header files, in our attempt to demystify their usage. In this part, I want to show you what the compiler's output is, and how our executable file gets created.

The compiler’s composition

Generally speaking, a compiler belongs to a family of software called translators. A translator’s job is to read some source code in a source language, and generate (translate it to) some source code in a target language.

Now, you might think that most compilers you know don't do that: you input a (source code) file, and you get a binary file, ready to run whenever you want it to. Yes, that's what happens, but it's not the compiler that does all of this. If you remember from the last installment of this series, when you call the compiler like gcc some_file.c or clang some_file.c, in essence you are calling the compilation driver, with the file as a parameter. The compilation driver then calls 1) the preprocessor, 2) the (actual) compiler, 3) the assembler and 4) last but not least, the linker. At least when it comes to gcc, these pieces of software are called cpp, cc1, gas (executable name: as) and collect2 (executable name: ld) respectively.

In that little software collection up top, which we collectively call the compiler, we can easily spot at least 3 (yeah, that's right) translators that act as we described earlier, that is, take some input in a source language and produce some output in a target language.

The first is the preprocessor. The preprocessor accepts source code in C as its source language and produces source code again in C (as its target language), but with various elements of the input resolved, such as header file inclusion, macro expansion, etc.

The second is the compiler. The compiler accepts (in our case) C source code as its source language and translates it to some architecture's assembly language. In this series, when I talk about the compiler, I'm gonna assume that it produces x86 assembly.

The last one is the assembler, which accepts as input some machine architecture's assembly language and produces what's called the binary, or object, representation of it; that is, it translates the assembly mnemonics directly to the bytes they correspond to in the target architecture.

At this point, one could also argue that the linker is a translator too, accepting binary and translating it to an executable file; that is, resolving references and fitting the binary code into the segments of the file that is to be produced. For example, on a typical GNU/Linux system, this phase produces the executable ELF file.

The (actual) compiler’s output: x86 assembly.

Before we go any further, I would like to show you what the compiler really creates:

For the typical hello world program we demonstrated in our first installment, the compiler will output the following assembly code:

hello.S
        .file        "hello.c"
        .section        .rodata
.LC0:
        .string        "Hello world!"
        .text
        .globl        main
        .type        main, @function
main:
        pushq        %rbp
        movq        %rsp, %rbp
        subq        $16, %rsp
        movl        %edi, -4(%rbp)
        movq        %rsi, -16(%rbp)
        movl        $.LC0, %edi
        call        puts
        movl        $0, %eax
        leave
        ret
        .size        main, .-main
        .ident        "GCC: (GNU) 4.8.2 20131212 (Red Hat 4.8.2-7)"
        .section        .note.GNU-stack,"",@progbits

To produce the above file, we had to use the following gcc invocation: gcc -S -fno-asynchronous-unwind-tables -o hello.S hello.c. We used -fno-asynchronous-unwind-tables to remove the .cfi directives, which tell gas (the GNU assembler) to emit DWARF Call Frame Information tags, which are used to reconstruct a stack backtrace when a frame pointer is missing.

For more useful compilation flags, to control where the compilation pipeline stops, try these:

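The stage-control flags in question are presumably gcc's -E, -S and -c:

gcc -E hello.c   # stop after the preprocessor; preprocessed C goes to stdout
gcc -S hello.c   # stop after the compiler proper; produces assembly output
gcc -c hello.c   # stop after the assembler; produces hello.o (an object file)
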
The default behaviour is to use none of them and stop after the linker has run. If you want to run a full compilation and keep all the intermediate files, use the -save-temps flag.

From source to binary: the assembler.

The next part of the compilation process is the assembler. We have already discussed what the assembler does, so here we are going to see it in practice. If you have followed along so far, you should have two files: hello.c, which is the hello world's C source code file, and hello.S, which is what we created earlier, the compiler's (x86) assembly output.

The assembler operates on that last file, as you can imagine, and to see it run and emit binary, we need to invoke it like this: as -o hello.bin hello.S. This produces the following:

(Raw ELF object file bytes, mostly unprintable when dumped as text. The recognizable parts are the ELF magic at the start, the encoded machine code of main, the "Hello world!" string, the "GCC: (GNU) 4.8.2 20131212 (Red Hat 4.8.2-7)" comment, and the section name table: .symtab, .strtab, .shstrtab, .rela.text, .data, .bss, .rodata, .comment, .note.GNU-stack.)

Last but not least: the linker

We saw what the assembler emits, which is to say, binary code. However, that binary code still needs further processing. To explain why, we need to go back a little.

In our first installment of the series, we said that when you call a function like printf(), the compiler only needs its prototype to do type checking and ensure that you use it legally. For that, you include the header file stdio.h. But since that contains only the function prototype, where is the code for the function itself? Surely it must be somewhere, since the program executes successfully to begin with, but we haven't met printf's code so far, so where is it?

The function's compiled code is located in the .so (shared object) of the standard C library, which on my system (Fedora 19, x64) is libc-2.17.so. I don't want to expand on that further, as I plan to do so in the next installment of the series; however, what we have said so far is enough for you to understand the linker's purpose:

The linker resolves the (thus far) undefined reference to printf by finding the printf symbol and (in layman's terms) wiring up a pointer to it, so that execution can jump to printf's code whenever it needs to during our program's run.

To invoke the linker on our file, at least according to its documentation, we should do the following: ld -o hello.out /lib/crt0.o hello.bin -lc. Then we should be able to run the file like this: ./hello.out.
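
In practice, on a modern dynamically linked system, a manual ld invocation also needs the C runtime start files (crt1.o and friends) and the path of the dynamic linker, so the dependable route is to let the compilation driver drive the link for us (assuming hello.bin is the object file we assembled above):

gcc -o hello.out hello.bin
./hello.out
Hello world!

gcc passes files it does not recognize as source straight to the linker, together with the right start files and libraries.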

Epilogue

That's the end of part 2 of my series explaining how your code turns into binary, and how your computer (at least on the software side) runs it. In part 3, I am going to discuss, at greater length, the C library and the kernel.

My Linux From Scratch Experience

For the past two to three days, I have been busy creating my very own Linux distribution using the well-known Linux From Scratch. This post is an account of my experience with the process: what I liked, what I learned from it, what surprised me, and more.

Linux from Scratch: An introduction

If you are here, then you most likely already know what Linux From Scratch is, but for the sake of completeness (or in case you don't know what it is, but are keen on learning), I will provide an introduction to it here.

Linux From Scratch (from now on, lfs) is a book providing a series of steps that guide you through the creation of a fully functional GNU/Linux distribution. Although the original book creates a "barebones" distribution with only fundamental tools in it, the resulting system provides a fine environment for further experimentation or customization.

Apart from the basic book, the lfs project also has 3-4 more books to read if you want to extend the basic system (such as blfs, Beyond Linux From Scratch), automate the process, create a distribution that is more secure, or cross-compile an lfs system for different machines.

My experience with building LFS

A small introduction about my background

I have been a (full-time) UNIX(-like) systems user for about 2.5 years now. During that time I went from being what you would call a Linux newbie, not knowing how to use a system without a GUI installed (have I mentioned that Ubuntu was my favourite distribution?), to being an arguably experienced UNIX programmer, trying to learn more about the various aspects of UNIX systems and delving deeper and deeper into them every day (while also feeling pain when using anything other than a UNIX-like system).

During that time, I learned about the Unix way of working with the system, using the shell and the system's toolchain to write software and otherwise manipulate the system. I ditched my old knowledge of IDEs and GUIs, and set out to master the command line and the associated tools. (Anecdote: I remember, when I first came to Unix from Windows, searching the net for a C/C++ IDE to do development.) I remember reading about how people worked another way in Unix land, using an editor and the shell, and I decided to force myself to learn to work that way. I still remember trying to use vim and gcc, and ending up liking this way better, because it seemed a more natural way to interact with the software development process than using an IDE and pressing the equivalent of a "play" button, so that magic ensues for the next few seconds until I have a result.

Time has passed since then, and going through hours and hours of reading and working with the system, I did learn quite a lot about it. My Google Summer of Code experience in 2013 expanded my system knowledge even further (that’s what you get when you have to work with the system kernel, the C library and a compiler).

But in all that time of using Unix-like systems, I never had the chance to create one myself. And although my background allowed me to know quite a few things about the inner workings of such a system, I never actually saw all these software systems combine in front of my very eyes to create that beauty we know as a GNU/Linux distribution. And that left a bad taste in my mouth, because I knew what was happening, but I wanted to see it happen right in front of my eyes.

Knowing about the existence of lfs and not actually going through it made matters even worse for me, as I knew that I could actually "patch" that knowledge gap of mine, but I never really tried to. I felt that I was missing out on a lot, and that lfs would be instrumental to my understanding of a Linux system. Having attempted it some years ago and gotten stuck at the very beginning had also created an innate fear in me that it was something above my own powers.

Until two days ago, when I said to myself: "You know what? I have seen and done a lot of things in a UNIX system. I am now much more experienced than I was when I last tried it. And I know I want to at least try, even if it gives me nothing but infinite confusion. Because if I do manage to get through it, I will learn so many more things, or at least get assurance that my preexisting knowledge was correct." And that thought was the greatest motive I had had in a fairly long time.

So, I sat at my desk, grabbed a cup of coffee and off I went!

The process

Preparation and the temporary toolchain

The book itself is several chapters long, each of which performs another "big step" in the creation of the distribution.

The first few chapters are preparatory: you ensure the integrity of the building environment, download any build dependencies you may be lacking, create a new partition that will host the lfs system, and create the user account that will do the building of the temporary toolchain.

Building the temporary toolchain is a more exciting process. In essence, you compile and collect several pieces of software that will later be used to compile the distribution's own toolchain and other software.

You start off by building binutils, to get a working assembler and linker. After you have a working assembler and linker, you proceed with compiling gcc. Next up is unpacking the Linux headers, so that you can compile (and link against them) glibc.

Having the basic parts of the toolchain compiled, you then proceed with installing the other software that is needed in the temporary toolchain, like gawk, file, patch, perl etc.

Building the main system

After getting done with the temporary toolchain, you chroot into the lfs partition. You start off by creating the needed directories (like /bin, /boot, /etc, /home) and then continue with building the distribution software, utilising the temporary toolchain. For instance, you construct a new gcc, and you compile sed, grep, bzip2, the shadow suite that manages the handling of passwords, and so on, all while making sure that things don't break, and running countless tests (that sometimes take longer than the package took to compile) to ensure that what you build is functional and reliable.

Final configuration

Next on the list are the various configuration files that reside in /etc, and the setup of sysvinit, the distribution's init system.

Last but not least, you compile the Linux kernel and set up grub so that the system is bootable.

At this point, if all has gone well, you should be able to reboot into your new lfs system.

What did I gain from that?

Building lfs was a very time-consuming process for me. It must have taken about 7-8 hours at the very least. Not so much because of the compilation and testing (I was compiling with MAKEFLAGS='-j 4' on a Core i5), but because I didn't complete some steps correctly and later needed to go back and redo them, along with everything that followed, plus the time it took to research some issues, programs, or various other things before I issued a command at the shell.

Now if I were to answer the question “What did I gain from that”, my answer would be along the lines of “Infinite confusion, and some great insight at some points”.

To elaborate on that,

Epilogue

Lfs was, for the most part, a great experience. As a knowledge expander, it works great. As a system that you keep and continue to maintain? I don't know. I know that people have done that in the past, but I decided against maintaining my build, as I figured it would be very time-consuming, and that if I ever wanted the experience of maintaining a distro, I would probably fork something like Crux.

In the end, if you ask me whether I can recommend it to you, I will say that I'm not so sure. It will provide you with some insight into the internals of a GNU/Linux distribution, but it won't make you a better programmer, as some people claim (most of the process revolves around the configure && make && make install cycle, and some hand-writing of conf files).

In the end, it is yourself who you should ask. Do you want that knowledge? Is it worth the hassle for you? Do you want the bragging rights? Are you crazy enough to want to maintain it? These are all questions to which you will get as many answers as there are people to ask.

How the Compiler, the Library and the Kernel Work - Part 1

Before we get any further, it might be good if we provided some context.

Hello world. Again.

helloworld.c
#include <stdio.h>

int
main (int argc, char **argv)
{
    printf ("Hello world!\n");

    return 0;
}

Every user space (read: application) programmer has written a hello world program. Only god knows how many times this program has been written. Yet most programmers' knowledge of the program is limited to something along the lines of:

  • It sends the string passed as a parameter to the system to print.
  • It takes the printf function from stdio.h and prints the string

and various other things, which range anywhere from plain wrong to partially correct.

So why not demystify the process?

Enter the C preprocessor.

You may have heard of the C Preprocessor. It's the first stage of a C or C++ file's compilation, and it's actually responsible for things such as header file inclusion, macro expansion, and conditional compilation. For instance, given a macro defined as #define gt(x, y) ((x > y) ? 1 : 0), a line like:

 if (gt (5, 3)) printf ("The first parameter is greater than the second.\n");

gets expanded according to the macro definition, so after the preprocessor has run you end up with something like this:

 if (((5 > 3) ? 1 : 0)) printf ("The first parameter is greater than the second.\n");

It also handles conditional compilation blocks such as:

#ifdef WIN32 
    printf ("We are on windows\n"); 
#endif

amongst others. You can see this for yourself: write the hello world program and pass it to cpp: cpp hello_world.c

So now that we know what it does, it's time to demystify a common myth regarding it: some people believe that header files include the functions to be called. That's wrong. What a header does include is function prototypes (and some type definitions, etc.) only. It doesn't include the body of the function to be called.

Some people find that fact quite surprising, though it isn't, once you understand what the compiler does with it.

Say hello to the compiler.

Here we are gonna unmask another pile of misconceptions. First of all, some people think that when they call gcc on the command line they are actually calling the compiler. They are not. In fact, they are calling software commonly known as the compilation driver, whose job is to run all the programs needed to fully turn source into binary, including the preprocessor, the actual compiler, the assembler and finally the linker.

Having said that, the actual compiler that gets called when you call gcc is called cc1. You may have seen it sometimes when the driver reports errors. Wanna take a look at it, to make sure I'm not lying to you? (Hint: I'm not!) Fair enough. Why don't you type this at the command line: gcc -print-prog-name=cc1. It should tell you where the actual compiler is located on your system.

So now that we have this (misconception) out of our minds, we can continue with our analysis. Last time we talked about it, we said that the header files include prototypes and not the whole function.

You may know that in C, you usually declare a function before you use it. The primary reason for doing this is to provide the compiler with the ability to perform type checking, that is, to check that the arguments passed are correct, both in number and in type, and to verify that the returned value (assuming there is one) is being used correctly. Below is a program that demonstrates the function prototype:

prototype.c
#include <stdio.h>

int add_nums (int first, int second);

int
main (void)
{
    printf ("5 + 5 results in %d\n", add_nums (5, 5));

    return 0;
}

int
add_nums (int first, int second)
{
    return first + second;
}

In this particular example, the prototype gives the compiler a wide variety of information. It tells it that the function add_nums takes two int arguments and returns an integer to the calling function. Now the compiler can verify that I am passing correct arguments to it when I call it inside printf. If I don't include the function prototype, and do something slightly evil such as calling add_nums with float arguments, then this might happen:

5 + 4 results in 2054324224

Now that you know that the compiler (the real one) only needs the prototype and not the actual function code, you may be wondering how the compiler actually compiles the call if it doesn't know the function's code.

Now is the time to bring down another misconception. The word compiler is just a fancy name for software otherwise known as a translator. A translator's job is to take input and turn it from one language (the source language) into a second language (the target language), whatever that may be. Most of the time, when you compile software, you compile it to run on your computer, which runs on a processor from the x86 architecture family. A processor is typically associated with an assembly language for its architecture (which is just human-friendly mnemonics for common processor tasks), so your x86 computer runs x86 assembly (OK, that's not 100% true, but for simplicity's sake, at the moment it will serve; we will see why it's not true later). So the compiler (in a typical translation) translates (compiles) your C source code to x86 assembly. You can see this by compiling your hello world example and passing the compiler the -S parameter (which asks it to stop after x86 assembly is produced), like so: gcc -S hello.c.

Conclusion

In this part, we saw how the compiler and the preprocessor work with our code, in an attempt to demystify the so-called library calls. In the next part, we are going to study the assembler and the linker, and in the final part, the loader and the kernel.

GSOC Week 11 Report

Introduction

This week was spent investigating the runtime and debugging executables with gdb. It was worthwhile in the sense that it provided me with some interesting pieces of information. Without any further ado, let's present the findings:

My findings

Before starting to play with libpthread and glibc, I wanted to make sure that the goruntime behaved the way I believed it behaved, and to make some further assurances about it. These assurances had to do with the total number of goroutines and the total number of machine threads at various checkpoints in the language runtime.

There is only one small piece of code in the goruntime that creates some confusion for me, and that is the code for a new m's initialisation. Let me first present the code that confuses me:

gcc/libgo/runtime/proc.c

M*
runtime_newm(void)
{

    ...
        mp = runtime_mal(sizeof *mp);

    ...
        mcommoninit(mp);
        mp->g0 = runtime_malg(-1, nil, nil);

    ...
        if(pthread_attr_init(&attr) != 0)
                runtime_throw("pthread_attr_init");
        if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
                runtime_throw("pthread_attr_setdetachstate");

    ...
}

I purposely compacted the function for brevity, as it only serves to demonstrate a point. Now, my confusion lies in the line mp->g0 = runtime_malg(-1, nil, nil). It is a piece of code that allocates memory for a new goroutine. I am OK with that, but what I do not understand is that new kernel threads (m's) are supposed to pick and run a goroutine from the global goroutine pool, that is, run an existing one, not create a new one. Now, runtime_malg is given parameters that don't initialise a new goroutine properly, but still, new memory is allocated for a new goroutine, and it is returned to mp->g0 by runtime_malg.

Assuming I have not misunderstood something, and I am not mistaken (which is kind of likely), this is behavior that could lead to a number of questions and/or problems. For instance, what happens to the goroutine created by runtime_malg? Is it killed after the m is assigned a new goroutine to execute? Is it parked on the global goroutine list? Is it just ignored? Does it affect the runtime scheduler's goroutine count? This is the last thing I feel I wanna clear up regarding gccgo's runtime.

gdb

This week, I also ran the executables created by gccgo through gdb. It was a fertile attempt that, most of the time, confirmed my findings in the goruntime. It also provided some other nice pieces of information regarding the crashing of goroutines, but also left me with a question.

The code in question that I ran through gdb is this:

goroutine.go
package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    fmt.Println("[!!] right before a go statement")
    go say("world")
    say ("hello")
}

Your very typical hello-world-like goroutine program. Now, setting a breakpoint at main (not the program's main, which is main.main; main, as far as the runtime is concerned, is the runtime entry point in go-main.c) and running it through gdb yields the following results:

Breakpoint 1, main () at ../../../gcc_source/libgo/runtime/go-main.c:52
52 runtime_check ();
2:  __pthread_total = 1
1: runtime_sched.mcount = 0
(gdb) next
53 runtime_args (argc, (byte **) argv);
2: __pthread_total = 1
1: runtime_sched.mcount = 0
54 runtime_osinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 0
63: runtime_schedinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 1

Up until now, nothing unexpected. The kernel thread is registered with the runtime scheduler during its initialisation process in runtime_schedinit, and that's why runtime_sched.mcount is reported as zero before schedinit has run.

68 __go_go (mainstart, NULL);
2: __pthread_total = 1
1: runtime_sched.mcount = 1
(gdb) display runtime_sched.gcount
3: runtime_sched.gcount = 0

That too is OK, because a new goroutine is registered with the scheduler during the call to __go_go. Now I am gonna fast forward a bit, to a more interesting point.

...
[DEBUG] (in runtime_gogo) new goroutine's status is 2
[DEBUG] (in runtime_gogo) number of goroutines now is 2
[New Thread 629.30]

Program received SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) info threads
 Id   Target  Id       Frame
 6    Thread  629.30   0x08048eb7 in main.main () at goroutine.go:12
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
This is getting weird. I mean, libpthread is reporting that 2 threads are active, but gdb reports that 3 are active. Anyway, let's continue:
[DEBUG] (in runtime_stoptheworld) stopped the garbage collector
[DEBUG] (in runtime_starttheworld) starting the garbage collector
[DEBUG] (in runtime_starttheworld) number of m's now is: 2
[DEBUG] (in runtime_starttheworld) [note] there is already one gc thread
[!!] right before a go statement

Program received signal SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) continue
... (output omitted by me for brevity)

[DEBUG] (in runtime_newm) Right before the call to pthread_create.
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[New Thread 629.31]

Program received signal SIGABRT, Aborted.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3

Oh my goodness. At first glance, this seems to be a very serious inconsistency between libpthread and the goruntime. At this point, the go scheduler reports 3 threads (3 registered threads, which means that the flow of execution has passed mcommoninit, the kernel thread initialisation function, which also registers the kernel thread with the runtime scheduler) whereas libpthread reports 2 threads.

But WAIT! Where are you going? Things are about to get even more interesting!

(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

GDB reports 4 threads. Yes, 4 threads, ladies and gentlemen. Now take a closer look. 3 threads are in the same frame, with the one with id 4 being the one currently executing. And there is also a pattern: 0x01da48ec is the value of the eip register for all 3 of them.

That much is certain. Now, I already have an idea. Why not switch the current thread to the one with id 7? I'm sold on the idea; let's do this:

(gdb) thread 7
[Switching to thread 7 (Thread 629.31)]
#0  0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
(gdb) continue
Continuing.

Program received signal SIGABRT, Aborted.
[Switching to Thread 629.28]
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3
(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01dc08b0 in ?? () from /lib/i386-gnu/libc.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

Damn. But I am curious. What's the next instruction to be executed?

(gdb) x/i $eip
=> 0x1da48ec: ret

And what is the next instruction to be executed by the thread with id 7?

(gdb) x/i $eip
=> 0x1dc08b0: call *%edx

Conclusion

Apparently, there is still much debugging left to do to find out what is really happening. But we have got some leads in the right direction that will hopefully lead us to finally finding out where the problem lies, and to correcting it.

Most importantly, in my immediate plans, before I start playing around with libpthread, is to attempt the same debugging run on the same code under Linux (x86). Seeing as Go is clean on Linux, this would provide some clues as to what the expected results should be, and where the execution diverges substantially, a clue that might be vital to finding the problem.

GSOC Week 10 Report

Introduction

This week was spent attempting to debug the gccgo runtime via print statements. There were many things that I gained from this endeavour, the most significant of which is that I now have a great deal of information regarding the bootstrapping of a go process. Let's proceed to presenting this week's findings, shall we?

Findings

The process bootstrapping sequence

The code that begins a new go-process is conveniently located in a file called go-main.c, the most significant part of which is the following:

go-main.c
int
main (int argc, char **argv)
{
  runtime_check ();
  runtime_args (argc, (byte **) argv);
  runtime_osinit ();
  runtime_schedinit ();
  __go_go (mainstart, NULL);
  runtime_mstart (runtime_m ());
  abort ();
}

static void
mainstart (void *arg __attribute__ ((unused)))
{
  runtime_main ();
}

The process is as follows: the runtime first performs its sanity checks (runtime_check), stores the process arguments (runtime_args), and initialises the OS layer and the scheduler (runtime_osinit, runtime_schedinit). It then creates the first goroutine with __go_go, passing it mainstart, and enters the scheduler on the current thread with runtime_mstart.

The very last piece of code that is run (and most probably the most important) is runtime_main. Remember that this is passed as a parameter to a goroutine created during the __go_go call, and its job is to mark the goroutine that called it as the main OS thread, to initialise the scheduler, and to create a goroutine whose job is to release unused memory (from the heap) back to the OS. It then starts executing the process's user-defined instructions (the code the programmer wrote) via a call to a macro that directs it to __go_init_main in the assembly generated by the compiler.

runtime_main is also the function that terminates the execution of a go process, with a call to runtime_exit, which seems to be a macro for the exit function.

Other findings

During our debugging sessions we found out that the total count of kernel threads running in a simple program is at least two. The first one is the bootstrap m (the one initialised during the program's initialisation, inside runtime_schedinit), and there is at least one other (I am still investigating the validity of the following claim), created to be used by the garbage collector.

A simple Go program, such as one doing arithmetic or printing a hello-world-like message, evidently has no issue running. The issues arise when we use a go statement.
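For reference, the trivial kind of program we mean is something along these lines (a hypothetical stand-in; the actual test program isn’t shown in these reports, but it evidently printed the greeting that appears in the trace below):

hello.go
package main

import "fmt"

func main() {
    fmt.Println("Hello, fotis")
}

With all our debugging messages activated, this is how such a simple Go program flows: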

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] (in main) before runtime_mcheck is run
[DEBUG] (in main) before runtime_args is run
[DEBUG] (in main) before runtime_osinit is run
[DEBUG] (in main) before runtime_schedinit is run
[DEBUG] (in main) before runtime_mstart is run
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (in mainstart) right before the call to runtime_main
[DEBUG] (in runtime_main) Beginning of runtime_main
[DEBUG] (start of runtime_newm) Total number of m's is 1
[DEBUG] (in runtime_newm) Preparing to create a new thread
[DEBUG] (in runtime_newm) Right before the call to pthread_create
[DEBUG] (in runtime_newm) pthread_create returned 0
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (end of runtime_newm) Total number of m's is 2
Hello, fotis
[DEBUG] (in runtime_main) Right before runtime_exit

And this is how a goroutine-powered program fails:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] (in main) before runtime_mcheck is run
[DEBUG] (in main) before runtime_args is run
[DEBUG] (in main) before runtime_osinit is run
[DEBUG] (in main) before runtime_schedinit is run
[DEBUG] (in main) before runtime_mstart is run
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (in mainstart) right before the call to runtime_main
[DEBUG] (in runtime_main) Beginning of runtime_main
[DEBUG] (start of runtime_newm) Total number of m's is 1
[DEBUG] (in runtime_newm) Preparing to create a new thread
[DEBUG] (in runtime_newm) Right before the call to pthread_create
[DEBUG] (in runtime_newm) pthread_create returned 0
[DEBUG] (in runtime_mstart) right before the call to runtime_minit
[DEBUG] (end of runtime_newm) Total number of m's is 2
[DEBUG] (start of runtime_new) Total number of m's is 2
[DEBUG] (in runtime_newm) Preparing to create a new thread.
[DEBUG] (in runtime_newm) Right before the call to pthread_create
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted
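For completeness, the goroutine-powered program that dies like this can be as small as a single go statement followed by some other work; a hypothetical minimal reproducer would be:

gohello.go
package main

import "fmt"

func main() {
    go fmt.Println("Hello from a goroutine") // the thread creation for this goroutine triggers the abort
    fmt.Println("Hello from main")
}

(The exact program used in this run isn’t shown here; any program of this shape hits the same assertion.)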

Work for the next week

I will of course continue print-debugging until I know the exact flow of execution in the Go runtime. Right now I have a very good picture of the flow, but there are some things I still need to sort out; for instance, it is not exactly clear to me why we call certain functions, or what they are supposed to be doing at certain points. After I sort this out, I also plan to start debugging libpthread, to see what its status is during a hello-world-like program versus during a goroutine-powered program, and to see whether we find something interesting there (like how many threads libpthread reports against how many the Go runtime reports).

GSOC Week 9 (Partial) Report

This week revolved around print debugging in the gccgo runtime, in search of clues regarding the creation of new threads under the Go runtime, so as to see whether there is something wrong with the runtime itself, or with the way the runtime interacts with libpthread.

(Partial presentation of) findings

So far, while print-debugging the gccgo runtime, I haven’t noticed anything abnormal or unusual. For example, the code path that eventually triggers the assertion failure does work at least once, since pthread_create() returns 0 at least once.

This is expected behaviour, since we have already stated that at least one M (kernel thread) is created during the initialisation of the program’s runtime.

If, however, we try to use a go statement in our program to make use of a goroutine, the runtime still fails with the usual assertion failure, and the output of the program is this:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] pthread_create returned 0
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

The above output gives us some pieces of information:

- pthread_create() is called, and succeeds (returns 0), at least once before the assertion failure is triggered.
- A new kernel thread (M) is created during the program’s initialisation, before any go statement is reached.

The second bullet point is also supported by the fact that even if you execute something as simple as a hello world in Go, a new M is created, so you get something along the lines of this as an output:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] pthread_create returned 0
Hello World!
root@debian:~/Software/Experiments/go#

There is, however, something that the above output doesn’t tell us, but that would be useful to know: how many times did we create a new thread? So we modify our gcc source code to see how many times the runtime attempts to create a new kernel thread (M). This is what we get out of it:

root@debian:~/Software/Experiments/go# ./a.out
[DEBUG] Preparing to create a new thread.
[DEBUG] pthread_create returned 0
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[DEBUG] Preparing to create a new thread.
aborted.

The code at this point in the runtime is this:

// Create a new m.  It will start off with a call to runtime_mstart.
M*
runtime_newm(void)
{
    M *mp;
    pthread_attr_t attr;
    pthread_t tid;
    size_t stacksize;
    sigset_t clear;
    sigset_t old;
    int ret;

#if 0
    static const Type *mtype;  // The Go type M
    if(mtype == nil) {
        Eface e;
        runtime_gc_m_ptr(&e);
        mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
    }
#endif

    // XXX: Added by fotis for print debugging.
    printf("[DEBUG] Preparing to create a new thread.\n")

    mp = runtime_mal(sizeof *mp);
    mcommoninit(mp);
    mp->g0 = runtime_malg(-1, nil, nil);

    if(pthread_attr_init(&attr) != 0)
        runtime_throw("pthread_attr_init");
    if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
        runtime_throw("pthread_attr_setdetachstate");

    // <http://www.gnu.org/software/hurd/open_issues/libpthread_set_stack_size.html>
#ifdef __GNU__
    stacksize = StackMin;
#else
    stacksize = PTHREAD_STACK_MIN;

    // With glibc before version 2.16 the static TLS size is taken
    // out of the stack size, and we get an error or a crash if
    // there is not enough stack space left.  Add it back in if we
    // can, in case the program uses a lot of TLS space.  FIXME:
    // This can be disabled in glibc 2.16 and later, if the bug is
    // indeed fixed then.
    stacksize += tlssize;
#endif

    if(pthread_attr_setstacksize(&attr, stacksize) != 0)
        runtime_throw("pthread_attr_setstacksize");

    // Block signals during pthread_create so that the new thread
    // starts with signals disabled.  It will enable them in minit.
    sigfillset(&clear);

#ifdef SIGTRAP
    // Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
    sigdelset(&clear, SIGTRAP);
#endif

    sigemptyset(&old);
    sigprocmask(SIG_BLOCK, &clear, &old);
    ret = pthread_create(&tid, &attr, runtime_mstart, mp);

    /* XXX: added for debug printing */
    printf("[DEBUG] pthread_create() returned %d\n", ret);

    sigprocmask(SIG_SETMASK, &old, nil);

    if (ret != 0)
        runtime_throw("pthread_create");

    return mp;
}

We can deduce two things about our situation right now:

- The first kernel thread creation, the one performed while bootstrapping the runtime, succeeds: pthread_create returns 0.
- The runtime attempts to create a kernel thread a second time, and it is around this second attempt that the assertion inside __pthread_create_internal fires and the program aborts (the odd interleaving of the messages above is presumably an artifact of output buffering).

Continuation of work

I have been following this course over the last week. I have presented some of my findings here, and hope to soon be able to write an exhaustive report on what exactly causes the bug.

GSOC Week 8 (Partial) Report

This week was spent studying the Go language’s runtime and the behaviour of various Go programs when executed under the Hurd. I learnt a variety of new things, and got some new clues about the problem.

The new libgo clues

I already know that M’s are the “real”, kernel-schedulable threads, and that G’s are the ones managed by the Go runtime (goroutines). The last time I went through the Go runtime’s code, I noticed that, in the failing case, neither of them gets created, so there must be an issue with thread creation. But since at least one of each is created during the program’s initialisation, how come most programs are able to run, and the issues only present themselves when we manually attempt to run a goroutine?

I will admit that the situation looks strange, so I decided to look into it some more. Before we go any further, I have to show the error I get when I run goroutine-powered programs under the Hurd:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

__pthread_create_internal is a libpthread function that gets called when a new POSIX thread is instantiated. So we know that when we call a goroutine, at least one kernel thread is created apart from the goroutine itself; otherwise, if a new goroutine were created but not a new kernel thread (M), why would this thread-creation path run at all, instead of the goroutine simply being matched with an existing kernel thread (remember, there is at least one)?

That made me look into the Go runtime some more. I found a lot of things that I cannot enumerate here, but amongst the most interesting was the following piece of code:

proc.c

// Create a new m.  It will start off with a call to runtime_mstart.
M*
runtime_newm(void)
{
        M *mp;
        pthread_attr_t attr;
        pthread_t tid;
        size_t stacksize;
        sigset_t clear;
        sigset_t old;
        int ret;

#if 0
        static const Type *mtype;  // The Go type M
        if(mtype == nil) {
                Eface e;
                runtime_gc_m_ptr(&e);
                mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
        }
#endif

        mp = runtime_mal(sizeof *mp);
        mcommoninit(mp);
        mp->g0 = runtime_malg(-1, nil, nil);

        if(pthread_attr_init(&attr) != 0)
                runtime_throw("pthread_attr_init");
        if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
                runtime_throw("pthread_attr_setdetachstate");

        stacksize = PTHREAD_STACK_MIN;

        // With glibc before version 2.16 the static TLS size is taken
        // out of the stack size, and we get an error or a crash if
        // there is not enough stack space left.  Add it back in if we
        // can, in case the program uses a lot of TLS space.  FIXME:
        // This can be disabled in glibc 2.16 and later, if the bug is
        // indeed fixed then.
        stacksize += tlssize;

        if(pthread_attr_setstacksize(&attr, stacksize) != 0)
                runtime_throw("pthread_attr_setstacksize");

        // Block signals during pthread_create so that the new thread
        // starts with signals disabled.  It will enable them in minit.
        sigfillset(&clear);

#ifdef SIGTRAP
        // Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
        sigdelset(&clear, SIGTRAP);
#endif

        sigemptyset(&old);
        sigprocmask(SIG_BLOCK, &clear, &old);
        ret = pthread_create(&tid, &attr, runtime_mstart, mp);
        sigprocmask(SIG_SETMASK, &old, nil);

        if (ret != 0)
                runtime_throw("pthread_create");

        return mp;
}

This is the code that creates a new kernel thread. Notice the line ret = pthread_create(&tid, &attr, runtime_mstart, mp);. It obviously creates a new kernel thread, which explains why we get this specific error. What is not explained is why, given that we already have at least one kernel thread at program startup, this specific error is only triggered when we manually create a goroutine.

Go programs under the Hurd

Apart from studying Go’s runtime source code, I also ran some experiments under the Hurd. I got some very weird results that I am still investigating, but I would like to share them nonetheless. Consider the following piece of code:

hellogoroutines.go
package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    say("world")
    say("hello")
}

A very basic example that demonstrates goroutines. Now, if we change one of the say calls inside main to a goroutine, this happens:

root@debian:~/Software/Experiments/go# ./a.out
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
Aborted

BUT what happens if we change BOTH of these calls to goroutines?
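Concretely, main then becomes (just the two calls prefixed with go, exactly as described):

func main() {
    go say("world")
    go say("hello")
}

Running this version, this happens: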

root@debian:~/Software/Experiments/go# ./a.out
root@debian:~/Software/Experiments/go# 

Wait a minute. It can’t be! Did it execute correctly? Where is the output?

root@debian:~/Software/Experiments/go# echo $?
0
root@debian:~/Software/Experiments/go#

It reports that it has executed correctly, yet there is no output. One possible explanation (which I still need to verify) is that this is ordinary Go semantics rather than a Hurd bug: a Go program exits as soon as main returns, without waiting for outstanding goroutines, so with both calls turned into goroutines, main can return before either of them manages to print.
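One way to test that hypothesis (a sketch only; I have not run this under the Hurd, and it would presumably still trip the pthread_create assertion there, since it uses go statements) is to make main wait for the goroutines explicitly, using sync.WaitGroup from the standard library:

hellowaitgroup.go
package main

import (
    "fmt"
    "sync"
)

func say(wg *sync.WaitGroup, s string) {
    defer wg.Done() // signal completion when this goroutine returns
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2) // we are about to start two goroutines
    go say(&wg, "world")
    go say(&wg, "hello")
    wg.Wait() // block until both goroutines have called Done
}

On a platform where goroutines work, this version prints all ten lines; if the silent exit above is just main returning early, that would explain the missing output without any Hurd-specific bug being involved.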

What I am doing next

I will continue reading through the Go runtime for clues. On the more active side, I am writing a custom test case for goroutine testing under the Hurd, while also doing some analysis of the programs that do run there (currently, studying the assembly generated for these programs) to see how they differ and why we get this particular behaviour.