Alex Clemmer is a computer programmer. Other programmers love Alex, excitedly describing him as "employed here" and "the boss's son".
Alex is also a Hacker School alum. Surely they do not at all regret admitting him!
(My batch ended on August 22, 2013, but as they say, never graduate.)
After learning a bunch about concurrency primitives yesterday, I decided it would be fun to have an operational understanding of their implementation. So I decided to boot up dtruss (which is like strace, but built on DTrace for OS X) and look at the syscall pattern under different cool concurrent scenarios.
I started by running dtruss on this very simple program that I wrote:
#include <stdio.h>
#include <semaphore.h>
int main()
{
    sem_t mutex;
    sem_init(&mutex, 0, 1);
    sem_post(&mutex);
    return 0;
}
… which promptly caused my terminal to explode with a huge chain of syscalls that I did not recognize:
SYSCALL(args) = return
open(".\0", 0x0, 0x1) = 3 0
fstat64(0x3, 0x7FFF6AFE1260, 0x0) = 0 0
fcntl(0x3, 0x32, 0x7FFF6AFE14E0) = 0 0
close(0x3) = 0 0
stat64("/Users/alex/Desktop/fun/scratch/locks\0", 0x7FFF6AFE11D0, 0x0) = 0 0
issetugid(0x7FFF6B01D530, 0x7FFF6AFE1A30, 0x7FFF6B01D530) = 0 0
csops(0x0, 0x0, 0x7FFF6AFE14BC) = 0 0
shared_region_check_np(0x7FFF6AFDF408, 0x2, 0x55) = 0 0
stat64("/usr/lib/dtrace/libdtrace_dyld.dylib\0", 0x7FFF6AFE05D0, 0x7FFF6AFE14C0) = 0 0
open("/usr/lib/dtrace/libdtrace_dyld.dylib\0", 0x0, 0x0) = 3 0
pread(0x3, "\312\376\272\276\0", 0x1000, 0x0) = 4096 0
pread(0x3, "\317\372\355\376\a\0", 0x1000, 0x1000) = 4096 0
mmap(0x10B3E6000, 0x2000, 0x5, 0x12, 0x3, 0x100001F) = 0xB3E6000 0
mmap(0x10B3E8000, 0x1000, 0x3, 0x12, 0x3, 0x100001F) = 0xB3E8000 0
mmap(0x10B3E9000, 0x1F40, 0x1, 0x12, 0x3, 0x100001F) = 0xB3E9000 0
close(0x3) = 0 0
[... AND SO ON ...]
Full printout here. Where’s the initial call to execve? Have I really never run dtrace on OS X before? There were some similarities here (for example fstat is a Linux function too), but overall I had no idea what was going on.
UPDATE: found this neat article that actually has a remarkably similar trace! Wish I had found that hours ago.
A lot of this looked to me like dynamic library loading, so I tried to run clang with -static, but I couldn’t quite get it to work. Something about a missing library.
Anyway, a lot of the differences between my prior experience and this seem to be because OS X is a BSD rather than a Linux. In any event I began looking everything up to see if I could piece together what was happening in kernel land to run this program.
If you don’t know OS X syscalls very well (I also don’t!), you can have a look at the appendix at the bottom of this entry, where I kept my notes as I looked each of them up.
Here is what I think is going on in the syscalls above. A lot of this I’m not quite sure about/not sure how to find out more about. So some of it is pure speculation.
I’m not sure what SYSCALL(args) does. (It looks like it is just the column header that dtruss prints, not a syscall.)
We call open on the current directory. This returns a read-only file with file descriptor 3. (Notice where it says = 3 0? I think the 3 is the file descriptor.)
We then check its status with fstat64 and do something to it (e.g., duplicate it or something) with fcntl. Not sure what that something is, because the flag gets turned into an integer here, and I don’t know how to map it back. (0x32 is 50 in decimal, which I believe is F_GETPATH in OS X’s sys/fcntl.h, i.e., asking for the path behind the descriptor.) After that, we close the folder with close.
We use stat64 to find some information about the file of the binary I compiled, which you can see is called locks. After this, we check to see if it has elevated permissions with issetugid, and then use csops to inspect the signature of something at address 0x7FFF6AFE14BC. (I’m not quite sure what’s up here.)
I have no idea what shared_region_check_np is doing; I couldn’t find it on Google.
We use stat64 to inspect a dynamic library that’s part of libdtrace. Interesting! Would this go away if we weren’t using dtruss? We subsequently open it, again with file descriptor 3. After opening, we can clearly see two calls to pread, each of which reads 0x1000 bytes! The first one starts at offset 0x0 and the second starts at offset 0x1000, but it looks like they’re pointing at two different places. I wonder why.
We call mmap 3 times to map file descriptor 3 (i.e., the dynamic library from libdtrace) into memory. I can’t tell what protections or flags they’re being passed in with (since I don’t know what flag, e.g., 0x12 corresponds to), but we are mapping 0x2000, 0x1000, and 0x1f40 bytes in, respectively. To different places in memory. Hmm.
After this, we check the status of quite a few library files:
stat64("/usr/lib/libSystem.B.dylib\0", 0x7FFF6AFE0410, 0x7FFF6AFE1290) = 0 0
stat64("/usr/lib/system/libcache.dylib\0", 0x7FFF6AFE0110, 0x7FFF6AFE0F90) = 0 0
stat64("/usr/lib/system/libcommonCrypto.dylib\0", 0x7FFF6AFE0110, 0x7FFF6AFE0F90) = 0 0
stat64("/usr/lib/system/libcompiler_rt.dylib\0", 0x7FFF6AFE0110, 0x7FFF6AFE0F90) = 0 0
stat64("/usr/lib/system/libcopyfile.dylib\0", 0x7FFF6AFE0110, 0x7FFF6AFE0F90) = 0 0
stat64("/usr/lib/system/libdispatch.dylib\0", 0x7FFF6AFE0110, 0x7FFF6AFE0F90) = 0 0
[...]
I guess it’s hard to know why we’d inspect so many of these files. I suspect they’re dynamic libraries, and we’re only loading libdtrace into memory with mmap because we need it. Notice that none of them return error codes, so we’re definitely asking for information about files that exist.
Something that’s a bit more mysterious is the huge number of calls to mmap that follow:
mmap(0x0, 0x2000, 0x3, 0x1002, 0x1000000, 0x4) = 0xB3EB000 0
mprotect(0x10B3EB000, 0x88, 0x1) = 0 0
mmap(0x0, 0x17000, 0x3, 0x1002, 0x1000000, 0x6) = 0xB3ED000 0
mprotect(0x10B3ED000, 0x1000, 0x0) = 0 0
[...]
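Notice the file descriptor is 0x1000000. That seems very high, and at first I didn’t know what it was for. As far as I can tell, the flags value 0x1002 is MAP_ANON | MAP_PRIVATE, so no file is being mapped at all, and the OS X mmap man page says that for anonymous mappings the fd argument can carry Mach VM flags such as VM_MAKE_TAG(tag) instead of a descriptor.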
But all this comes to a head in the final two lines:
sem_init(0x7FFF6AFE1908, 0x0, 0x1) = -1 Err#78
sem_post(0x7FFF6AFE1908, 0x0, 0xFFFFFFFFFFFFFFFF) = -1 Err#9
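We learn that both calls error out, so nothing much happens. Err#78 is ENOSYS ("function not implemented"): as far as I can tell, OS X does not implement unnamed POSIX semaphores, so sem_init always fails there. sem_post then fails with Err#9 (EBADF) because the semaphore was never initialized. So I programmed these wrong, at least for this platform.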
Ah, oh well.
Oddly enough, when you run this multiple times, things execute in a different order! If you have a look at the other tests in the gist, you’ll see that many of them are quite alike, but occur in different order.
I’ve never really been much of a systems programmer. So I started cracking them open one by one. These are some of my notes. Some of them might be wrong!
open takes a file path and opens that file for reading or writing, depending on the flags given as an argument.
close closes a file descriptor. After closing, the descriptor number is available for reuse.
fstat64 and stat64 are functions that give us information (permissions, inode number, etc.) about some file. In the case of stat, the file is denoted by a path; in the case of fstat, that file is denoted by a file descriptor.
fcntl performs an operation (e.g., duplication) on a file denoted by a file descriptor.
issetugid helps library routines like those in libc determine whether a process has raised privileges, which might be an issue if, e.g., they accidentally load libraries from a path that might abuse these privileges.
csops is used by system daemons to verify code signatures, which ensure that certain important utilities have not been tampered with.
pread reads from a file descriptor starting at some offset.
mmap maps files into memory. In other words, the byte locations of an already-open file are mapped directly into the virtual address space of the running process. An argument determines where to start the map; if NULL, the kernel determines this.
Another curious note: a lot of string literals in these calls have a \0 at the end. For example: open(".\0", 0x0, 0x1). I think this is because they’re all null-terminated C strings, and dtruss prints the terminator too. So, this syscall is opening with flags 0x0, which is O_RDONLY (read-only); the third argument is the mode, which only matters when a file is being created.