System.Map Demystified

Mon Jun 13 08:18:46 IST 2005

Morning Friends,

A few years ago, while conducting training programs in GNU/Linux
Administration, I found 'System.map' to be one of those mystery files on
GNU/Linux. Not being pretentious about my GNU/Linux skills, I was
searching for a good explanation ever since.

Yesterday I stumbled upon this fine article by Peter.
http://www.dirac.org/linux/system.map/

Perhaps you will find it interesting.

###### Article Start #########
System.map

There seems to be a dearth of information about the System.map file.
It's really nothing mysterious, and in the scheme of things, it's really
not that important. But a lack of documentation makes it shady. It's
like an earlobe; we all have one, but nobody really knows why. This is a
little web page I cooked up that explains the why.

Note, I'm not out to be 100% correct. For instance, it's possible for a
system to not have /proc filesystem support, but most systems do. I'm
going to assume you "go with the flow" and have a fairly typical system.

Some of the stuff on oopses comes from Alessandro Rubini's "Linux Device
Drivers" which is where I learned most of what I know about kernel
programming.

What Are Symbols?
        In the context of programming, a symbol is the building block of
        a program: it is a variable name or a function name. It should
        be of no surprise that the kernel has symbols, just like the
        programs you write. The difference is, of course, that the
        kernel is a very complicated piece of coding and has many, many
        global symbols.

What Is The Kernel Symbol Table?
        The kernel doesn't use symbol names. It's much happier referring
        to a variable or function by its address. Rather than using
        size_t BytesRead(), the kernel prefers to refer to this function
        as (for example) c0343f20.

        Humans, on the other hand, do not appreciate addresses like
        c0343f20. We prefer to use something like size_t BytesRead().
        Normally, this doesn't present much of a problem. The kernel is
        mainly written in C, so the compiler/linker allows us to use
        symbol names when we code and allows the kernel to use addresses
        when it runs. Everyone is happy.

        There are situations, however, where we need to know the address
        of a symbol (or the symbol for an address). This is done by a
        symbol table, and is very similar to how gdb can give you the
        function name from an address (or an address from a function
        name). A symbol table is a listing of all symbols along with
        their addresses. Here is an example of a symbol table:

           c03441a0 B dmi_broken
           c03441a4 B is_sony_vaio_laptop
           c03441c0 b dmi_ident
           c0344200 b pci_bios_present
           c0344204 b pirq_table
           c0344208 b pirq_router
           c034420c b pirq_router_dev
           c0344220 b ascii_buffer
           c0344224 b ascii_buf_bytes

        You can see that the variable named dmi_broken is at the kernel
        address c03441a0.
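
        As a minimal sketch of the two-way lookup described above (not
        how the kernel or gdb actually do it), the following Python
        parses map-style lines into two dictionaries. The function name
        parse_symbol_table and the SAMPLE string are made up for
        illustration; the entries are copied from the listing above.

```python
# Entries copied from the example symbol table above.
SAMPLE = """\
c03441a0 B dmi_broken
c03441a4 B is_sony_vaio_laptop
c03441c0 b dmi_ident
c0344200 b pci_bios_present
"""


def parse_symbol_table(text):
    """Return (by_name, by_address) dictionaries built from map-style lines."""
    by_name, by_address = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        address, _kind, name = fields
        value = int(address, 16)  # addresses are written in hexadecimal
        by_name[name] = value
        by_address[value] = name
    return by_name, by_address


by_name, by_address = parse_symbol_table(SAMPLE)
print(hex(by_name["dmi_broken"]))   # address for a name
print(by_address[0xc03441c0])       # name for an address
```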

What Is The System.map File?
        There are two files that are used as a symbol table:

             1. /proc/ksyms 
             2. System.map 

        There. You now know what the System.map file is.

        Every time you compile a new kernel, the addresses of various
        symbol names are bound to change.

        /proc/ksyms is a "proc file" and is created on the fly when a
        kernel boots up. (On 2.6 kernels, /proc/kallsyms plays this
        role.) Actually, it's not really a file; it's simply a
        representation of kernel data which is given the illusion of
        being a disk file. If you don't believe me, try finding the
        filesize of /proc/ksyms. Because it is generated by the running
        kernel, it will always be correct for that kernel.

        However, System.map is an actual file on your filesystem. When
        you compile a new kernel, your old System.map has wrong symbol
        information. A new System.map is generated with each kernel
        compile and you need to replace the old copy with your new copy.
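
        To make the point about stale addresses concrete, here is a
        small sketch that compares two map files and lists the symbols
        whose addresses differ. The helper names and the NEW_MAP
        address are invented for illustration; only the OLD_MAP entries
        come from the example table earlier in this article.

```python
def load_addresses(text):
    """Map each symbol name to its address for map-style lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            table[fields[2]] = int(fields[0], 16)
    return table


def moved_symbols(old_text, new_text):
    """Return the sorted names present in both maps at different addresses."""
    old, new = load_addresses(old_text), load_addresses(new_text)
    return sorted(name for name in old.keys() & new.keys()
                  if old[name] != new[name])


OLD_MAP = "c03441a0 B dmi_broken\nc03441c0 b dmi_ident\n"
# Hypothetical map from a rebuilt kernel: dmi_ident has moved.
NEW_MAP = "c03441a0 B dmi_broken\nc03452c0 b dmi_ident\n"

print(moved_symbols(OLD_MAP, NEW_MAP))
```

        Any non-empty result means the old map would give wrong answers
        for the new kernel.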

What Is An Oops?
        What is the most common bug in your homebrewed programs? The
        segfault. Good ol' signal 11.

        What is the most common bug in the Linux kernel? The segfault.
        Except here, the notion of a segfault is much more complicated
        and can be, as you can imagine, much more serious. When the
        kernel dereferences an invalid pointer, it's not called a
        segfault -- it's called an "oops". An oops indicates a kernel
        bug and should always be reported and fixed.

        Note that an oops is not the same thing as a segfault. Your
        program cannot recover from a segfault. The kernel doesn't
        necessarily have to be in an unstable state when an oops occurs.
        The Linux kernel is very robust; the oops may just kill the
        current process and leave the rest of the kernel in a good,
        solid state.

        An oops is not a kernel panic. In a panic, the kernel cannot
        continue; the system grinds to a halt and must be restarted. An
        oops may cause a panic if a vital part of the system is
        destroyed. An oops in a device driver, for example, will almost
        never cause a panic.

        When an oops occurs, the system will print out information that
        is relevant to debugging the problem, like the contents of all
        the CPU registers, and the location of page descriptor tables.
        In particular, the contents of the EIP (instruction pointer) are
        printed. Like this:

           EIP: 0010:[<00000000>]
           Call Trace: [<c010b860>]

What Does An Oops Have To Do With System.map?
        You can agree that the information given in EIP and Call Trace
        is not very informative. But more importantly, it's really not
        informative to a kernel developer either. Since a symbol doesn't
        have a fixed address, c010b860 can point anywhere.

        To help us understand cryptic oops output, Linux uses a daemon
        called klogd, the kernel logging daemon. klogd intercepts kernel
        oopses and logs them with syslogd, replacing some of the useless
        information like c010b860 with information that humans can use.
        In other words, klogd is a kernel message logger which can
        perform name-address resolution. Once klogd transforms the kernel
        message, it uses whatever logger is in place to log system wide
        messages, usually syslogd.

        To perform name-address resolution, klogd uses System.map. Now
        you know what an oops has to do with System.map.
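
        The idea behind this kind of resolution can be sketched in a
        few lines of Python. This is not klogd's actual code: it
        assumes that an address belongs to the nearest symbol at or
        below it, and it reports the result as name+offset. The helper
        names and the address c0344202 are made up; the map entries
        come from the example table earlier in this article.

```python
import bisect
import re

SAMPLE_MAP = """\
c03441c0 b dmi_ident
c0344200 b pci_bios_present
c0344204 b pirq_table
"""


def load_sorted(text):
    """Return a list of (address, name) pairs sorted by address."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append((int(fields[0], 16), fields[2]))
    entries.sort()
    return entries


def resolve(entries, address):
    """Find the nearest symbol at or below address, as 'name+0xoffset'."""
    addresses = [entry_address for entry_address, _ in entries]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    base, name = entries[index]
    return f"{name}+0x{address - base:x}"


def annotate(entries, line):
    """Append a symbolic label to every [<address>] found in a log line."""
    def replace(match):
        label = resolve(entries, int(match.group(1), 16))
        if label is None:
            return match.group(0)
        return f"[<{match.group(1)}>] ({label})"
    return re.sub(r"\[<([0-9a-fA-F]{8})>\]", replace, line)


entries = load_sorted(SAMPLE_MAP)
# c0344202 is a hypothetical address chosen to fall inside the sample map.
print(annotate(entries, "Call Trace: [<c0344202>]"))
```

        An address below every entry in the map, such as the 00000000
        in the EIP line above, is left untouched.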

        There's other software besides the kernel logger daemon that
        uses System.map. I'll get into that shortly.

        Fine print:
        There are actually two types of address resolutions performed by
        klogd.

              * Static translation, which uses the System.map file. 
              * Dynamic translation, which is used with loadable
                modules. This translation doesn't use System.map and is
                therefore not relevant to this discussion, but I'll
                describe it briefly anyhow: 
        Klogd Dynamic Translation
        Suppose you load a kernel module which generates an oops. An
        oops message is generated, and klogd intercepts it. It is found
        that the oops occurred at d00cf810. Since this address belongs to
        a dynamically loaded module, it has no entry in the System.map
        file. klogd will search for it, find nothing, and conclude that
        a loadable module must have generated the oops. klogd then
        queries the kernel for symbols that were exported by loadable
        modules. Even if the module author didn't export his symbols, at
        the very least, klogd will know what module generated the oops,
        which is better than knowing nothing about the oops at all.

Where Should System.map Be Located?
        System.map should be located wherever the software that uses it
        looks for it. That being said, let me talk about where klogd
        looks for it. Upon bootup, if klogd isn't given the location of
        System.map as an argument, it will look for System.map in three
        places, in the following order:

             1. /boot/System.map 
             2. /System.map 
             3. /usr/src/linux/System.map 

        System.map also has versioning information, and klogd
        intelligently searches for the correct map file. For instance,
        suppose you're running kernel 2.4.18 and the associated map file
        is /boot/System.map. You now compile a new kernel 2.5.1 in the
        tree /usr/src/linux. During the compiling process, the
        file /usr/src/linux/System.map is created. When you boot your
        new kernel, klogd will first look at /boot/System.map, determine
        it's not the correct map file for the booting kernel, then look
        at /usr/src/linux/System.map, determine that it is the correct
        map file for the booting kernel and start reading the symbols.

        A few nota bene's:

              * Somewhere during the 2.5.x series, the Linux kernel
                started to untar into linux-version, rather than just
                linux (show of hands -- how many people have been
                waiting for this to happen?). I don't know if klogd has
                been modified to search
                in /usr/src/linux-version/System.map yet. TODO: Look at
                the klogd source. If someone beats me to it, please
                email me and let me know if klogd has been modified to
                look in the new directory name for the linux source
                code. 
              * The man page doesn't tell the whole story. Look at
                this: 
                      # strace -f /sbin/klogd | grep 'System.map'
                      31208 open("/boot/System.map-2.4.18", O_RDONLY|O_LARGEFILE) = 2

                Apparently, not only does klogd look for the correct
                version of the map in the 3 klogd search directories,
                but klogd also knows to look for the name "System.map"
                followed by "-kernelversion", like System.map-2.4.18.
                This is an undocumented feature of klogd.
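
        The search described above can be sketched as follows. This is
        an illustration only, not taken from the klogd source: the
        order in which the versioned and the plain name are tried
        within each directory is an assumption, and the sketch skips
        the check klogd makes that a map really belongs to the running
        kernel. The names SEARCH_DIRS, candidate_maps and
        first_existing are made up.

```python
import os
import posixpath

# The three directories listed above, in klogd's search order.
SEARCH_DIRS = ["/boot", "/", "/usr/src/linux"]


def candidate_maps(release):
    """List the map paths to try for a kernel release such as '2.4.18'."""
    candidates = []
    for directory in SEARCH_DIRS:
        # Assumption: the versioned name is tried before the plain one.
        candidates.append(posixpath.join(directory, f"System.map-{release}"))
        candidates.append(posixpath.join(directory, "System.map"))
    return candidates


def first_existing(release, exists=os.path.exists):
    """Return the first candidate that exists, or None if there is none."""
    for path in candidate_maps(release):
        if exists(path):
            return path
    return None


for path in candidate_maps("2.4.18"):
    print(path)
```

        On a live system, os.uname().release would supply the release
        string of the running kernel.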

        A few drivers need System.map to resolve symbols (since they're
        linked against kernel headers instead of glibc). They won't work
        correctly without the System.map for the particular kernel
        currently running. This is NOT the same thing as a module not
        loading because of a kernel version mismatch. That has to do
        with the kernel version, not the kernel symbol table which
        changes between kernels of the same version!

What Else Uses System.map?
        System.map isn't just useful for debugging kernel oopses. Other
        programs like lsof:

           satan# strace lsof 2>&1 1> /dev/null | grep System
           readlink("/proc/22711/fd/4", "/boot/System.map-2.4.18", 4095) = 23

        and ps:

           satan# strace ps 2>&1 1> /dev/null | grep System
           open("/boot/System.map-2.4.18", O_RDONLY|O_NONBLOCK|O_NOCTTY) = 6

        and dosemu require a correct System.map.

What Happens If I Don't Have A Healthy System.map?
        Suppose you have multiple kernels on the same machine. You need
        a separate System.map file for each kernel! If you run a kernel
        with no (or an incorrect) System.map, you'll periodically see a
        message like:

                System.map does not match actual kernel 

        Not a fatal error, but can be annoying to see every time you use
        ps. Some software, like dosemu, may not work correctly. Lastly,
        your klogd or ksymoops output will not be reliable in case of a
        kernel oops.

How Do I Remedy The Above Situation?
        The solution is to keep all your System.map files in /boot and
        rename them with the kernel version. Suppose you have multiple
        kernels like:

              * /boot/vmlinuz-2.2.14 
              * /boot/vmlinuz-2.2.13 

        Then just rename your map files according to the kernel version
        and put them in /boot, like:

              * /boot/System.map-2.2.14 
              * /boot/System.map-2.2.13 
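
        The naming rule is mechanical enough to script. Here is a small
        sketch that derives the map name from a kernel image name,
        assuming the image is called vmlinuz-<version> as in the
        listing above. The function name map_path_for is made up for
        illustration.

```python
import posixpath


def map_path_for(kernel_image):
    """Turn a path like /boot/vmlinuz-2.2.14 into /boot/System.map-2.2.14."""
    directory, filename = posixpath.split(kernel_image)
    prefix = "vmlinuz-"
    if not filename.startswith(prefix):
        raise ValueError(f"not a versioned kernel image: {kernel_image}")
    version = filename[len(prefix):]
    return posixpath.join(directory, "System.map-" + version)


for image in ("/boot/vmlinuz-2.2.14", "/boot/vmlinuz-2.2.13"):
    print(image, "->", map_path_for(image))
```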

        Now what if you have two copies of the same kernel? Suppose you
        have two copies of 2.2.14. One compiled with sound, and the
        other without sound:

              * /boot/vmlinuz-2.2.14 
              * /boot/vmlinuz-2.2.14.nosound 

        The best answer would be if all software looked for the
        following files:

              * /boot/System.map-2.2.14 
              * /boot/System.map-2.2.14.nosound 

        But to be honest, I don't know if this is the best situation.
        Everything I've seen searches for "System.map-version" but what
        about "System.map-version.extraversion"? I have no idea (TODO).
######## Article Ends ########

-- 
arky

Rakesh 'arky' Ambati
GPG Key ID:  0x92BCF7D4 
Blog [ http://arky.in ]
Member FSUG-Bangalore [ http://bangalore.gnu.org.in ]
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html