Lesson 15: Debugging your kernel and modules with gdb

What do you need first?

Before you get into this lesson, let's make sure you have what you need to make everything here work properly. Mostly, you need to be running under a new kernel that's been configured and built with the configuration options CONFIG_PROC_KCORE and CONFIG_DEBUG_INFO, as well as having the raw, uncompressed vmlinux kernel image file that was generated when you built your current kernel. If you followed the recipe in the previous lesson, you definitely should have all of that -- the vmlinux file should still be sitting at the top of the kernel source tree you used for the build. Do not go any further until you've verified all of the above. I mean it.
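One way to do that verification is sketched below. It assumes your distribution keeps the running kernel's configuration in /boot/config-$(uname -r), as Ubuntu does; the function name check_kernel_debug_config is just something I made up for illustration.

```shell
# Hypothetical helper: report whether a kernel config file enables the
# two options this lesson depends on. Returns non-zero if either is missing.
check_kernel_debug_config() {
    config_file="$1"
    status=0
    for opt in CONFIG_PROC_KCORE CONFIG_DEBUG_INFO; do
        if grep -q "^${opt}=y" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: MISSING"
            status=1
        fi
    done
    return "$status"
}
```

You would call it as "check_kernel_debug_config /boot/config-$(uname -r)", and then confirm with "ls -l" that both /proc/kcore and your vmlinux file actually exist.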

Oh, and you'll need the debugger as well:

$ sudo apt-get install gdb

Finally, let's give credit where credit is due -- a good deal of this lesson is based on Chapter 4 of "Linux Device Drivers" (3rd ed).

So what exactly are we about to do?

What we're going to do is use what is normally a userspace debugger (gdb) to peek into the address space of a running kernel. But if you already have some gdb experience, note carefully that you're going to have some definite limitations. Mostly, you'll be using gdb only to examine the contents of kernel space -- you won't have the ability to do things like set breakpoints or single-step through kernel code. But for a lot of cases, simply being able to view the data in kernel space in real-time is enough.

Getting ready ...

Developers who have used gdb in user space probably recall the general usage:

$ gdb [executable-file] [core dump image]

Coincidentally, if you've set everything up properly, you have exactly the same information at hand for kernel space -- your raw, uncompressed vmlinux file is the executable file, while the proc file /proc/kcore acts as your (kernel space) core file. So if we copy your vmlinux file to /tmp for brevity, your debugging incantation for a running kernel is simply:

$ sudo gdb /tmp/vmlinux /proc/kcore

Note that you need to have root privilege for this given the permissions on the /proc/kcore file. And on that note, let's do a test debugging session.

ASIDE: 32-bit versus 64-bit debugging

A while back, I wrote a similar tutorial on gdb debugging for kernel space on a 32-bit system, and everything worked just fine. This, however, is the first time I've done this on a 64-bit system, and there are definitely some subtle issues. I'm not sure if it's just me or if there's a fundamental difference, so keep in mind that what follows is being done on my 64-bit Ubuntu system. If you get different results on your 32-bit system, I may not be able to explain it.

And on that note, let's get into our first example.

How many loops per jiffy? Let's find out.

Let's pick on a specific variable in kernel space, and see what it takes to display its value in the running kernel. From the kernel source file init/main.c, we have the variable definition:

unsigned long loops_per_jiffy = (1<<12);

So what does that tell us? It tells us three things:

  • that variable is of type "unsigned long",
  • it has an initial value of "1<<12" or 4096, and
  • it's been exported (via the EXPORT_SYMBOL(loops_per_jiffy) line that follows the definition in init/main.c) so it's available to the rest of kernel space and to loadable modules.

      And knowing all that, we can dig around to learn what we can about it.

      First, you should recall from an earlier lesson that you can see the kernel's symbol table in the file /proc/kallsyms, so what can we learn about that variable?

      $ grep loops_per_jiffy /proc/kallsyms
      ffffffff817fee80 r __ksymtab_loops_per_jiffy
      ffffffff81813e10 r __kcrctab_loops_per_jiffy
      ffffffff8181e5e5 r __kstrtab_loops_per_jiffy
      ffffffff81a34450 D loops_per_jiffy
      ffffffff81bc1608 b loops_per_jiffy_ref

      And, finally, if our gdb session is running properly, we can:

      $ sudo gdb /tmp/vmlinux /proc/kcore
      ... snip ...
      (gdb) whatis loops_per_jiffy
      type = long unsigned int
      (gdb) p loops_per_jiffy
      $1 = 4096

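      which certainly matches the initializer in the source. (Be a little suspicious, though: the kernel normally recalibrates loops_per_jiffy during boot, so a live value is usually far larger than 4096. Seeing the initializer may be another symptom of the 64-bit issues mentioned in this lesson.) And there you have it -- how to dump a kernel variable via gdb. Yes, it's really that easy. But, of course, there's much more.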
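If all you want is a single value, you don't need an interactive session. Here's a minimal sketch that relies on gdb's -batch and -ex options. The function name kprint and the GDB and VMLINUX override variables are my own inventions, and it assumes you copied vmlinux to /tmp as described earlier.

```shell
# Hypothetical wrapper: evaluate one gdb expression against the running
# kernel and exit. Set VMLINUX if your image is not at /tmp/vmlinux, and
# GDB if you need a different debugger command than "sudo gdb".
kprint() {
    ${GDB:-sudo gdb} -batch -ex "print $1" "${VMLINUX:-/tmp/vmlinux}" /proc/kcore
}
```

With that defined, "kprint loops_per_jiffy" runs the same print command as the session above and returns you to the shell.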

      The kernel symbol types

      Note from the above that that particular variable has a type of "D", which inspires the question -- what does that mean? Simple:

      $ man nm
      ... snip ...
      If lowercase, the symbol is local; if uppercase, the symbol is global (external).
      ... snip ...
      "b" The symbol is in the uninitialized data section (known as BSS).
      ... snip ...
      "d" The symbol is in the initialized data section.
      ... and so on and so on ...

      So this tells us (correctly) that that variable is in the initialized data section, and that it's global. The fact that it has also been exported shows up in the __ksymtab_loops_per_jiffy entry in the listing above -- all very useful information you can glean from looking at the contents of /proc/kallsyms.
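To make the letter codes concrete, here's a small sketch that decodes the handful of types that appear in this lesson, following the "man nm" rules quoted above. The function name describe_symtype is hypothetical, and anything it doesn't recognize is simply reported as "other".

```shell
# Hypothetical helper: describe an nm-style symbol type letter as seen
# in /proc/kallsyms. Only the letters used in this lesson are decoded.
describe_symtype() {
    case "$1" in
        [[:upper:]]) scope="global" ;;
        *)           scope="local" ;;
    esac
    case "$1" in
        [Tt]) section="text (code)" ;;
        [Dd]) section="initialized data" ;;
        [Bb]) section="uninitialized data (BSS)" ;;
        [Rr]) section="read-only data" ;;
        *)    section="other (see man nm)" ;;
    esac
    echo "$scope, $section"
}
```

For example, you can feed it the second column of a kallsyms line: describe_symtype "$(grep ' loops_per_jiffy$' /proc/kallsyms | awk '{print $2}')".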

      The listing also includes local symbols such as loops_per_jiffy_ref, which tells us that we can even examine kernel data objects that haven't been exported. So even when our loadable modules don't have access to a variable or object, we can still examine it with gdb -- a very handy property.

      Exercise for the student: This isn't going to be a gdb tutorial, so it's your responsibility to start reading the gdb docs. As a start, you can:

      (gdb) help
      List of classes of commands:
      aliases -- Aliases of other commands
      breakpoints -- Making program stop at certain points
      data -- Examining data
      files -- Specifying and examining files
      internals -- Maintenance commands
      obscure -- Obscure features
      running -- Running the program
      stack -- Examining the stack
      status -- Status inquiries
      support -- Support facilities
      tracepoints -- Tracing of program execution without stopping the program
      user-defined -- User-defined commands
      Type "help" followed by a class name for a list of commands in that class.
      Type "help all" for the list of all commands.
      Type "help" followed by command name for full documentation.
      Type "apropos word" to search for commands related to "word".
      Command name abbreviations are allowed if unambiguous.
      (gdb) help data
      ... lots of snip here, you get the idea ...

      If you're feeling ambitious, suggest some other kernel variables that you think would be handy to display in your running kernel.

      Dumping more complicated structures

      As long as you compiled the running kernel with CONFIG_DEBUG_INFO, you have the ability to dump the contents of some fairly complicated structures. For example, from the header file include/linux/init_task.h, we have a macro that defines a sizable task_struct structure:

      #define INIT_TASK(tsk)  \
      {                                                 \
              .state          = 0,                      \
              .stack          = &init_thread_info,      \
              .usage          = ATOMIC_INIT(2),         \
              .flags          = PF_KTHREAD,             \
              .lock_depth     = -1,                     \
               ... and on and on ...

      From the source file arch/x86/kernel/init_task.c, we have the definition of init_task:

      /*
       * Initial task structure.
       * All other task structs will be allocated on slabs in fork.c
       */
      struct task_struct init_task = INIT_TASK(init_task);

      From /proc/kallsyms, we have:

      $ grep init_task /proc/kallsyms
      ffffffff810c9980 T ftrace_graph_init_task
      ffffffff810ecb30 T perf_event_init_task
      ffffffff817fee60 r __ksymtab_init_task
      ffffffff81813e00 r __kcrctab_init_task
      ffffffff8181e5d0 r __kstrtab_init_task
      ffffffff81a32020 D init_task          <-- there it is
      ffffffff81bcdec0 B init_task_group

      And, finally, from our debugging session, we can examine the contents of that structure instance with:

      (gdb) whatis init_task
      type = struct task_struct
      (gdb) p init_task
      $1 = {state = 0, stack = 0xffffffff81a00000, usage = {counter = 2}, flags = 2097152, 
        ptrace = 0, lock_depth = -1, prio = 120, static_prio = 120, normal_prio = 120, 
        rt_priority = 0, sched_class = 0x0, se = {load = {weight = 0, inv_weight = 0}, 
          run_node = {rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, group_node = {
      ...and on and on ...

      In short, we can dump amazingly complicated kernel data structures this way.

      Dumping kernel data that is constantly changing

      Unsurprisingly, the whole point of dumping data from kernel space is that you want to see the value of that data in real time. But gdb doesn't work that way.

      Instead, for efficiency, gdb caches the data from the core file at start time, so if you tried to print, say, the value of the constantly-changing __jiffies variable, you'd see:

      (gdb) p __jiffies
      $1 = 4294937296
      (gdb) p __jiffies
      $2 = 4294937296
      (gdb) p __jiffies
      $3 = 4294937296
      (gdb) p __jiffies
      $4 = 4294937296

      In order to refresh, you need to reload the core file with:

      (gdb) core-file /proc/kcore

      but here's where it gets confusing because, if I do that on my 64-bit system, I see no change. This worked fine on an earlier 32-bit system, so I have to admit I'm baffled as to what's happening here. If I figure it out, I'll let you know.

      Debugging your loadable modules

      And this is the part we've been working up to -- how to use gdb to similarly debug your loadable (and loaded) modules. Consider the following sample module crash_gdb.c:

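      #include <linux/module.h>
      #include <linux/init.h>
      #include <linux/kernel.h>

      static int rpjday_1;            /* static: local to this file */
      int rpjday_2 = 20;              /* global, but not exported */
      int rpjday_3 = 30;              /* global and exported */
      EXPORT_SYMBOL(rpjday_3);

      static int __init gdb_hi(void)
      {
          printk(KERN_INFO "Module crash_gdb being loaded.\n");
          return 0;
      }

      static void __exit gdb_bye(void)
      {
          printk(KERN_INFO "Module crash_gdb being unloaded.\n");
      }

      module_init(gdb_hi);
      module_exit(gdb_bye);
      MODULE_LICENSE("GPL");
      MODULE_DESCRIPTION("Module debugging with gdb.");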

      As you can see, I've defined a small set of variables with different visibilities, so you can tell which objects gdb has access to. Create an appropriate Makefile and compile that module, but don't load it just yet.
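If you don't have a Makefile handy, the following sketch writes the smallest kbuild Makefile that should do the job. It assumes the source file is named crash_gdb.c and sits in the current directory, and that your kernel build tree is reachable at /lib/modules/$(uname -r)/build, which is the usual Ubuntu location but still an assumption on my part.

```shell
# Write a minimal kbuild Makefile for an out-of-tree module built from
# crash_gdb.c in the current directory.
printf 'obj-m := crash_gdb.o\n' > Makefile

# Then build against the running kernel's build tree (assumed path):
#   make -C /lib/modules/$(uname -r)/build M="$PWD" modules
```

If the build succeeds, you should end up with crash_gdb.ko in the same directory.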

      Verify that none of those symbols are in the kernel symbol table:

      $ grep rpjday /proc/kallsyms

      OK, nothing there. Now start a debugging session:

      $ sudo gdb /tmp/vmlinux /proc/kcore

      and this is where the fun starts. Load your module, and verify that the appropriate data objects are now in kernel space:

      $ grep rpjday /proc/kallsyms
      ffffffffa007c090 r __ksymtab_rpjday_3	[crash_gdb]
      ffffffffa007c0a8 r __kstrtab_rpjday_3	[crash_gdb]
      ffffffffa007c0a0 r __kcrctab_rpjday_3	[crash_gdb]
      ffffffffa007c0b4 d rpjday_2	[crash_gdb]
      ffffffffa007c0b8 D rpjday_3	[crash_gdb]

      Exercise for the student: Explain why you don't see rpjday_1.
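For what it's worth, the usual next step (the approach described in "Linux Device Drivers") is to tell gdb where the module's sections were loaded, using gdb's add-symbol-file command and the addresses the kernel publishes under /sys/module/<name>/sections. The sketch below only builds that command line. The function name module_symbol_cmd is hypothetical, I have not verified the result on the 64-bit system used in this lesson, and you'll typically need to be root to read the section files. A module whose only functions are marked __init and __exit may have no .text entry at all, in which case older gdb releases that insist on a text address will refuse the command.

```shell
# Hypothetical helper: print a gdb add-symbol-file command for a loaded
# module. $1 = path to the .ko file, $2 = sections directory (defaults
# to the module's sysfs directory).
module_symbol_cmd() {
    ko="$1"
    name=$(basename "$ko" .ko)
    secdir="${2:-/sys/module/$name/sections}"
    cmd="add-symbol-file $ko"
    # The text address is positional; sections the module lacks have no file.
    if [ -r "$secdir/.text" ]; then
        cmd="$cmd $(cat "$secdir/.text")"
    fi
    for sec in .data .bss; do
        if [ -r "$secdir/$sec" ]; then
            cmd="$cmd -s $sec $(cat "$secdir/$sec")"
        fi
    done
    echo "$cmd"
}
```

You would run "module_symbol_cmd ./crash_gdb.ko" after loading the module, paste the output at the (gdb) prompt, and then try something like "p rpjday_2".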

      More coming shortly ...

      Apparently, there are some issues with debugging and 64-bit systems, so I'm putting the rest of this lesson on hold and I'll come back to it when I've resolved the issues.

