CS010 Practice 7

Debugging with gdb

Printing

There are several print commands available on Unix. The one I generally use is enscript. Use it as follows:

enscript -r -2 -Ec foo.c

-r tells it to print the output in landscape format. -2 says to use 2 columns. -Ec says to use the formatting rules for C. This will put keywords in bold face. Then give the file(s) you want to print. It will format them and send them to the printer.

If you want to print in portrait format, leave off the -r. If you want one column, leave off the -2. If you're printing something other than a C file, leave off the -Ec.

There are lots of command line options (the things after the command name and before the file to print) for enscript. Look at the xman page for enscript if you're curious.

Debugging Strategies

Finding the bugs in a program is a skill that you need to learn if you are going to become a successful programmer. Strategies that have been available to you thus far have been:

These are all good strategies to use, but as programs become larger and more complex, they are not always sufficient. Nor are they always enough to understand program behavior when C code misbehaves by indexing arrays out of bounds, dereferencing dangling pointers, etc. For these types of situations, it is better to be able to control the execution of your program more carefully and interactively examine addresses and variable values. To do that, you need a debugger.

The gdb Debugger

For this tutorial, you should copy the files in /home/faculty/freund/shared/cs010/practice7 to your directory. You will practice using the debugger on them.

gdb is the debugger that is most commonly used to debug C programs on Unix machines. To use a debugger, you start the debugger and then run the program inside the debugger. You can stop execution at any line of code, display the value of any expression, execute the code one line at a time, and many other things. In order to use the debugger, you must compile your program with an extra option -g:

gcc -Wall -g -o foo foo.c

This option adds information to your executable file that allows the debugger to know where variables are stored so that when you ask the debugger what value a variable has, it will know what address the variable is stored at. This information is not normally present in executable programs as it makes them bigger and has no value unless you are using a debugger.

To start the debugger, use the gdb command and give it the name of the executable program you want to debug. If you have a core file that was generated by a crash of the program, give the name of the core file as a second argument, and gdb will load it so that you have the state of the program at the time of the crash. This is extremely useful.

gdb match match.core

When gdb starts with a core file, it shows you what line of code caused the program to crash. You can also look at the values that variables had at the time of the crash.

A gdb Session

To demonstrate the debugger, I will show you a program that has a bug and show you how to use the debugger to find the bug. Here's the program we will work with:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
   
int main () {
  char *s = malloc (strlen ("Williams") + 1);
  strcpy (s, "Williams");
  free (s);
  s = NULL;
  printf ("%c\n", s[0]);
  return EXIT_SUCCESS;
}   

This is the crash.c program you copied from my directory.

When I run this program, I get a segmentation fault. (Do you know why?) Suppose I don't know why from looking at the code, so I start the debugger. I am going to run gdb within Emacs. After starting Emacs, I type "M-x gdb". The message area at the bottom of Emacs says:

Run gdb (like this): gdb 

It leaves the cursor at the end of gdb. I type in the name of the executable (assume I called it crash) and the name of the core file (in this case crash.core) so that it now appears as follows:

Run gdb (like this): gdb crash crash.core

Emacs creates a new buffer with the following contents:

Current directory is .
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
Core was generated by `crash'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0  0x804860c in main () at crash.c:10
(gdb)   

The first few lines are just legalese that appears whenever you start gdb. Then we see a statement indicating which executable created the core file. The next line tells us what caused the program to crash, in this case segmentation fault. We already knew this because we got a similar message from Unix when the program crashed. Next it tells us what libraries it is reading symbols from. You can safely ignore those lines. Finally it tells us where the program crashed. #0 indicates that the function listed contains the line of code that failed. Next is the address of the instruction that crashed. You can ignore this. The remainder is important. It tells us the function that crashed, which file it is in, and which line number, in this case line 10. (gdb) is the prompt that is now waiting for user input.

The first thing we should do is find out which line we crashed on. The last line of output tells us which line, and we should also see another buffer in our Emacs window that contains the source code of crash.c with line 10 beginning with "=>":

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
   
int main () {
  char *s = malloc (strlen ("Williams") + 1);
  strcpy (s, "Williams");
  free (s);
  s = NULL;
=>printf ("%c\n", s[0]);
  return EXIT_SUCCESS;
}   

The arrow is pointing to line 10. Since we died with a segmentation fault, we expect to see that we tried to dereference a bad pointer. There is only one variable on line 10, namely s. It is being used as an array, but we know that arrays and pointers are often interchangeable. In fact, it was declared to be a pointer. We should now be suspicious that s is null. We can confirm this by asking the debugger what value s has using the p (for print) command:

(gdb) p s

gdb responds with:

$1 = 0x0

$1 simply means this is the result of our first debugging command. The right hand side of the = sign gives the actual value, 0x0. Whenever we ask for an address, gdb prints a strange-looking number beginning with 0x. The "number" might also include the letters a through f. It is actually outputting a hexadecimal number, that is, a number written in base 16. Don't let this bother you. The only things you'll generally want to do with address values are to determine whether one is 0 (which is simple) or whether two variables hold the same address.

Our suspicions have been confirmed. 0x0 is the memory address 0, or null. Now we need to figure out why it had that value. In this case, that is not difficult since the immediately preceding line assigned it NULL!

Examining the Call Stack

Many times when you start the debugger, you may find that the program crashed due to a null pointer, but it might not be immediately obvious why the pointer is null or how to fix the problem. The error in your code might be in a different function than where the segmentation fault occurs. In that case, you need to determine which function called the function that crashed and the values of variables in that function. This requires you to examine the "call stack" of functions at the time of the crash. The call stack is the list of functions that are currently active along with the current lines in each function. So if function f1 calls function f2, f2 is at the top of the stack and f1 is the next element in the stack. f1's current line is the line where it called f2. Here is a minor variation of the first program that crashes inside a function, so we need to examine the call stack to see what happened. It is located in file crash2.c.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
   
void myfunc (char *s);
   
int main () {
  char *s = malloc (strlen ("Williams") + 1);
  strcpy (s, "Williams");
  free (s);
  s = NULL;
  myfunc (s);
  return EXIT_SUCCESS;
}
   
void myfunc (char *s) {
  printf ("%c\n", s[0]);
}
   

The initial output that we get from gdb is the same in this case, except for the line that tells us where we are in the program. This time it says:

#0  0x804862c in myfunc (s=0x0) at crash2.c:17

If we look at the source code, this tells us that myfunc dies in a printf when it is trying to access the 0th element of the array. Our hypothesis should be that the pointer passed in as the address of the array is null. Now, we need to discover where it was called. For this program, it is trivial because the program is small and myfunc is only called once. In general, however, a function may be called from many places. To find out where it was called, we ask to see the call stack using the bt command:

(gdb) bt
   

The output we get is:

#0  0x804862c in myfunc (s=0x0) at crash2.c:17
#1  0x8048612 in main () at crash2.c:12
#2  0x8048521 in _start ()

This means that myfunc was called from main. main was called from _start, a function that C uses to get the program started. myfunc was called in main at line 12. To see the values of variables in main, we issue the up command:

(gdb) up
#1  0x8048612 in main () at crash2.c:12

Now, we can look at the code and variables in main as we did earlier and again discover that s is null. This time the crash didn't occur until we were inside myfunc because we did not actually attempt to dereference s. We just passed it as a parameter to myfunc. The first attempt to use the pointer was inside myfunc so that is where the crash happened.

Executing Programs within gdb

In the previous examples we could easily determine what went wrong just by looking at the core file. That is not always the case. In particular, to track down problems with dangling pointers, we often need to execute the program stopping at every line looking for an unexpected change to a variable. There are several commands in gdb that help us do that. Consider the following program (fillbuffer.c):

#include <stdlib.h>
#include <stdio.h>
   
char *fillBuffer();
   
int main () {
  char *buffer;
  char *buffer2;
   
  buffer = fillBuffer();
  printf ("buffer = %s\n", buffer);
   
  buffer2 = fillBuffer();
  printf ("buffer = %s\n", buffer);
  printf ("buffer2 = %s\n", buffer2);
  exit (0);
}
   
char *fillBuffer () {
  char line[1000];
   
  printf ("Enter a line: ");
  gets (line);
  return line;
}
   

When we run this program, we get the following result:

-> fillbuffer
Enter a line: abc
buffer = abc
Enter a line: def
buffer = def
buffer2 = def

buffer is initially set correctly, but at the end of the program both variables have the same value! There is no core file because the program did not crash. It just did not do what we wanted. Assuming that we cannot figure this out from looking at the source code, we start the debugger:

-> gdb fillbuffer

This time we do not pass a core file on the command line. When gdb starts up, we only see the legalese. What I want to do is step through the main program to find out where the value of buffer changes. First I set a breakpoint at the beginning of main:

(gdb) b main
Breakpoint 1 at 0x804851e: file fillbuffer.c, line 10.

Now I can run the program and the debugger will stop when it gets to the beginning of main:

(gdb) r
Starting program: fillbuffer
   
Breakpoint 1, main () at fillbuffer.c:10

This tells you which program it is running. When it reaches the breakpoint, it outputs a message indicating where it is and a prompt. It also updates the other Emacs buffer to show me the line of code at which it is stopped. Now, I want to execute my program one line at a time until after I assign the value to buffer. Then after each statement execution, I will print the value of buffer to find out where it changes. Note that when gdb points to a statement from my program, it is the statement it is going to execute next. Here we go:

(gdb) n
warning: this program uses gets(), which is unsafe.
Enter a line: abc
(gdb) p buffer
$1 = 0xbfbff2b0 "abc"
(gdb) n
buffer = abc
(gdb) p buffer
$2 = 0xbfbff2b0 "abc\001"
(gdb) n
Enter a line: def
(gdb) p buffer
$3 = 0xbfbff2b0 "def"
(gdb)
   

The value of buffer changed somewhat after the first call to printf, but not to the bad value we saw when outside the debugger. When we continue running, we then see that the second call to fillBuffer modifies the value returned by the first call! What happened? It appears that something minor went wrong on the first printf and something more major went wrong on the second call to fillBuffer. Suppose I still don't understand the problem. I will run it again inside the debugger, only this time I will single-step through the second call to fillBuffer to find out where buffer changes value. (I can't single-step through printf because that is a library function and I don't have the source code for it.) To do this, I will use the s command to step into fillBuffer. I will use n to step through fillBuffer. After each line of code I will go up to the main program and print buffer:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
   
Starting program: fillbuffer
   
Breakpoint 1, main () at fillbuffer.c:10
(gdb) n
warning: this program uses gets(), which is unsafe.
Enter a line: abc
(gdb) n
buffer = abc
(gdb) s
fillBuffer () at fillbuffer.c:22
(gdb) up
#1  0x8048541 in main () at fillbuffer.c:13
(gdb) p buffer
$6 = 0xbfbff2b0 "abc\001"
(gdb) down
#0  fillBuffer () at fillbuffer.c:22
(gdb) n
(gdb) up
#1  0x8048541 in main () at fillbuffer.c:13
(gdb) p buffer
$7 = 0xbfbff2b0 "abc\001"
(gdb) down
#0  fillBuffer () at fillbuffer.c:23
(gdb) n
Enter a line: def
(gdb) up
#1  0x8048541 in main () at fillbuffer.c:13
(gdb) p buffer
$8 = 0xbfbff2b0 "def"
(gdb)
   
   

This means that the call to gets modified buffer! The only way this could happen is if buffer and line are using the same memory. When we print buffer, it tells us what address it is using. Its address is 0xbfbff2b0. When working with strings, gdb displays the address and then the string value. Now, let's see what line is using:

(gdb) down
#0  fillBuffer () at fillbuffer.c:24
(gdb) p &line
$9 = (char (*)[1000]) 0xbfbff2b0
   

To print the address of line, I need to use the & address operator. This first shows me the type of line and then its address. In fact, buffer and line are using the same memory! How did that happen? We did something we should not have done. We passed a local array variable as a return value from a function. C automatically freed that memory on the return but we kept the address in buffer. On the next call, C happened to reuse that same memory. When the value in the memory changed, the value used by our dangling pointer also changed. We should not have ignored the compiler warning message (function returns address of local variable)!

Getting Help

gdb has extensive on-line help. To get help, just type "help". gdb will list categories of commands that you can get help on:

(gdb) help
List of classes of commands:
   
aliases -- Aliases of other commands
breakpoints -- Making program stop at certain points
data -- Examining data
files -- Specifying and examining files
internals -- Maintenance commands
obscure -- Obscure features
running -- Running the program
stack -- Examining the stack
status -- Status inquiries
support -- Support facilities
tracepoints -- Tracing of program execution without stopping the program
user-defined -- User-defined commands
   
Type "help" followed by a class name for a list of commands in that class.
Type "help" followed by command name for full documentation.
Command name abbreviations are allowed if unambiguous.

The most useful commands are in the categories: breakpoints, data, files, stack, and running. If you type "help running", you will see a list of 29 commands to control how your program runs, including the run, step, and next commands described above. The list includes a one line description of the command. If you see a command that looks useful for what you want to do, type "help <command>", substituting in the name of the command you are interested in, and you will get more detailed help on that command.

gdb Command Summary

l                       List the program near the current line
l <line number>         List the program near the given line number of the current file
l <function name>       List the beginning few lines of the given function
p <expression>          Print the value of an expression
bt                      Backtrace: print the functions on the call stack, identifying the current line in each function
up                      Move to the current line in the calling function
down                    Move to the current line in the called function
b <line number>         Set a breakpoint at the given line number of the current file
b <function name>       Set a breakpoint at the beginning of the given function
d <breakpoint number>   Delete the breakpoint with the given number
r                       Run the program from the beginning
n                       Execute the next line without going into functions on function calls
s                       Step to the next line, going into functions on function calls
c                       Continue execution from the current breakpoint
help                    On-line help
q                       Quit gdb

Exercises

There are no specific exercises today. Walk through the gdb sessions described above. If you want to become more comfortable, try running gdb with one of the programs you've written earlier this month or one of the sample programs online.

