1. Install the following RPM packages:
a. kexec-tools --> Provides the kdump configs and services
b. kernel-kdump --> Kernel built with debug information (-g during compilation)
c. crash --> Utility for analyzing your kernel crash dump
d. kernel-debuginfo --> Provides the debugging symbols and the debug kernel file, vmlinux, used by the crash utility
Note: On SLES10, a gdb-kdump script is also available to help debug the kernel crash dump.
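A quick way to confirm that the packages listed above are present is to query the RPM database. The sketch below assumes an RPM-based system. The function name missing_packages and the RPM_QUERY override are hypothetical helpers introduced here for illustration, and the exact package names may differ between distributions.

```shell
# Print each package name that the query command does not find.
# RPM_QUERY is a hypothetical override (default: "rpm -q"), handy for dry runs.
missing_packages() {
    for pkg in "$@"; do
        # Left unquoted on purpose so "rpm -q" splits into command and option.
        ${RPM_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Running "missing_packages kexec-tools kernel-kdump crash kernel-debuginfo" should print nothing when every package is installed, and one line per missing package otherwise.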
2. Modify the kernel boot entry to reserve memory for the crash kernel
Edit /boot/grub/grub.conf and add "crashkernel=128M@16M" to the kernel line. This reserves 128MB of memory, starting at physical address 0x01000000 (16MB).
For example:
title Red Hat Enterprise Linux Server (2.6.18-8.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
initrd /initrd-2.6.18-8.el5.img
Note: So far I have got this working on Fedora Core 10, SLES10, and OEL5.
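The address arithmetic behind crashkernel=128M@16M can be checked in the shell. This is only a small sketch: mib_to_hex is a hypothetical helper name, and it assumes that "M" means 1024 * 1024 bytes.

```shell
# Convert a size written as "<N>M" into a hexadecimal byte address.
mib_to_hex() {
    mib=${1%M}                                  # strip the trailing "M"
    printf '0x%08x\n' $((mib * 1024 * 1024))    # bytes, zero-padded hex
}

mib_to_hex 16M     # start of the reserved region (the @16M offset)
mib_to_hex 144M    # end of the region: 16M offset + 128M size
```

The first call prints 0x01000000, which matches the physical address quoted for the 16MB offset.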
3. Specify where the vmcore should be written
In /etc/kdump.conf, you can configure different kinds of dump target.
Here is an example that uses NFS as the dump target:
net my.server.com:/export/tmp
This mounts the exported filesystem and copies the vmcore to the NFS server.
Note: Use NFS or SCP as the target so that the dump does not fail along with the local disk, just in case.
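To double-check which target is actually active, the configured "net" line can be pulled out of the file. This is a sketch that assumes the "net" directive syntax shown in the example; dump_target is a hypothetical helper name, and it only looks at "net" lines, not at other target types.

```shell
# Print the argument of the first active (uncommented) "net" line
# in a kdump.conf-style file given as $1.
dump_target() {
    sed -n 's/^[[:space:]]*net[[:space:]]\{1,\}//p' "$1" | head -n 1
}
```

For example, "dump_target /etc/kdump.conf" should print my.server.com:/export/tmp for the configuration above, and nothing if no "net" line is active.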
4. Enable the kdump service:
   chkconfig kdump on
5. Reboot the system to put the kdump configuration into effect.
Verify that kdump is active:
1. cat /proc/cmdline
   ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M
2. /etc/init.d/kdump status
   Kdump is operational
3. /sbin/chkconfig --list | grep kdump
   kdump 0:off 1:off 2:on 3:on 4:on 5:on 6:off
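The first check can be scripted instead of read by eye. The sketch below extracts the crashkernel value from a kernel command line string; crashkernel_arg is a hypothetical helper name, and it assumes the parameters are separated by single spaces, as in /proc/cmdline.

```shell
# Print the value of the crashkernel= parameter found in the command
# line string passed as $1, or nothing if the parameter is absent.
crashkernel_arg() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | head -n 1
}

# Example using the command line shown above.
crashkernel_arg "ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M"
```

On a live system the same function can be fed "$(cat /proc/cmdline)"; an empty result suggests the reservation was not picked up at boot.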
6. Test kdump by crashing the system:
   echo c > /proc/sysrq-trigger
This causes the kernel to panic, after which the system restarts into the kdump kernel. When the boot process reaches the point where it starts the kdump service, the vmcore is copied to the location you specified in the /etc/kdump.conf file.
Note: Verify that the vmcore is the correct size; it should be about the size of the memory on the crashed system.
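The size check can be expressed as a percentage of total memory. This is a rough sketch: vmcore_pct is a hypothetical helper name, and it assumes the dump was written unfiltered and uncompressed, so a value near 100 is what you would hope to see.

```shell
# Print the vmcore size as a whole-number percentage of total memory.
# $1: vmcore size in bytes
# $2: total memory in kB (the unit used by MemTotal in /proc/meminfo)
vmcore_pct() {
    echo $(( $1 * 100 / ($2 * 1024) ))
}
```

One way to call it is vmcore_pct "$(stat -c %s vmcore)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)", run on a machine with the same amount of memory as the crashed system. MemTotal may differ slightly from the installed memory, so treat the result as approximate.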
7. Analyze the crash dump file
Copy /usr/lib/debug/lib/modules/KERNEL_VER/vmlinux to your /PATH/TO/CRASH/DUMP/DATE/ directory.
Run the crash tool to debug it:
   crash vmlinux vmcore
The crash tool behaves like gdb. The first thing to do at the crash> prompt is run the "log" command to see what was happening in the kernel:
crash> log
For more information on how to use crash:
crash> help
or
go to http://people.redhat.com/anderson/crash_whitepaper
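For a first pass over a dump, the same commands can be collected in a file and run in one go. The sketch below only writes that command file; write_crash_cmds is a hypothetical helper name, and the choice of log, bt and ps is just one reasonable starting set.

```shell
# Write a small crash command file to the path given as $1.
# log: kernel message buffer, bt: backtrace of the panicking task,
# ps: process list at the time of the crash, exit: leave crash.
write_crash_cmds() {
    cat > "$1" <<'EOF'
log
bt
ps
exit
EOF
}
```

After "write_crash_cmds cmds.txt", something like "crash -i cmds.txt vmlinux vmcore > report.txt" should run the commands and save the output, assuming your crash version supports the -i input-file option.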