Using SystemTap


Introduction

SystemTap provides an infrastructure that simplifies gathering data on a running Linux system for detailed analysis. With SystemTap, users can instrument any aspect of a running kernel without having to re-compile, re-deploy, and re-install the kernel. SystemTap allows users to build custom scripts to safely extract, filter, and summarize information from a running kernel; these scripts can also be designed to be re-usable across multiple kernels. This makes it easy for administrators to diagnose complex performance or functional problems across multiple machines. While SystemTap is mostly useful for gathering data from kernel space, the upstream SystemTap project is currently improving SystemTap's ability to probe events in user space.

How SystemTap Works

SystemTap is built upon the notions of probe points and probe handlers.

Probe points are events that occur on the system. They can be a variety of things, such as reaching a specific point in the code, a session terminating, or a timer expiring. These events are implemented by a number of different technologies, such as kprobes, uprobes, tracepoints, and timers. Probe handlers are routines that are executed when the specified event occurs; typically, these handlers extract and process information about the state of the kernel or running code at the point of the probed event.

SystemTap provides a language for defining probe points and their corresponding probe handlers. Users specify these points and handlers in a SystemTap script, which SystemTap compiles into a kernel module and loads into the running kernel. SystemTap then uses its runtime library, which manages and processes module data, to monitor the system for the defined events. As soon as those events occur, SystemTap executes their corresponding handlers.
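As a minimal illustration, the shell snippet below writes a small script named hello.stp and runs it. The article later compiles a script of that name with stap -v hello.stp but never lists its contents, so the contents here are an assumption. The begin probe point fires once when the session starts, and the code in braces is its handler. The snippet only invokes stap if it is installed, and stap still needs the privileges and kernel packages described later in this article.

```shell
#!/bin/bash
# Write a minimal SystemTap script: one probe point (begin) and its handler.
cat > hello.stp <<'EOF'
probe begin
{
  printf("hello world\n")  // handler: runs once when the session starts
  exit()                   // end the session instead of waiting for Ctrl+C
}
EOF

# Compile hello.stp into a kernel module, load it, and run its handler.
# This needs the privileges and kernel packages described in this article.
if command -v stap >/dev/null 2>&1; then
  stap -v hello.stp
else
  echo "stap not found; install the systemtap package first" >&2
fi
```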

When to Use (and Not to Use) SystemTap

Any information that users normally gather by inserting debugging code into the kernel can be collected using SystemTap scripts. SystemTap was designed to replace the tedious process of adding debugging code to the kernel followed by a kernel compile, install, and reboot. This makes SystemTap an ideal tool for building scripts to gather needed data about the system. SystemTap is therefore best used whenever no existing tool can collect the needed data.

SystemTap was not designed, however, to replace any existing tools for collecting system data. As such, it is not advisable to use SystemTap when other diagnostic tools that provide the needed data are already installed (or easily installable) on the system. For example, valgrind provides several tools that profile memory usage, perform cache simulation, and execute call graph tracing, among other things. The oprofile package provides tools that can also profile most facets of the system in high detail.

Examples

SystemTap can be used by system administrators of nearly all experience levels. Writing a SystemTap script from scratch requires intermediate knowledge of SystemTap and of how the Linux kernel works. Compiling and running an existing SystemTap script, on the other hand, requires very little experience in either.

Generally, SystemTap has three distinct use cases:

  1. Compiling a SystemTap script into a module and running it

  2. Creating a pre-compiled module from a SystemTap script (to be run locally or on another system)

  3. Running a pre-compiled SystemTap module

Users with root access to the system they wish to probe usually fall into the first use case. This is also true for users learning SystemTap by compiling and running scripts on a test machine. In a secure enterprise environment, however, users normally do not have the privileges required to run SystemTap. In such cases, an administrator typically compiles the SystemTap scripts into modules on a separate "host" machine, passes the pre-compiled modules to users, and grants those users the privileges required to run SystemTap on a specific "target" machine. This allows an administrator to delegate the task of running a SystemTap probe to each user without compromising security.

Each use case has different set-up requirements. The first two use cases involve compiling a SystemTap script into a kernel module. To compile a script into a kernel module, you need to install the following packages:

  1. systemtap
  2. systemtap-runtime
  3. kernel (i.e. the kernel to be probed)
  4. kernel information packages (i.e. -devel, -debuginfo, and -debuginfo-common)

Running a pre-compiled module only requires the systemtap-runtime package.

Configuring user privileges

Compiling and running a SystemTap script, or running a pre-compiled module, requires elevated privileges. To perform either task, you need to be logged in as root, or have sudo set up to allow you to run stap and staprun.

Alternatively, if root or sudo privileges are not viable options, you can add a user to the stapdev or stapusr group. Members of the stapdev group are allowed to compile SystemTap scripts and run pre-compiled modules, while members of the stapusr group can only run pre-compiled modules.
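A minimal sketch of checking which of these access levels a user has, assuming the group names stapdev and stapusr described above. The helper name stap_access_level and the user name alice are hypothetical; the helper reads a space-separated group list such as the output of id -nG, and the group change itself uses usermod -a -G.

```shell
#!/bin/bash
# Hypothetical helper: report SystemTap access given a space-separated list
# of group names (for example, the output of `id -nG`).
stap_access_level() {
  local group level="none"
  for group in $1; do
    case "$group" in
      stapdev) level="compile-and-run"; break ;;  # may use stap and staprun
      stapusr) level="run-only" ;;                # may use staprun only
    esac
  done
  echo "$level"
}

# Check the current user; root always has full access.
if [ "$(id -u)" = "0" ]; then
  echo "root: compile-and-run"
else
  echo "$(id -un): $(stap_access_level "$(id -nG)")"
fi

# As root, add the placeholder user "alice" to one of the groups:
#   usermod -a -G stapusr alice   # run pre-compiled modules only
#   usermod -a -G stapdev alice   # also compile scripts
```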

Installing kernel information packages

To compile a script into a module, SystemTap requires the exact kernel to be probed along with the kernel information packages: kernel-devel, kernel-debuginfo, and kernel-debuginfo-common. These packages must match the version (and variant) of the kernel on which the modules will be loaded (i.e. the kernel to be probed).

The easiest way to install the required kernel information packages is to use the following script:

stapprep.sh
#! /bin/bash
# stapprep.sh: install the kernel packages SystemTap needs to compile
# scripts for a given kernel (default: the running kernel).

check_error() { if test "$1" != 0; then echo "$2" >&2; exit "$1"; fi }

if [ "$#" -lt 1 ]; then
  UNAME=$(uname -r)   # determine the kernel running on the machine
else
  UNAME=$1            # user passed in uname value
fi
UNAME=$(echo "$UNAME" | tr -d '[:space:]')   # strip out all whitespace

# Detect a kernel variant suffix (e.g. 2.6.18-92.el5xen -> kernel-xen).
KERNEL="kernel"
for VARIANT in debug kdump PAE xen; do
  TMP=$(echo "$UNAME" | sed "s/$VARIANT//")
  if [ "$TMP" != "$UNAME" ]; then
    UNAME=$TMP; KERNEL="kernel-$VARIANT"
  fi
done

KERN_ARCH=$(uname -m)
KERN_REV=$(echo "$UNAME" | sed "s/\.$KERN_ARCH\$//")   # strip a trailing .arch
CANDIDATES="$KERNEL-$KERN_REV.$KERN_ARCH \
$KERNEL-devel-$KERN_REV.$KERN_ARCH \
$KERNEL-debuginfo-$KERN_REV.$KERN_ARCH \
kernel-debuginfo-common-$KERN_REV.$KERN_ARCH"

# rpm prints "package NAME is not installed" for each missing package.
NEEDED=$(rpm --qf "%{name}-%{version}-%{release}.%{arch}\n" \
  -q $CANDIDATES | grep "is not installed" | awk '{print $2}')
if [ "$NEEDED" != "" ]; then
  echo -e "Need to install the following packages:\n$NEEDED"
  if [ "$(id -u)" = "0" ]; then   # attempt download and install
    DIR=$(mktemp -d) || exit 1
    yumdownloader --enablerepo="*debuginfo*" $NEEDED --destdir="$DIR"
    check_error $? "problem downloading rpm(s) $NEEDED"
    rpm --force -ivh "$DIR"/*.rpm
    check_error $? "problem installing rpm(s) $NEEDED"
    rm -r "$DIR"   # cleanup
  else
    echo "Run this script as root to download and install them." >&2
  fi
fi

To install the kernel-debuginfo, kernel-devel, and kernel-debuginfo-common packages for a specific kernel, run stapprep.sh with the kernel version and variant as an argument; for example, stapprep.sh 2.6.18-92.el5. To determine the version and variant string of a running kernel, use the command uname -r. Running stapprep.sh with no arguments checks the packages for the currently running kernel. The script only downloads and installs missing packages when run as root; otherwise, it just lists them.

stapprep.sh uses yumdownloader to download the packages and rpm to install them; yumdownloader is provided by the yum-utils package. To install yum-utils, use:
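To check which package names stapprep.sh will query for a given kernel string, the version and variant parsing it performs can be isolated in a function. This is a sketch that mirrors the script's logic rather than a replacement for it: kernel_info_packages is a hypothetical name, and the architecture is passed in explicitly (the script reads it from uname -m) so that any kernel string can be checked.

```shell
#!/bin/bash
# Hypothetical helper mirroring stapprep.sh: map a `uname -r` string and an
# architecture to the four package names SystemTap needs.
kernel_info_packages() {
  local uname_r=$1 arch=$2 kernel="kernel" variant stripped
  for variant in debug kdump PAE xen; do
    stripped=${uname_r/$variant/}   # drop the variant suffix, if present
    if [ "$stripped" != "$uname_r" ]; then
      uname_r=$stripped; kernel="kernel-$variant"
    fi
  done
  uname_r=${uname_r%."$arch"}       # drop a trailing ".<arch>", if present
  echo "$kernel-$uname_r.$arch"
  echo "$kernel-devel-$uname_r.$arch"
  echo "$kernel-debuginfo-$uname_r.$arch"
  echo "kernel-debuginfo-common-$uname_r.$arch"
}

# Prints kernel-xen-2.6.18-92.el5.i686, kernel-xen-devel-2.6.18-92.el5.i686,
# kernel-xen-debuginfo-2.6.18-92.el5.i686 and
# kernel-debuginfo-common-2.6.18-92.el5.i686, one per line.
kernel_info_packages "2.6.18-92.el5xen" i686
```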

sudo yum install -y yum-utils

Use Case 1: Compiling and running a SystemTap script

Required packages: systemtap, systemtap-runtime, kernel to be probed, matching kernel information packages

To compile a SystemTap script into a kernel module, use the stap application. By default, the stap application compiles a SystemTap script into a kernel module and automatically loads the kernel module onto the running kernel; it will then execute the defined handlers for each probe point when appropriate. To compile and load a SystemTap script, use the following command:

stap [flags] filename.stp

The following is a list of the most commonly used stap flags:

Flag          Definition
-v            Makes the output of the SystemTap session more verbose. May be repeated for even more verbose output. This option is particularly useful if you encounter errors while running the script.
-h            Shows a help message.
-e script     Uses script (given on the command line) rather than a file as input for the SystemTap translator.
-r version    Uses version as the target kernel version instead of the currently running kernel version, and does not load and run the compiled module. The corresponding kernel and kernel information packages for the specified version must be installed. Specify the version as reported by uname -r, not the RPM version field, which omits the release number.
-m module     Uses module as the name of the generated kernel module and places the generated module in the current directory.
-p 4          Stops after pass 4 (compilation), so the compiled module is not loaded and run. This flag is used with -m module to create a pre-compiled module.
-c command    Starts the probes, runs command, and exits when command finishes.
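The -e and -c flags are often combined. The following sketch wraps them in a hypothetical shell function, trace_command, that runs a command under SystemTap with an inline script. The end probe point fires when the session ends, which with -c happens when the command finishes. The function name and messages are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: run a command under SystemTap.
#   -c  start the probes, run the command, and exit when it finishes
#   -e  take the script from the command line instead of a file
trace_command() {
  stap -v -c "$*" -e '
probe begin { printf("probes started\n") }
probe end   { printf("command finished, session ending\n") }'
}

# Example (as root or a stapdev member):
#   trace_command ls /tmp
```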

For example, to compile and load the hello.stp script verbosely, run stap -v hello.stp.

Use Case 2: Creating a pre-compiled module

Required packages: systemtap, systemtap-runtime, kernel to be probed, matching kernel information packages

You can also compile SystemTap scripts into modules on a host system and run those modules on a "target" system. To do this, however, the host and target systems must have the same architecture (i686, x86_64, ia64, and so on). They must also run the same major version of Red Hat Enterprise Linux (e.g. RHEL 4 or 5), although their minor versions can differ (e.g. RHEL 5.1 and RHEL 5.2).

To create a pre-compiled module on a host system, run:

stap -r version filename.stp -m module -p 4

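Here, version is the kernel version of the system on which the module will be loaded, as reported by uname -r on that system. If the target runs the same kernel as the host, you can replace version with $(uname -r) to refer to the host system's current running kernel. For example, to build the pre-compiled module hellomodule.ko from hello.stp, run stap -r $(uname -r) hello.stp -m hellomodule -p 4.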
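The same command can be wrapped in a small helper so that the target's kernel version is passed explicitly. This is a sketch only: precompile is a hypothetical function name, and it assumes the kernel information packages for that version are installed on the host. When no version is given, it falls back to $(uname -r), that is, the host's running kernel.

```shell
#!/bin/bash
# Hypothetical helper: build a pre-compiled module from a SystemTap script.
#   $1  script file, e.g. hello.stp
#   $2  module name, e.g. hellomodule
#   $3  target kernel version as reported by `uname -r` on the target
#       (defaults to the host's running kernel)
precompile() {
  local script=$1 module=$2 version=${3:-$(uname -r)}
  # -p 4 stops after compilation, so nothing is loaded on the host.
  stap -r "$version" "$script" -m "$module" -p 4 || return
  echo "$module.ko"   # -m places the module in the current directory
}

# Example: precompile hello.stp hellomodule 2.6.18-92.el5
```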

Use Case 3: Running a pre-compiled module

Required packages: systemtap-runtime

To run a pre-compiled SystemTap module, simply copy it over to the target system and run:

staprun modulename

For example, to run the pre-compiled hellomodule.ko module, run staprun hellomodule.ko. The staprun application is provided by the systemtap-runtime package. As mentioned earlier, you do not need to install the kernel information packages (i.e. -devel, -debuginfo, and -debuginfo-common kernel packages) or the systemtap package to load and run a pre-compiled module.
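Copying the module and starting it can be scripted from the host. The sketch below assumes SSH access to the target as a user allowed to run staprun there (for example, root) and uses scp and ssh for the transfer; the function name run_on_target, the host name, and the /tmp destination are illustrative assumptions, not part of SystemTap.

```shell
#!/bin/bash
# Hypothetical helper: copy a pre-compiled module to a target machine and
# run it there with staprun.
#   $1  module file, e.g. hellomodule.ko
#   $2  SSH destination, e.g. root@target.example.com (a placeholder)
run_on_target() {
  local module=$1 target=$2
  scp "$module" "$target:/tmp/" || return
  # -t allocates a terminal so that Ctrl+C on the host reaches staprun.
  ssh -t "$target" staprun "/tmp/$(basename "$module")"
}

# Example: run_on_target hellomodule.ko root@target.example.com
```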

Example

These example scripts are taken from the script library at http://sourceware.org/systemtap/examples/. These scripts are also available in the systemtap-testsuite RPM. After installing systemtap-testsuite, you can find several SystemTap examples under /usr/share/systemtap/testsuite/systemtap.examples.

file-opens.stp:

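#! /usr/bin/env stap

# Print the name, PID, and arguments of every process that calls open().
probe syscall.open
{
  printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)
}

# Stop the session after 4 seconds.
probe timer.ms(4000)
{
  exit()
}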

This script has two probe points. The first (probe syscall.open) has a handler that prints the name, process ID, and arguments of each process that makes the open system call. The second (probe timer.ms(4000)) calls exit() after 4 seconds. The exit() function simply terminates the script/module.
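Because each output line has the form name(pid) open (arguments), the output is easy to post-process with standard tools. The following helper, summarize_opens, is a hypothetical sketch that counts open() calls per process name from saved output; the log file name opens.log is also an assumption.

```shell
#!/bin/bash
# Hypothetical helper: count open() calls per process name in saved
# file-opens.stp output, whose lines look like "execname(pid) open (args)".
summarize_opens() {
  # Keep only the process name, then count and sort by frequency.
  sed -n 's/^\([^(]*\)([0-9]*) open .*/\1/p' | sort | uniq -c | sort -rn
}

# Example:
#   stap file-opens.stp > opens.log
#   summarize_opens < opens.log
```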

iotop.stp:

#! /usr/bin/env stap

global reads, writes, total_io

# Tally the bytes each process asks to read...
probe vfs.read {
  reads[execname()] += $count
}

# ...and to write.
probe vfs.write {
  writes[execname()] += $count
}

# print top 10 IO processes every 5 seconds
probe timer.s(5) {
  foreach (name in writes)
    total_io[name] += writes[name]
  foreach (name in reads)
    total_io[name] += reads[name]
  printf ("%16s\t%10s\t%10s\n", "Process", "KB Read", "KB Written")
  # total_io- iterates in descending order of value; stop after 10 entries
  foreach (name in total_io- limit 10)
    printf("%16s\t%10d\t%10d\n", name,
           reads[name]/1024, writes[name]/1024)
  # reset the counters for the next interval
  delete reads
  delete writes
  delete total_io
  print("\n")
}

This script monitors attempted reads and writes to the virtual file system. The probe vfs.read and probe vfs.write probe points tally the amount of data each process attempts to read from or write to the virtual file system. The probe timer.s(5) probe point contains a handler that prints the names of the ten processes that performed the most I/O, along with the amount of data each process read or wrote (in KB), and then resets the counters. The iotop.stp script prints this information every 5 seconds and runs until you stop it with Ctrl+C.

Additional Resources

Additional documentation

The official RHEL-focused documentation for SystemTap is located at http://www.redhat.com/docs/manuals/enterprise/. The following SystemTap documents are available there:

  • SystemTap Beginner's Guide - perfect for users with little to no experience in SystemTap. It explains how SystemTap works, the basics of writing SystemTap scripts, and how to install/set up SystemTap. It also contains a short collection of sample scripts, including detailed explanations of each example.
  • SystemTap Language Reference - a reference for the language used in SystemTap scripts. It is suitable for users with intermediate experience in SystemTap, particularly those with knowledge of the C language.
  • SystemTap Tapset Reference - a reference that documents most of SystemTap's available tapsets.

The man pages for stap and staprun also provide a lot of information on how to compile and run SystemTap scripts, along with details on the SystemTap language. In addition, man stapprobes documents several of SystemTap's most useful probe points; its SEE ALSO section lists man pages that contain information on useful probe points for specific subsystems (e.g. stapprobes.iosched for probe points suitable for the I/O scheduler).

Real world examples

The upstream SystemTap project (http://sourceware.org/systemtap/) also has a wealth of information about SystemTap. While not focused on RHEL, it contains lots of good documentation (including tutorials), pointers to public mailing lists and IRC channels, and access to the upstream source code.
