How to monitor kill signals 2, 9 and 15 using a systemtap script?


Environment

  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9
  • systemtap

Issue

  • How to determine which process is sending signal 2 (SIGINT), 9 (SIGKILL) or 15 (SIGTERM) to another process using a systemtap script?
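
The systemtap probes used in this article report signals by name (sig_name) rather than by number. As a quick cross-check of the number-to-name mapping on the local system, the following sketch relies on the shell builtin kill -l. The helper name sig_names is only illustrative and is not part of systemtap.

```shell
#!/usr/bin/env bash
# sig_names: print "<number> SIG<name>" for each signal number given.
# (Illustrative helper, not part of systemtap.)
sig_names() {
    local num
    for num in "$@"; do
        # The builtin "kill -l N" prints the name without the SIG prefix
        printf '%s SIG%s\n' "$num" "$(kill -l "$num")"
    done
}

# On Linux this prints:
#   2 SIGINT
#   9 SIGKILL
#   15 SIGTERM
sig_names 2 9 15
```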

Resolution

The solutions below require the systemtap package together with the kernel-debuginfo packages.
The debuginfo packages are provided by the debug repositories, which must be enabled first, as shown in the example below (for RHEL 7):

# subscription-manager repos --enable rhel-7-server-debug-rpms

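Then install systemtap and run stap-prep, which installs the kernel packages (such as kernel-debuginfo and kernel-devel) that systemtap needs for the running kernel: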

# yum -y install systemtap
# stap-prep

Solution 1 - monitor interesting signals from userland

  1. Create the following systemtap script as sigcatch.stp to monitor signals 2, 9 and 15:

    #!/usr/bin/env stap
    
    probe begin
    {
    	printf("Monitoring SIGTERM| SIGKILL| SIGINT signal: Start\n");
    }
    
    probe signal.send {
    	if (sig_name == "SIGTERM" || sig_name == "SIGKILL" || sig_name == "SIGINT") {
    		printf("%d %s was sent to %s(pid:%d) by %s(%d) uid:%d\n", 
    			gettimeofday_s(), sig_name, pid_name, sig_pid, execname(), pid(), uid())
    	}
    }
    
    probe end
    {
    	printf("Monitoring SIGTERM| SIGKILL| SIGINT signal: Stop\n");
    }
    

    Note: To monitor all signals, remove the if condition from the probe signal.send block.
    Note: To monitor only the signals sent to a specific process, add pid_name == "process_name" to the if condition, for example:

    if ((pid_name == "httpd") && (sig_name == "SIGTERM" || sig_name == "SIGKILL" || sig_name == "SIGINT"))
    
  2. As root user, run the systemtap script:

    # stap -v sigcatch.stp 
    Monitoring SIGTERM| SIGKILL| SIGINT signal: Start
    
  3. From another terminal, verify that killing a process produces output from the systemtap script:

    $ echo $$
    6005
    $ sleep 1000 &
    [1] 24791
    $ kill -9 24791
    [1]+  Killed                  sleep 1000
    

    The terminal running stap then prints:

    1675258980 SIGKILL was sent to sleep(pid:24791) by bash(6005) uid:1001
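
When the script runs for a long time, its output can become lengthy. Assuming the output was saved to a file (the name sigcatch.log below is hypothetical), a small sketch like the following counts signals per sender. The helper name summarize_senders is illustrative, and the field positions assume process names without spaces, as in the sample line above.

```shell
#!/usr/bin/env bash
# summarize_senders: read sigcatch.stp output on stdin and print
# "<count> <sender>(<pid>) <signal>", most frequent first.
# Assumes the line format shown above and process names without spaces.
summarize_senders() {
    awk '$3 == "was" && $4 == "sent" {
             # $2 is the signal name, $8 is the sender as name(pid)
             count[$8 " " $2]++
         }
         END { for (k in count) print count[k], k }' | sort -rn
}

# Example usage (sigcatch.log is a hypothetical file name):
#   summarize_senders < sigcatch.log
```

Lines that do not match the expected format, such as the Start and Stop banners, are ignored.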
    

Solution 2 - monitor signals from the kernel

This solution does not depend on a userland stap process, which could itself be killed.
It is therefore very useful when some root process is misbehaving and running kill -9 -1 or similar.
To survive such a kill, the script logs through printk, which writes to the kernel log (/dev/kmsg). Because systemtap treats printk as an unsafe operation, the script must be compiled with guru mode enabled.

Additionally, the script prints the full ancestry of the process sending the signal, which is very valuable when the sender is a short-lived process.

  1. Create the following systemtap script as who_sends_kill_2_9_15_with_parents.stp to monitor signals 2, 9 and 15

        #!/usr/bin/stap
        #
        # Script printing PID + all parents sending kill -2/-9/-15 signals
        #
        # Author: Renaud Métrich <rmetrich@redhat.com>
    
        probe signal.send {
            if (sig_name == "SIGTERM" || sig_name == "SIGKILL" || sig_name == "SIGINT") {
                    printk(2 /* crit level */,
                            sprintf("PID %ld ('%s') sent %s to PID %ld ('%s')",
                                    pid(), execname(), sig_name, sig_pid, pid_name)
                    );
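                    # Start the ancestry chain with the sender itself
                    msg = sprintf("%ld ('%s')", pid(), execname());
                    ts = task_current();
                    # Append each parent until PID 1 (or PID 0) has been added
                    while ((ts->pid != 1) && (ts->pid != 0)) {
                            ts = ts->parent;
                            msg .= sprintf(" -> %ld ('%s')", ts->pid, pid2execname(ts->pid));
                    }
                    printk(2, msg);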
            }
        }
    
  2. Compile the script as a kernel module (-g enables guru mode, -p 4 stops after the compilation pass, -m sets the module name)

    # stap -v -g -p 4 -m who_sends_kill_2_9_15_with_parents ./who_sends_kill_2_9_15_with_parents.stp
    
  3. Load the module and detach from it (-L), leaving the probes running without any userland process attached

    # staprun -L ./who_sends_kill_2_9_15_with_parents.ko
    
  4. Verify that killing a process produces output in the journal

    # echo $$
    1654
    # sleep 1000 &
    [1] 3038
    # kill -15 3038
    [1]+  Terminated              sleep 1000
    

    Check the journal:

    # journalctl -p crit SYSLOG_IDENTIFIER=kernel | tail
    [...] kernel: PID 1654 ('bash') sent SIGTERM to PID 3038 ('sleep')
    [...] kernel: 1654 ('bash') -> 1650 ('sshd') -> 1108 ('sshd') -> 1 ('systemd')
    

    From the above, we can see that bash running as PID 1654 sent a TERM signal to sleep running as PID 3038.
    The second line additionally shows that this bash was started from an sshd session.
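
For deep process trees, the single-line ancestry can be hard to read. As a rough sketch, assuming the journal line format shown above, the following helper prints each ancestor on its own indented line. The name print_chain is illustrative only.

```shell
#!/usr/bin/env bash
# print_chain: read journal lines on stdin, keep only the ancestry lines
# (those where "kernel: " is followed by a PID), and print one process
# per line, indented by depth. Assumes the line format shown above.
print_chain() {
    sed -n "s/^.*kernel: \([0-9][0-9]* ('.*\)/\1/p" |
    awk -F' -> ' '{
        pad = ""
        for (i = 1; i <= NF; i++) {
            print pad $i
            pad = pad "  "
        }
    }'
}

# Example usage:
#   journalctl -p crit SYSLOG_IDENTIFIER=kernel | print_chain
```

The "PID ... sent ..." lines are skipped because they do not start with a number after "kernel: ".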

