How to analyze and interpret sar data.

Updated

  • Sar data is useful in pinpointing the system resource (networking, memory, IO, etc.) that is causing a performance issue. In addition, note that it contains much more data than is covered in this article; this is simply meant to highlight some of the areas that are useful in pinpointing system usage.

  • Sar data contains several useful sections for determining how the system was performing at a given time. By default, this is configured to run at ten minute intervals. Information on any of the descriptors used below, such as runq-sz, kbmemused, etc., can be obtained by searching for these names after executing man sar.

  • Load - information regarding how many tasks are currently on the system, along with how many are waiting to receive runtime, and an average of the running/runnable/D-state processes over periods of time.

00:00:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
[...]
04:20:01            1      2751      0.08      0.34      0.42
04:30:01            0       955      2.18      1.37      1.01
04:40:03           10      1645      8.18      6.58      3.83
04:50:14           25      1704    170.37    113.04     55.24
Average:            2      2605      6.67      4.64      2.51

From the above information we can see that the load had a massive spike around 4:50. Typically a system's load should remain at 70% of the number of cores or lower. If the system's load is consistently above this amount there may be performance degradation, and if the load ever rises above the number of cores there will be a significant slowdown.

  • Memory - information regarding memory and swap usage.
00:00:01    kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
[...]
04:30:01     62716272 201531148     76.27     15952   1556572   2048236        12      0.00         4
04:40:03       191904 264055516     99.93      2692     28908         0   2048248    100.00      8496
04:50:14       184100 264063320     99.93      1388     10600         0   2048248    100.00         0
Average:      4415719 259831701     98.33   1357749  20307185   1906978    141270      6.90       297

Linux likes to use memory at around 99%; however, if the system is actively swapping then the system is most likely experiencing memory pressure. Considering that we see all of the swap used above within 10 minutes the system containing this data was experiencing memory issues during this time.

  • IO - information pertaining to the number of disk accesses.
00:00:01          tps      rtps      wtps   bread/s   bwrtn/s
[...]
04:30:01        67.95     51.84     16.12   4303.82   4664.14
04:40:03       564.60    227.07    337.52  34338.84  87719.02
04:50:14        51.05     40.25     10.80   1326.32    245.06
Average:        31.12     11.00     20.12   1383.64   3346.65

The number of disk reads and writes will vary based on the underlying hardware; however, we can take a look at what is considered 'normal' for this system by examining the data over a period of time, and then look for spikes. We can see a large spike at 4:40 where the number of reads and writes increases dramatically. Note that shortly after these go back down, indicating that this massive burst was resolved.

  • CPU - information regarding where each of the system's cycles are spent.
00:00:01          CPU     %user     %nice   %system   %iowait    %steal     %idle
[...]
04:40:03          all     10.45      0.00      1.67      0.89      0.00     86.99
04:50:14          all      0.19      0.00     62.06      1.98      0.00     35.78

Spending time in %user is expected behavior, as this is where all non-system tasks are accounted for. If cycles are actively being spent in %system then much of the execution time is being spent in lower-level code. If %iowait is high then it indicates processes are actively waiting due to disk accesses being a bottleneck on the system.

  • In addition, sar data may be viewed graphically by downloading and using the 'kSar' tool. This tool is not provided from RedHat; however, it may be useful in pinpointing problematic times on the system.
Category
Components
Article Type