How does Performance Co-Pilot (PCP) compare with sysstat?

Installation and Dependencies

Platform availability and familiarity

  • sysstat is available on all RHEL platforms and has a long history on other Linux and Unix platforms. It has no management service and relies on cron to schedule repeated data collection jobs, which limits the collection interval to a minimum of 1 minute unless more complex scripts are written to sample more frequently. Data collected in sar files is binary dependent, so analyzing it on a different system requires a matching sysstat version.
  • PCP, originally developed by engineers at Silicon Graphics (SGI), was included in RHEL 6.6 and is now Red Hat's preferred tooling for performance metrics analysis. The PCP service manages metrics for local collection as well as centralized collection to a remote system. By default, the service collects metrics at 1, 30, or 60 second intervals, depending on how "expensive" a given metric is to collect. This is all configurable.
    PCP is easy to use for basic data capture, and analysis can be done on any platform with no binary limitations. For users familiar with common sysstat tools such as iostat, vmstat and so forth, PCP provides sub-commands that offer a similar and compatible experience:
pcp iostat
pcp vmstat
pcp dstat

Configurability

  • sysstat is not installed or enabled by default in the RHEL minimal package set, but is easy to install and enable. It requires no configuration unless you want to edit its crontab entries.
  • PCP is not installed or enabled by default in the RHEL minimal package set, but is easy to install and enable. PCP has extensive configuration options, mostly with sensible defaults. The simplest use case, data capture and replay, requires no configuration other than enabling and starting the required services; see the KCS solution.

Ease of tool use

  • sysstat is fairly easy to use, though the sar reports are cumbersome to read and only sar can read the archives. Other tools such as iostat, vmstat and so forth offer live monitoring only. This is a significant drawback of sysstat: the default sadc data collection is often insufficient for complex analysis, especially for storage performance issues. Support engineers have to send the customer additional scripts and instructions (e.g. watcher-cron.sh) so that the customer can reproduce the issue and capture the required performance data. This adds significant time to case resolution, frustrates customers, and often fails to help resolve the case at all, because the performance issue may be intermittent or the customer may be very reluctant to reproduce it in a production environment.
  • PCP has numerous common command line options, consistently available in all tools, see pcpintro(1). All PCP tools can replay any PCP archive from any platform with forward and backward compatibility. The default data collected is usually sufficient for complex analysis tasks when needed. Support cases can therefore be resolved more quickly because the data is already captured and available for analysis when an unexpected issue occurs.

Metric naming

  • sysstat does not use a formal metric namespace, other than column titles in the tool reporting.
  • PCP has a uniform hierarchic name space for identifying metrics, the Performance Metrics Name Space (PMNS). The PMNS is distributed, configured and served at the source of the performance metrics (host or archive). PCP client tools use documented APIs to query the PMNS. There is also help text and other metadata associated with every metric.
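
The idea of a hierarchic name space can be sketched in a few lines of Python. This is a toy illustration only, not the PCP PMNS API: the helper functions `build_namespace`, `children` and `leaves` are hypothetical, and the metric names are a small hand-picked sample in the dotted style PCP uses.

```python
# Toy hierarchical namespace (illustration only, not the PCP PMNS API).
# Assumes no metric name is also a prefix of another metric name.

def build_namespace(metric_names):
    """Build a nested dict from dotted metric names; leaves map to None."""
    root = {}
    for name in metric_names:
        node = root
        parts = name.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = None  # a leaf is an actual metric
    return root


def _descend(namespace, prefix):
    """Walk from the root to the node named by a dotted prefix."""
    node = namespace
    if prefix:
        for part in prefix.split("."):
            node = node[part]
    return node


def children(namespace, prefix=""):
    """Return the sorted names directly below a non-leaf dotted prefix."""
    return sorted(_descend(namespace, prefix))


def leaves(namespace, prefix=""):
    """Return every full metric name at or below a dotted prefix, sorted."""
    node = _descend(namespace, prefix)
    if node is None:  # the prefix itself names a metric
        return [prefix]
    found = []
    for child in sorted(node):
        full = f"{prefix}.{child}" if prefix else child
        found.extend(leaves(namespace, full))
    return found


# Hand-picked sample names in the dotted style described above.
METRICS = [
    "kernel.all.load",
    "kernel.all.cpu.user",
    "disk.dev.read",
    "disk.dev.write",
    "hinv.ncpu",
]

ns = build_namespace(METRICS)
print(children(ns, "kernel.all"))
print(leaves(ns, "disk"))
```

Because every metric has a position in one tree, a generic tool can enumerate everything below a prefix such as `disk` without knowing in advance which metrics exist.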

Metric metadata

  • sysstat uses the same metric semantics as the source of the data, i.e. mostly mirroring the kernel /proc exported data semantics.
  • In addition to uniform naming within the PCP namespace, PCP metrics have rich metadata: data type (int, long, unsigned, string, blob, etc.), semantics (counter, instant, discrete), units (space, time, count), help text, etc.
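
To show why semantics and units matter to a generic tool, here is a minimal Python sketch. It is not PCP code: `MetricDesc` and `report_value` are hypothetical names, and the descriptors are written by hand for illustration rather than fetched from a PCP service. The point is that a tool can decide how to present a value from the metadata alone, without metric-specific logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDesc:
    """Hand-written stand-in for a metric descriptor (hypothetical type)."""
    name: str
    semantics: str  # "counter", "instant" or "discrete"
    units: str


def report_value(desc, prev, curr):
    """Return (value, units) for display from two (seconds, raw) samples.

    Counters are converted to a per-second rate; instantaneous and
    discrete metrics are reported as the latest raw value.
    """
    (t0, v0), (t1, v1) = prev, curr
    if desc.semantics == "counter":
        if t1 <= t0:
            raise ValueError("samples must be in increasing time order")
        return (v1 - v0) / (t1 - t0), f"{desc.units}/second"
    return v1, desc.units


# Assumed example descriptors and samples, for illustration only.
reads = MetricDesc("disk.dev.read", "counter", "count")
free = MetricDesc("mem.util.free", "instant", "Kbyte")

print(report_value(reads, (0, 1000), (10, 1500)))
print(report_value(free, (0, 2048), (10, 4096)))
```

The same `report_value` function handles both metrics correctly because the decision is driven by the descriptor, which is how a metadata-aware tool can support metrics it has never seen before.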

Metric coverage and extensibility

  • sysstat only supports the metrics that are implicitly required by the performance tools it ships.
  • PCP supports nearly everything exported by the kernel and can be easily extended via plugins, which are called Performance Metrics Domain Agents (PMDAs). PCP organises and names its metrics with a namespace, irrespective of whether a particular tool wants to use the metrics or not. This, along with PCP metric metadata (see above), allows generic performance tools such as pmchart(1), pmval(1), pmrep(1) and others to monitor any metric, both existing metrics and any new metrics added in the future by new PMDAs.

Separation of capture from playback

  • sysstat supports data capture with the sadc tool and its associated crontab entries. The sar tool reads this data to produce textual reports. These reports are generated daily, kept for one week by default, and automatically included in sosreports. Only the sar tool can read the binary sadc files. Other tools such as vmstat, iostat and so forth can only do live monitoring and cannot replay sadc binary data. This is why scripts such as watcher-cron.sh exist: to capture other data that sadc either doesn't capture or sar can't interpret.
  • PCP has complete separation of data capture from client replay, and inherently supports both live and archive replay with all tools.

Temporal flexibility

  • sysstat tools can only vary the live sampling rate. The sar tool can read sadc binary files but cannot change the underlying sampling rate. In addition, temporally correlating reports from the various sysstat tools is frustrating: the tools use inconsistent timestamp formats, and the timestamps are often difficult to parse. This has resulted in a proliferation of ad-hoc post-processing scripts.
  • PCP tools can vary the live sampling rate and also replay archive data at intervals that differ from the original sampling rate. Metric values are interpolated between (or across) physical samples to allow the tool to use a sampling rate that suits the analysis, e.g. over a few seconds, or averaged over a whole day, etc. Metrics with 'discrete' semantics (e.g. hinv.ncpu) or string data type are not interpolated. PCP has many tools that are ideally suited for use by post-processing scripts (especially with the Perl and Python bindings to the PCP APIs). Timestamping is consistently used throughout, and the PCP archive format employs a temporal index to make replay more efficient. See the TIME WINDOW SPECIFICATION section in PCPINTRO(1).
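
The effect of replaying at a different interval can be sketched with simple linear interpolation. This is a simplified illustration under stated assumptions, not PCP's actual algorithm: `interpolate` and `resample_rates` are hypothetical helpers, and the samples are invented counter values recorded every 60 seconds.

```python
from bisect import bisect_right


def interpolate(samples, t):
    """Linearly interpolate time-sorted (seconds, value) samples at time t."""
    times = [ts for ts, _ in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the recorded time range")
    i = bisect_right(times, t)
    if i == len(times):  # t is exactly the last sample time
        return samples[-1][1]
    t0, v0 = samples[i - 1]
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def resample_rates(samples, start, interval, count):
    """Per-second rate of a counter over `count` windows of `interval` seconds."""
    rates = []
    for k in range(count):
        begin = start + k * interval
        end = begin + interval
        delta = interpolate(samples, end) - interpolate(samples, begin)
        rates.append(delta / interval)
    return rates


# Invented counter samples, recorded every 60 seconds.
recorded = [(0, 0), (60, 600), (120, 3000), (180, 3600)]

print(resample_rates(recorded, start=0, interval=30, count=6))
print(resample_rates(recorded, start=0, interval=90, count=2))
```

Replaying the same recording at 30 second intervals exposes the burst in the middle minute, while replaying at 90 second intervals averages it away. A discrete or string-valued metric would be reported as the most recent sample instead of being interpolated.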

Data archive portability

  • sar can generally only replay sadc binary files that were captured on the same version of sysstat and on the same platform and arch. This is quite painful in support scenarios - we often have to fire up a VM that matches the customer environment in order to examine sadc binary data. There are tools to convert sadc files into XML, which is portable (sar itself can't read it, but fortunately PCP can with the sar2pcp(1) tool). This is why tools such as sosreport generate the sar reports on the system, rather than just including the binary sadc files in the sosreport tarball. There is no forward or backward compatibility and the binary data format is changed constantly - see the sysstat CHANGELOG for examples.
  • PCP archives are completely portable to and from any other version of PCP, and can be replayed by PCP tools on any platform/arch. There is also a range of tools (and an API) for importing data from other monitoring packages, e.g. collectl2pcp(1), sar2pcp(1), iostat2pcp(1) and many others. The PCP archive format aims to be canonical, universal and portable.

Documented APIs

  • sysstat source code has an internal common.c but no proper APIs, libraries or documentation.
  • PCP has documented stable APIs at all appropriate levels: between pmcd and its PMDA plugins (libpcp_pmda), between pmcd and its clients (libpcp), and in numerous other libraries. These libraries support multiple language bindings, e.g. C/C++, Perl, Python, etc. In addition, there are higher level class libraries that make it very easy to develop new performance tools, e.g. see the Python binding for the pmfetchgroup(3) API. PCP also has APIs to assist customers in instrumenting their own applications, which is particularly useful when correlating application activity with system level performance (see pmtrace(1), pmdatrace(1), pmdatrace(3), mmv(5) and associated documentation).
