How can I merge multiple PCP archives into one for a system-level performance analysis covering multiple days or weeks?
Quick Answer
To merge multiple PCP archives into one, use the pmlogextract tool, e.g.
# pmlogextract <archive1> <archive2> <newarchive>
EXAMPLE for the host named 'goblin' :
# cd /var/log/pcp/pmlogger/goblin/
# pmlogextract $(ls -1 *.[0-9] *.xz | sort -n) /tmp/goblin_perf
This merges all archives for the host 'goblin' into one and writes it to /tmp/goblin_perf (as the three files that a PCP archive consists of: *.0 *.meta and *.index). The resulting archive may be quite large, especially if it covers several days or longer - so you might want to compress it before copying it elsewhere, e.g.
# tar czf /tmp/goblin_perf.tgz /tmp/goblin_perf.{0,meta,index}
Details
By default, the PCP pmlogger service creates performance data archives in the /var/log/pcp/pmlogger/$(hostname)/ directory. Each archive consists of three kinds of file: the temporal index (archive.index), the metadata (archive.meta) and one or more data volumes (archive.0, archive.1 and so on). Data volumes are often compressed, in which case they carry an extra suffix, e.g. archive.0.xz. For example, for the host "goblin" we have :
# cd /var/log/pcp/pmlogger/goblin/
# ls
20151222.0.xz 20151225.0.xz 20151228.0.xz 20151231.0.xz 20160103.0 20160106.00.10.0
20151222.index 20151225.index 20151228.index 20151231.index 20160103.index 20160106.00.10.index
20151222.meta 20151225.meta 20151228.meta 20151231.meta 20160103.meta 20160106.00.10.meta
20151223.0.xz 20151226.0.xz 20151229.0.xz 20160101.0.xz 20160104.0 Latest
20151223.index 20151226.index 20151229.index 20160101.index 20160104.index pmlogger.log
20151223.meta 20151226.meta 20151229.meta 20160101.meta 20160104.meta pmlogger.log.prior
20151224.0.xz 20151227.0.xz 20151230.0.xz 20160102.0 20160105.0
20151224.index 20151227.index 20151230.index 20160102.index 20160105.index
20151224.meta 20151227.meta 20151230.meta 20160102.meta 20160105.meta
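Given a listing like the one above, it can be useful to confirm that an archive has all of its files before trying to merge it. The following is a minimal sketch only: archive_is_complete is a hypothetical helper, not a PCP command, and it assumes that data volumes are either uncompressed or carry an .xz suffix, as in the listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): succeed if the archive base name
# given as $1 has a .meta file, a .index file and at least one data volume.
# Assumption: volumes are named <base>.<number>, optionally followed by .xz.
archive_is_complete() {
    base=$1
    [ -e "$base.meta" ] || return 1
    [ -e "$base.index" ] || return 1
    for vol in "$base".[0-9]*; do
        suffix=${vol#"$base".}          # e.g. 0, 1 or 0.xz
        suffix=${suffix%.xz}            # drop the compression suffix, if any
        case $suffix in
            ''|*[!0-9]*) ;;             # not a plain volume number, skip it
            *) [ -e "$vol" ] && return 0 ;;
        esac
    done
    return 1
}
```

For example, archive_is_complete /var/log/pcp/pmlogger/goblin/20151222 would succeed for the listing above, because 20151222.meta, 20151222.index and 20151222.0.xz are all present.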
The base name of each archive is the date the archive was created in YYYYMMDD.* format. The pmdumplog -l <archive> command can be used to examine the archive label for each archive - this reports the starting and ending timestamps for the archive, in ctime(3) format. For example :
# cd /var/log/pcp/pmlogger/goblin/
# for f in $(ls -1 *.0 *.xz | sort -n); do pmdumplog -l $f; done
Log Label (Log Format Version 2)
Performance metrics from host goblin
commencing Tue Dec 22 00:10:11.721 2015
ending Wed Dec 23 00:09:11.727 2015
Log Label (Log Format Version 2)
Performance metrics from host goblin
commencing Wed Dec 23 00:10:11.802 2015
ending Thu Dec 24 00:09:11.805 2015
... (omitted for brevity) ...
Log Label (Log Format Version 2)
Performance metrics from host goblin
commencing Tue Jan 5 00:10:14.218 2016
ending Wed Jan 6 00:09:14.228 2016
Log Label (Log Format Version 2)
Performance metrics from host goblin
commencing Wed Jan 6 00:10:12.190 2016
ending Wed Jan 6 14:54:12.202 2016
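With many archives, the label output is easier to scan when each archive is reduced to a single line. The filter below is a sketch that assumes the label layout shown above (a "commencing" line followed by an "ending" line); label_span is a hypothetical name, not a PCP tool.

```shell
#!/bin/sh
# Hypothetical filter: read "pmdumplog -l" output on stdin and print one
# "start -> end" line per archive label.
label_span() {
    awk '
        /^ *commencing / { sub(/^ *commencing +/, ""); start = $0 }
        /^ *ending /     { sub(/^ *ending +/, "");     print start " -> " $0 }
    '
}
```

It could be used by piping the loop above into it, e.g.
# for f in $(ls -1 *.0 *.xz | sort -n); do pmdumplog -l $f; done | label_span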
Notice that the archive base names are designed to sort numerically, which is handy. So on host goblin, we have performance data from December 22nd 2015 through to January 6th 2016. In most cases, the archives cover the 24-hour period starting at around 10 minutes past midnight on each day - that's when the pmlogger_daily(1) cron job is run by default.
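Because the base names begin with the date, a subset of the archives can be picked out by comparing that date against a range. The following sketch assumes the YYYYMMDD naming shown above; archives_between is a hypothetical helper, not a PCP command. It prints each archive base name once, even if the archive has several data volumes.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): print the base names of the
# archives in directory $1 whose leading YYYYMMDD date lies between
# $2 and $3 (inclusive), oldest first.
archives_between() {
    dir=$1 start=$2 end=$3
    for meta in "$dir"/*.meta; do
        [ -e "$meta" ] || continue      # the glob matched nothing
        base=${meta%.meta}              # strip the .meta suffix
        name=${base##*/}                # e.g. 20160106.00.10
        day=${name%%.*}                 # leading YYYYMMDD part
        case $day in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;              # not a date-named archive
        esac
        if [ "$day" -ge "$start" ] && [ "$day" -le "$end" ]; then
            printf '%s\n' "$base"
        fi
    done | sort
}
```

The output can then be passed to pmlogextract in place of the ls pipeline, e.g. to merge only the January archives (the output name /tmp/goblin_jan is just an illustration) :
# pmlogextract $(archives_between /var/log/pcp/pmlogger/goblin 20160101 20160106) /tmp/goblin_jan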
To merge all of these archives into one archive, we use the pmlogextract(1) command, e.g. :
# cd /var/log/pcp/pmlogger/goblin/
# pmlogextract $(ls -1 *.[0-9] *.xz | sort -n) /tmp/goblin_perf
Examining the label of the resulting merged archive in /tmp/goblin_perf.* :
# pmdumplog -l /tmp/goblin_perf
Log Label (Log Format Version 2)
Performance metrics from host goblin
commencing Tue Dec 22 00:10:11.721 2015
ending Wed Jan 6 15:00:12.196 2016
We can now use this archive for a system-level performance analysis covering that time period, or any smaller interval within it. For further details of the PCP time-window options (e.g. -S and -T), see the TIME WINDOW SPECIFICATION section in the PCPIntro(1) man page. These options are supported by all PCP monitoring tools. For example, to examine I/O statistics averaged over a 4-hour reporting interval for the 16 days covered by the above merged archive, a suitable command would be :
# pmiostat -a /tmp/goblin_perf -t 4h -xt
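To narrow the report to a smaller interval, the -S and -T options mentioned above can be added. The sketch below reports one working day at hourly intervals; the chosen day and hours are arbitrary illustrations, and the absolute times use the '@' followed by ctime(3) form described in PCPIntro(1). The guard simply skips the report when pmiostat or the merged archive is not available.

```shell
#!/bin/sh
# Sketch: I/O statistics for a single working day from the merged archive.
archive=/tmp/goblin_perf            # merged archive created earlier
start='@Mon Jan 4 09:00:00 2016'    # -S: start of the time window
end='@Mon Jan 4 17:00:00 2016'      # -T: end of the time window

if command -v pmiostat >/dev/null 2>&1 && [ -e "$archive.meta" ]; then
    # Same options as the command above, plus the time window and a
    # shorter (1 hour) reporting interval.
    pmiostat -a "$archive" -S "$start" -T "$end" -t 1h -xt
else
    echo "pmiostat or $archive.meta not found; nothing to report" >&2
fi
```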
Automatic log merging - multi-archive support
Starting with PCP version 3.11.2-1, the argument to the -a command line option for any PCP monitoring tool can be a directory path. In this case, all the archives in that directory must have been collected from the same host, and they will be automatically merged for replay. This is particularly handy when examining customer data because typically all the archives below /var/log/pcp/pmlogger/
See Also
- Index of Performance Co-Pilot (PCP) articles, solutions, tutorials and white papers
- PCP Quick Reference Guide
- Introduction to storage performance analysis with PCP
- Performance Co-Pilot User's and Administrator's Guide