How can I convert a collectl archive into a Performance Co-Pilot (PCP) archive?

Environment

  • any platform supported by collectl and PCP (RHEL, Fedora, Debian, etc)

Issue

  • I have captured a performance data archive using the collectl tool. How can I convert my collectl archive into a PCP archive so it can be analysed using PCP monitoring tools?

Resolution

The following demonstrates converting a collectl performance data archive into a PCP archive, and then covers some basic I/O analysis tasks using the PCP monitoring tools on the resulting archive. Note that the PCP archive used in these examples could equally have been collected using PCP directly. For instructions on capturing PCP archives directly, see "How do I install Performance Co-Pilot (PCP) on my RHEL server to capture performance logs".

Installation

Note: The pcp-import-collectl2pcp package is provided via the rhel-7-*-optional-rpms repository, which must be enabled to install the package when downloading from the Red Hat CDN.

  • Install the collectl2pcp tool, which ships in the pcp-import-collectl2pcp package in RHEL 6.6 and later (and in other distributions), e.g. yum install pcp-import-collectl2pcp --enablerepo=rhel-7-server-optional-rpms
  • You can use collectl2pcp on any platform supported by PCP, including a different architecture. It does not need to be run on the same host or platform on which the collectl archive was captured.
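
Before converting, it can help to confirm that the tools used in this article are on the PATH. The following is only a minimal sketch: the helper name have_tool is hypothetical, and the list of commands is simply the set that appears in the examples in this article.

```shell
#!/bin/sh
# have_tool is a hypothetical helper: succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the commands used in this article are installed.
for tool in collectl2pcp pmdumplog pmiostat pminfo; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If collectl2pcp is reported as missing, install the pcp-import-collectl2pcp package as described above.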

Archive Conversion example

  • Starting with a collectl archive named somehost-20160208-000000.raw.gz
# collectl2pcp somehost-20160208-000000.raw.gz somearchive

The last command line argument is the name of the PCP output archive. In this case we've called it 'somearchive', and it consists of three files (*.0, *.index and *.meta). If you have more than one collectl archive for the same host, you can convert them all in a single invocation. The input collectl archives can be specified in any order, because the tool sorts them chronologically before processing.
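
The argument order described above (inputs first, output archive last) can be sketched as follows. This is an illustration only: build_convert_cmd is a hypothetical helper that prints the command line rather than running it, and the second input file name is made up.

```shell
#!/bin/sh
# build_convert_cmd is a hypothetical helper. It prints (does not run) the
# collectl2pcp command line for one output archive and any number of inputs.
# Usage: build_convert_cmd OUTPUT INPUT...
build_convert_cmd() {
    out=$1
    shift
    printf 'collectl2pcp'
    for f in "$@"; do
        printf ' %s' "$f"
    done
    # The output archive name always goes last.
    printf ' %s\n' "$out"
}

# The second input file name below is invented for illustration.
build_convert_cmd somearchive \
    somehost-20160208-000000.raw.gz \
    somehost-20160209-000000.raw.gz
```

The sketch does no quoting, so it is only suitable for file names without spaces.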

  • You can check the time bounds and timezone of the resulting PCP archive using the pmdumplog command, e.g.
# pmdumplog -L somearchive
Log Label (Log Format Version 2)
Performance metrics from host somehost
  commencing Mon Feb  8 11:00:00.001 2016
  ending     Tue Feb  9 00:58:50.001 2016
Archive timezone: +1100

The output archive spans just under 14 hours, from 11am on Feb 8th through to just before 1am on Feb 9th.
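
The span can be worked out from the two timestamps in the archive label. A small arithmetic sketch, ignoring the sub-second part of each timestamp:

```shell
#!/bin/sh
# Seconds since midnight on Feb 8 for the two label timestamps.
start=$(( 11 * 3600 ))                    # Feb 8 11:00:00
end=$(( 24 * 3600 + 58 * 60 + 50 ))       # Feb 9 00:58:50

span=$(( end - start ))
printf '%d s = %dh %dm %ds\n' \
    "$span" $(( span / 3600 )) $(( span % 3600 / 60 )) $(( span % 60 ))
# prints: 50330 s = 13h 58m 50s
```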

  • We can summarize iostat statistics at a four-hour sampling interval using the pmiostat command (from the pcp-system-tools package), e.g.
# pmiostat -a somearchive -t 4h -xt -P0
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Mon Feb  8 15:00:00 2016 sda               0     10     0    8      0     71      8.8      0.0     5       4       5     0
Mon Feb  8 15:00:00 2016 sdb               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 15:00:00 2016 sdc               0     48     0   47      0    383      8.1      0.1     2       0       2     6
Mon Feb  8 19:00:00 2016 sda               0     10     0    8      0     72      8.9      0.0     4       6       4     0
Mon Feb  8 19:00:00 2016 sdb               0      0     0    0      0      0     26.9      0.0    14       0      14     0
Mon Feb  8 19:00:00 2016 sdc               0     97     0  118     12    861      7.4      0.2     1       4       1    12
Mon Feb  8 23:00:00 2016 sda               0     16     0    8      0     98     12.2      0.1    10       8      10     0
Mon Feb  8 23:00:00 2016 sdb               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 23:00:00 2016 sdc               0    411     0  549      1   3839      7.0      0.9     2      11       2    54

There are three SCSI disks on this system, and 'sdc' is the busiest. Note that the -P0 flag was used to specify zero digits of precision, for brevity.
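
With more devices than this, picking out the busiest one by eye gets harder. The following sketch post-processes the report with awk. The helper name peak_util is hypothetical, and it assumes the layout shown above: a five-field timestamp, so the device name is field 6 and %util is the last field.

```shell
#!/bin/sh
# peak_util is a hypothetical helper: reads "pmiostat -xt" style rows on stdin
# and prints the device with the highest %util seen, followed by that value.
peak_util() {
    awk '$1 !~ /^#/ && NF >= 7 {
        # Field 6 is the device, the last field is %util (assumed layout).
        if (dev == "" || $NF + 0 > max + 0) { max = $NF; dev = $6 }
    }
    END { if (dev != "") print dev, max }'
}

# Rows copied from the report above.
peak_util <<'EOF'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Mon Feb  8 15:00:00 2016 sda               0     10     0    8      0     71      8.8      0.0     5       4       5     0
Mon Feb  8 15:00:00 2016 sdc               0     48     0   47      0    383      8.1      0.1     2       0       2     6
Mon Feb  8 19:00:00 2016 sdc               0     97     0  118     12    861      7.4      0.2     1       4       1    12
Mon Feb  8 23:00:00 2016 sda               0     16     0    8      0     98     12.2      0.1    10       8      10     0
Mon Feb  8 23:00:00 2016 sdc               0    411     0  549      1   3839      7.0      0.9     2      11       2    54
EOF
# prints: sdc 54
```

On a real archive the same function could be fed from a pipe, e.g. pmiostat -a somearchive -t 4h -xt -P0 | peak_util.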

  • To restrict the report to just the sdc device, we can use the -R option, e.g.
# pmiostat -a somearchive -t 4h -xt -P0 -R'sdc$'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Mon Feb  8 15:00:00 2016 sdc               0     48     0   47      0    383      8.1      0.1     2       0       2     6
Mon Feb  8 19:00:00 2016 sdc               0     97     0  118     12    861      7.4      0.2     1       4       1    12
Mon Feb  8 23:00:00 2016 sdc               0    411     0  549      1   3839      7.0      0.9     2      11       2    54
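
The trailing '$' in the pattern matters. The following sketch uses grep -E as a stand-in to show the effect of anchoring. It assumes that pmiostat treats this simple pattern the same way as an extended regular expression, and every device name other than sda, sdb and sdc is made up for illustration.

```shell
#!/bin/sh
# Hypothetical device list: sdc1 and sdca do not appear in the archive above.
devices='sda
sdb
sdc
sdc1
sdca'

echo "unanchored 'sdc':"
printf '%s\n' "$devices" | grep -E 'sdc'     # matches sdc, sdc1 and sdca

echo "anchored 'sdc\$':"
printf '%s\n' "$devices" | grep -E 'sdc$'    # matches sdc only
```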

  • To sum the results for sdb and sdc, use the -G option, e.g.
# pmiostat -a somearchive -t 4h -xt -P0 -R'sd[bc]$' -Gsum
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Mon Feb  8 15:00:00 2016 sum(sd[bc]$)      0     48     0   47      0    383      8.1      0.1     2       0       2     6
Mon Feb  8 19:00:00 2016 sum(sd[bc]$)      0     97     0  118     12    861     34.3      0.2    15       4      15    12
Mon Feb  8 23:00:00 2016 sum(sd[bc]$)      0    411     0  549      1   3839      7.0      0.9     2      11       2    54

This is useful, for example, if sdb and sdc share the same SCSI host, bus or target, or are paths to the same device.
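
The summed rows can be cross-checked against the per-device report shown earlier. The sketch below adds up the wkB/s column for sdb and sdc at each sample time. The helper name sum_wkb is hypothetical, and the field positions assume the -xt layout used in this article (device in field 6, wkB/s in field 12).

```shell
#!/bin/sh
# sum_wkb is a hypothetical helper: for rows whose device (field 6) is sdb or
# sdc, add up wkB/s (field 12) per sample time (fields 2-4: month, day, time).
sum_wkb() {
    awk '$6 ~ /^sd[bc]$/ {
        key = $2 " " $3 " " $4
        if (!(key in total)) order[++n] = key   # remember first-seen order
        total[key] += $12
    }
    END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }'
}

# sdb and sdc rows copied from the per-device report earlier in this article.
sum_wkb <<'EOF'
Mon Feb  8 15:00:00 2016 sdb               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 15:00:00 2016 sdc               0     48     0   47      0    383      8.1      0.1     2       0       2     6
Mon Feb  8 19:00:00 2016 sdb               0      0     0    0      0      0     26.9      0.0    14       0      14     0
Mon Feb  8 19:00:00 2016 sdc               0     97     0  118     12    861      7.4      0.2     1       4       1    12
Mon Feb  8 23:00:00 2016 sdb               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 23:00:00 2016 sdc               0    411     0  549      1   3839      7.0      0.9     2      11       2    54
EOF
```

The totals (383, 861 and 3839) match the wkB/s column of the summed report. Note that in the sample output every column is added in the same way, including averages such as avgrq-sz (26.9 + 7.4 = 34.3 at 19:00), so the summed values for those columns should be read with care.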

  • To examine device-mapper statistics instead (e.g. dm-multipath or LVM devices), use the -x dm flag, e.g.
# pmiostat -a somearchive -t 4h -xt -P0 -R 'dm-[0-2]$' -xdm
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Mon Feb  8 15:00:00 2016 dm-0              0      0     0    0      0      1      4.0      0.0    11      35      11     0
Mon Feb  8 15:00:00 2016 dm-1              0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 15:00:00 2016 dm-2              0      0     0    8      0     33      4.0      0.1     8       0       8     0
Mon Feb  8 19:00:00 2016 dm-0              0      0     0    1      0      2      4.0      0.0    17       4      17     0
Mon Feb  8 19:00:00 2016 dm-1              0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 19:00:00 2016 dm-2              0      0     0   18      1     74      4.1      0.1     7       1       7     0
Mon Feb  8 23:00:00 2016 dm-0              0      0     0    0      0      0      4.9      0.0     2       6       2     0
Mon Feb  8 23:00:00 2016 dm-1              0      0     0    0      0      0      0.0      0.0     0       0       0     0
Mon Feb  8 23:00:00 2016 dm-2              0      0     0   72      0    289      4.0      1.1    15       3      15     1

Notice above that we restricted the output to DM devices matching 'dm-[0-2]$'. Note also that in archives converted with collectl2pcp, the DM device names are the non-persistent names. This is because collectl does not map the DM name to the logical name (sysstat does not do this mapping either, but PCP does). In archives collected with PCP, the persistent logical names are reported (from /dev/mapper/*), e.g. live on my laptop:

# pmiostat -xdm -R'root|home' -P0
# Device     rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
fedora-home       0      0     0    0      0      0      0.0      0.0     0       0       0     0
fedora-root       0      0     0    0      0      0      0.0      0.0     0       0       0     0
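
If the host that produced the collectl archive is still available, the dm-N to logical name mapping can be recorded there by hand. The following is a sketch only: dm_names is a hypothetical helper, and it assumes a Linux sysfs layout in which each DM device exposes its logical name in /sys/block/dm-N/dm/name. Because the dm-N numbering is non-persistent, the result is only meaningful while the device configuration is unchanged from when the archive was captured.

```shell
#!/bin/sh
# dm_names is a hypothetical helper: prints "dm-N logical-name" pairs by
# reading <root>/dm-N/dm/name, where <root> defaults to /sys/block.
dm_names() {
    root=${1:-/sys/block}
    for d in "$root"/dm-*; do
        # Skip the unexpanded glob (no DM devices) and unreadable entries.
        [ -r "$d/dm/name" ] || continue
        printf '%s %s\n' "${d##*/}" "$(cat "$d/dm/name")"
    done
}

dm_names
```

On a system with no DM devices the function prints nothing.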

  • For further details, see the man pages for collectl2pcp(1), pmiostat(1) and the other tools. The PCPIntro(1) man page also has a useful summary of the common command line arguments accepted by most PCP monitoring tools.

Converted metrics list

The collectl2pcp version that ships with pcp-3.11.3-1 and later converts the metrics listed below. This is only a subset of the metrics collected by collectl (and also a subset of those collected by PCP by default), but it is sufficient for most purposes, i.e. basic CPU, disk, network, memory and per-process analysis tasks.

# pminfo -a somearchive
proc.io.cancelled_write_bytes
proc.io.write_bytes
proc.io.read_bytes
proc.io.syscw
proc.io.syscr
proc.memory.vmswap
proc.memory.vmlib
proc.memory.vmexe
proc.memory.vmstack
proc.memory.vmdata
proc.memory.vmrss
proc.memory.vmlock
proc.memory.vmsize
proc.psinfo.psargs
proc.psinfo.cmd
proc.psinfo.processor
proc.psinfo.wchan_s
proc.psinfo.blocked
proc.psinfo.rss
proc.psinfo.vsize
proc.psinfo.start_time
proc.psinfo.nice
proc.psinfo.priority
proc.psinfo.cstime
proc.psinfo.cutime
proc.psinfo.stime
proc.psinfo.utime
proc.psinfo.ppid
proc.psinfo.sname
proc.psinfo.pid
mem.util.directMap2M
mem.util.directMap4k
mem.util.hugepagesSurp
mem.util.hugepagesRsvd
mem.util.hugepagesFree
mem.util.hugepagesTotal
mem.util.anonhugepages
mem.util.corrupthardware
mem.util.vmallocChunk
mem.util.vmallocUsed
mem.util.vmallocTotal
mem.util.committed_AS
mem.util.commitLimit
mem.util.bounce
mem.util.NFS_Unstable
mem.util.pageTables
mem.util.kernelStack
mem.util.slabUnreclaimable
mem.util.slabReclaimable
mem.util.slab
mem.util.shmem
mem.util.mapped
mem.util.anonpages
mem.util.writeback
mem.util.dirty
mem.util.swapFree
mem.util.swapTotal
mem.util.mlocked
mem.util.unevictable
mem.util.inactive_file
mem.util.active_file
mem.util.inactive_anon
mem.util.active_anon
mem.util.inactive
mem.util.active
mem.util.swapCached
mem.util.cached
mem.util.bufmem
mem.util.free
mem.physmem
network.interface.collisions
network.interface.out.compressed
network.interface.out.carrier
network.interface.out.fifo
network.interface.out.drops
network.interface.out.errors
network.interface.out.packets
network.interface.out.bytes
network.interface.in.mcasts
network.interface.in.compressed
network.interface.in.frame
network.interface.in.fifo
network.interface.in.drops
network.interface.in.errors
network.interface.in.packets
network.interface.in.bytes
network.udp.sndbuferrors
network.udp.recvbuferrors
network.udp.outdatagrams
network.udp.inerrors
network.udp.noports
network.udp.indatagrams
network.tcp.outrsts
network.tcp.inerrs
network.tcp.retranssegs
network.tcp.outsegs
network.tcp.insegs
network.tcp.currestab
network.tcp.estabresets
network.tcp.attemptfails
network.tcp.passiveopens
network.tcp.activeopens
network.tcp.maxconn
network.tcp.rtomax
network.tcp.rtomin
network.tcp.rtoalgorithm
disk.dm.total_bytes
disk.dm.read_bytes
disk.dm.write_bytes
disk.dm.aveq
disk.dm.avactive
disk.dm.write_rawactive
disk.dm.blkwrite
disk.dm.write_merge
disk.dm.write
disk.dm.read_rawactive
disk.dm.blkread
disk.dm.read_merge
disk.dm.read
disk.dm.await
disk.dm.r_await
disk.dm.w_await
disk.dm.avg_qlen
disk.dm.avg_rqsz
disk.dm.util
disk.dev.total_bytes
disk.dev.read_bytes
disk.dev.write_bytes
disk.dev.aveq
disk.dev.avactive
disk.dev.write_rawactive
disk.dev.blkwrite
disk.dev.write_merge
disk.dev.write
disk.dev.read_rawactive
disk.dev.blkread
disk.dev.read_merge
disk.dev.read
disk.dev.await
disk.dev.r_await
disk.dev.w_await
disk.dev.avg_qlen
disk.dev.avg_rqsz
disk.dev.util
hinv.ninterface
hinv.ndisk
hinv.machine
hinv.physmem
hinv.pagesize
hinv.ncpu
kernel.percpu.cpu.intr
kernel.percpu.cpu.guest
kernel.percpu.cpu.steal
kernel.percpu.cpu.irq.soft
kernel.percpu.cpu.irq.hard
kernel.percpu.cpu.wait.total
kernel.percpu.cpu.idle
kernel.percpu.cpu.sys
kernel.percpu.cpu.nice
kernel.percpu.cpu.user
kernel.all.load
kernel.all.nprocs
kernel.all.pswitch
kernel.all.intr
kernel.all.cpu.intr
kernel.all.cpu.guest
kernel.all.cpu.steal
kernel.all.cpu.irq.soft
kernel.all.cpu.irq.hard
kernel.all.cpu.wait.total
kernel.all.cpu.idle
kernel.all.cpu.sys
kernel.all.cpu.nice
kernel.all.cpu.user
kernel.all.hz
kernel.uname.release
kernel.uname.machine
kernel.uname.distro
kernel.uname.sysname
kernel.uname.nodename
event.flags
event.missed
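
To get a quick overview of a list like this, the metric names can be grouped by their top-level name. The following is a minimal sketch in which count_by_group is a hypothetical helper. It only assumes that each input line is a dotted metric name, as in the pminfo output above.

```shell
#!/bin/sh
# count_by_group is a hypothetical helper: reads metric names on stdin and
# prints each top-level name (the text before the first dot) with its count.
count_by_group() {
    cut -d. -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}

# A handful of names copied from the list above.
count_by_group <<'EOF'
proc.io.read_bytes
proc.psinfo.pid
mem.physmem
disk.dm.read
disk.dev.read
disk.dev.util
hinv.ncpu
EOF
```

Against a real archive the function can be fed directly, e.g. pminfo -a somearchive | count_by_group.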
