What information should be collected while debugging/opening a Red Hat Ceph Enterprise case?
Introduction
In order for the Red Hat support team to handle your request more efficiently, we have collected a set of best practices for opening a new ticket.
This article explains the information required for the different types of issues you may encounter.
General
- Red Hat Enterprise Linux 7, CentOS 7, Ubuntu 12.04, and Ubuntu 14.04 package the "sosreport" tool with support for Ceph. This tool automatically generates a report on both your system and your Ceph configuration. Please consider attaching a sosreport to your case whenever a technical issue arises with your cluster.
- No matter where the issue arises, the sosreport command should always be run from one of the Ceph monitors. Additionally, if an OSD or Ceph Object Gateway node is having an issue, sosreport should also be run on that node.
- To install sosreport on RHEL/CentOS, run:
# yum install sos
On containerized deployments, the ceph-common package must also be installed; otherwise sosreport cannot execute ceph commands and collect their output.
- To install sosreport on Ubuntu, run:
$ sudo apt-get install sosreport
- To run the sosreport:
# sosreport
- The troubleshooting guide lists the log levels to use when increased debug output is required for an investigation, and also shows how to collect a core dump in case a daemon crashes, e.g. with a segmentation fault.
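As a sketch of what the troubleshooting guide covers, debug levels can also be raised persistently in ceph.conf. The section and values below are illustrative only; consult the guide for the recommended levels per daemon:

```ini
; Illustrative example: raise monitor and messenger debug levels
[mon]
debug mon = 20
debug ms = 1
```

Remember to lower the levels again after the investigation, as verbose logging can fill the log partition quickly.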
Monitor issues
- If the problem is with a Ceph monitor, make sure to attach the monitor logs to the case. When logging to a file is enabled, they can be collected directly (on later versions the path contains a directory named after the cluster fsid):
/var/log/ceph/ceph-mon*
When logging to the journal (the default for containerized deployments), the journalctl tool can be used to display the logs for a specific service unit, as in this example:
journalctl -u ceph-b4636026-912a-11ef-ae22-509a4c8ff69d@mon.hostname.service
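If the exact unit name is unknown, it can be assembled from the cluster fsid and the local hostname. A minimal sketch, reusing the hypothetical example values from this article (shown as a plain echo so the resulting name can be inspected before passing it to journalctl):

```shell
# Assemble the systemd unit name for a containerized mon; the fsid and
# hostname below are the hypothetical example values from this article
fsid="b4636026-912a-11ef-ae22-509a4c8ff69d"
host="hostname"
echo "ceph-${fsid}@mon.${host}.service"
# → ceph-b4636026-912a-11ef-ae22-509a4c8ff69d@mon.hostname.service
```

On a real node, the fsid can be read from the cluster configuration and the hostname from the hostname command.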
- If the monitor is running, but is not joining the quorum, attach the output of:
For RHCS 1.3:
# ceph --admin-daemon /var/run/ceph/ceph-mon.{name}.asok mon_status
For RHCS 2.0 and later:
# ceph daemon mon.{name} mon_status
Alternatively, the admin socket path can be specified directly:
# ceph daemon /var/run/ceph/b4636026-912a-11ef-ae22-509a4c8ff69d/ceph-mon.hostname.asok mon_status
NOTE: The name of the monitor is typically its hostname. The above command should be run from the same machine the monitor is running on.
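The mon_status output is JSON, and its "state" field quickly shows whether the monitor is stuck probing or electing instead of joining the quorum. A minimal sketch using a hypothetical excerpt (on a live node, pipe the real command output instead of the sample string):

```shell
# Sample mon_status excerpt (hypothetical values); a mon outside the
# quorum typically reports "probing" or "electing" in the "state" field
status='{"name":"a","rank":0,"state":"probing","quorum":[]}'
echo "$status" | grep -o '"state":"[^"]*"'
# → "state":"probing"
```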
OSD issues
- If the issue is with an OSD, attach the OSD logs to the case. These files are present on the nodes where the OSDs are running:
/var/log/ceph/ceph-osd.{X}*
When not logging to files, the journalctl command can be used as shown in the Monitor issues section.
- To identify the nodes where the OSDs are running, use:
# ceph osd tree
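When only the host of a single OSD is needed, the tree output can be filtered with awk. A minimal sketch on a hypothetical excerpt (the real column layout varies by release, so adjust the field numbers accordingly):

```shell
# Hypothetical excerpt of 'ceph osd tree' output, captured as sample data
tree='-1 1.0 root default
-3 1.0 host node1
 3 1.0 osd.3 up 1.00000 1.00000'
# Remember the most recent host line; print it when the target OSD appears
echo "$tree" | awk '/ host /{h=$4} $3=="osd.3"{print h}'
# → node1
```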
- Additionally, if you are unable to provide an sosreport, attaching the OSDmap and CRUSHmap to the case may help with issues regarding recovery and replication. To get this data, run:
OSDmap
# ceph osd dump > /tmp/osdmap.txt
CRUSHmap
# ceph osd getcrushmap -o /tmp/crushmap && crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
The crushtool utility is provided by the ceph-base package.
PG issues
- To troubleshoot Placement Group (PG) issues, attach the following logs from one of the monitors to the case:
/var/log/ceph/ceph.log*
- Along with the logs above, please provide a PG dump from the cluster using the following command:
# ceph pg dump | gzip > ceph_pg_dump.$(date +%F_%H-%M-%S).txt.gz
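Before attaching the archive, it can be worth verifying that it is intact. A minimal sketch using sample data (the file name is hypothetical; replace it with the real dump):

```shell
# Create a sample compressed dump, then let gzip verify its integrity
echo "sample pg dump" | gzip > /tmp/ceph_pg_dump.sample.txt.gz
gzip -t /tmp/ceph_pg_dump.sample.txt.gz && echo "archive OK"
# → archive OK
```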