How to collect all required logs for Red Hat Support to investigate an OpenStack issue


Environment

  • Red Hat OpenStack Platform

Issue

  • Reporting a problem to Red Hat Support without a clear problem description, symptoms, and logs from an OpenStack deployment can cause significant delays in investigating and resolving the issue, especially for severity 1 issues. This kbase solution helps address that by enabling the collection of all required logs.

Resolution

In addition to the information listed in the Reference Guide for Engaging with Red Hat Support, when reporting a problem related to an OpenStack deployment, please make sure to check all of the following applicable sections:

Clear and Concise Problem Description

Make sure that the problem you are going to report is explained with all available details. This will involve:

  • Steps to reproduce the problem if it's reproducible on demand.
  • Time and date this problem was observed. If this was observed multiple times, please include time and date for all previous occurrences.
  • When did the problem start to occur? Did the problem start after making any kind of changes?
  • The exact command used to reproduce the problem, if the CLI is used, and the error message seen on the console. If possible, rerun the same command with --debug and provide the full command output. If the problem is reproduced via the GUI, please provide all relevant screenshots.
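For example, a failing CLI command can be rerun with --debug and the output captured for attachment to the case (the flavor, image, network, and instance names below are placeholders for illustration):

```shell
# Rerun the failing command with client-side debugging enabled.
# The debug messages go to stderr, so redirect stderr to a file
# that can be attached to the support case.
openstack server create --flavor m1.small --image rhel-9 \
    --network private test-instance --debug 2> server-create-debug.log
```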
Details of the OpenStack architecture

OpenStack deployments have different architectures depending on the use case. Knowing your architecture in detail speeds up troubleshooting your problem. This will involve:

  • The version of OpenStack used and how the environment was deployed. Was it deployed using OpenStack director, Packstack, or the previous OSP Installer (Foreman)? Is this a highly available deployment? If a third-party tool was used to deploy, please include more details about the background on why it was used.
  • Engagement journal, if any. If there are any documents that explain your environment in detail, please attach them when you report a problem.
  • More details about third-party plugins and integrations in the deployment. For example, is there a third-party load balancer such as F5 instead of the default HAProxy deployed by Red Hat deployment tools?
  • Integration with other third-party plugins for OpenStack, such as SDN controllers or Cinder integration using a vendor storage driver.
Collect all required logs from the deployment

OpenStack has a distributed architecture, so it may not be practical to collect logs from all nodes in the environment, and it can be challenging to determine which exact nodes to collect logs from to investigate a specific problem. See the details below to help you understand where to collect sosreports and must-gather data from.

  • RHOSP 17 and earlier releases: collect a sosreport from all controller nodes and the director node. All of the API services run active/active, and requests are load balanced by HAProxy. This means the API requests that led to the failure may have been distributed across all the controller nodes, so sosreports from all of them will be required in most cases. In most scenarios we also need a sosreport from the installer node, which is called the undercloud for OSP 7 and later and RHEL-OSP-Installer for earlier releases.

  • RHOSO: Starting with RHOSO 18, the OpenStack control plane is podified and there is no separate director node. A RHOSO must-gather should be collected to present the perspective of the OpenStack operators (responsible for provisioning and day-2 operations) and of the OpenStack control plane.

  • sosreport from affected compute nodes. For example:

    • If creation of an instance has failed, you can find out which compute node it was scheduled to from /var/log/nova/nova-scheduler.log. A sosreport from that specific compute node needs to be collected.
    • If live migration of an instance has failed, we require sosreports from both the source compute node and the destination compute node. You can find the details of the source and destination compute nodes in the output of nova show instance-name.
  • In addition, consider using the --verify and --all-logs flags to include deeper information and all logs in the sosreport. An additional sosreport with only OpenStack-related information can be generated with sosreport --profile=system,openstack --verify --all-logs.

  • crm_report from all controller nodes to investigate cluster/Pacemaker-related problems. In most cases, sosreport collects this automatically, covering the last 7 days. If you are reporting a problem that occurred more than 7 days ago, you may need to collect a crm_report manually. For more details, see How to generate a crm_report from a RHEL 6 or 7 High Availability cluster node using Pacemaker?

  • Starting with sosreport version 3.4 (RHEL 7.4), the OpenStack sosreport plugins can collect additional environment/tenant-specific information when the OpenStack environment file is sourced before running sosreport. This helps the support representative get a better understanding of the environment and reduces back and forth to collect additional data. In this case:

    • if the sosreport is created on the OpenStack director (undercloud), source the stackrc file before running sosreport
    • if the sosreport is created on an overcloud system, source the overcloudrc file or the environment file of a user from the specific tenant the issue is reported for
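The collection steps above can be sketched as follows (the instance UUID, timestamps, and destination path are placeholders; the nova-scheduler log path reflects pre-containerized defaults and may differ on your release):

```shell
# On the undercloud/director node, source the environment file first
# so the sos plugins (sos >= 3.4) can gather tenant-specific data:
source ~/stackrc
sosreport --profile=system,openstack --verify --all-logs

# On a controller, find which compute node a failed instance was
# scheduled to (the UUID is a placeholder for your instance):
grep "<instance-uuid>" /var/log/nova/nova-scheduler.log

# If the problem is older than sosreport's 7-day Pacemaker window,
# collect a crm_report for the relevant period manually:
crm_report -f "2024-01-01 00:00" -t "2024-01-02 00:00" /tmp/crm-report-jan01
```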
Collect Debug logs

If this problem is reproducible on demand, we recommend Enabling Debug logging for all services involved in OSP 7 - 11 or Enabling Debug logging for all services involved in OSP 12 and later, reproducing the problem, and collecting the above logs for investigation. If the problem is not reproducible on demand, it's advised that you enable debug logging for all required services and keep the deployment running so that debug logs can be collected the next time the problem occurs. Please follow the guidelines below to enable debug logs.

  • It's not required to enable debug logging for all services. Depending on the nature of the problem, enable debug logs for the services that could be contributing to it. See some examples below.
    • Creating an instance fails, and the nova logs show the failure occurs while trying to attach a volume to the instance. In this case, enable debug logging for Nova and Cinder, as well as Horizon or Heat, depending on which interface is used to spawn the instance.
    • Creating an instance fails, and the nova logs show the failure occurs while trying to attach the instance to the network. In this case, enable debug logging for Nova and Neutron, as well as Horizon or Heat, depending on which interface is used to spawn the instance.
  • Once you know which services need debug logging enabled, edit the configuration file, set debug = True, and restart the service.
  • If the load balancer (HAProxy) is suspected of contributing to the problem, enable HAProxy logging to collect more details.
  • Once debugging is enabled and the problem is reproduced, collect the same details explained in the "Collect all required logs from the deployment" section.
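The enable/revert cycle for a single service can be sketched as follows (the file path, service name, and the use of crudini are assumptions for a non-containerized Nova compute node; adjust them for your release and for containerized services):

```shell
# Enable debug logging for the Nova compute service (assumes crudini
# is installed; it edits INI files in place):
crudini --set /etc/nova/nova.conf DEFAULT debug True
systemctl restart openstack-nova-compute

# ... reproduce the problem and collect the logs ...

# Revert afterwards so the disks do not fill up with debug logs:
crudini --set /etc/nova/nova.conf DEFAULT debug False
systemctl restart openstack-nova-compute
```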

Note: Don't forget to deactivate debug logging after the debug logs are collected to prevent the disks from filling up with logs.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.