OpenStack overcloud scale out gets stuck on Compute and NovaComputeDeployment due to issues with os-collect-config

Solution Verified - Updated

Environment

  • Red Hat OpenStack Platform 10.0 (RHOSP)
  • Red Hat Enterprise Linux OpenStack Platform 7.0 (RHOSP)

Issue

  • OpenStack overcloud scale out gets stuck on Compute and NovaComputeDeployment due to issues with os-collect-config.

  • journalctl -u os-collect-config -f shows no updates in the logs.

Resolution

  • When os-collect-config does not show any updates in the logs for a few hours or even days, it is likely stuck:
[root@overcloud-compute-29 ~]# journalctl --since 2016-09-28 -u os-collect-config
-- Logs begin at Wed 2016-04-27 20:23:15 UTC, end at Wed 2016-09-28 19:48:11 UTC. --
[root@overcloud-compute-29 ~]# systemctl list-units | grep os-coll
  os-collect-config.service            
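
The check above can be sketched as a small shell function. This is only an illustration, not part of the original solution: the function name occ_log_state and the default two-hour window are assumptions, and the threshold should be adapted to the environment.

```shell
# Hypothetical helper (name and default window are assumptions):
# prints "stuck" if os-collect-config logged nothing within the window,
# otherwise "active".
occ_log_state() {
    local window="${1:-2 hours ago}"
    local lines
    # -q asks journalctl to omit informational lines; any remaining
    # "-- ... --" header lines are filtered out before counting.
    lines=$(journalctl -q --no-pager -u os-collect-config --since "$window" \
        | grep -v '^-- ' | wc -l | tr -d ' ')
    if [ "$lines" -eq 0 ]; then
        echo "stuck"
    else
        echo "active"
    fi
}

# Usage on a compute node:
#   occ_log_state "6 hours ago"
```

An empty window does not prove that the service is hung, so treat "stuck" as a hint to look at the node more closely.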
  • If heat cannot communicate with os-collect-config, the affected resources stay IN_PROGRESS and eventually time out during a scale out or stack update:
[stack@undercloud ~]$ heat resource-list -n5 va | egrep -iv complete
+-----------------------------------------------+----------------------------------------+---------------------------------------------------+--------------------+----------------------+-----------------------------------------------+
| resource_name                                 | physical_resource_id                   | resource_type                                     | resource_status    | updated_time         | parent_resource                               |
+-----------------------------------------------+----------------------------------------+---------------------------------------------------+--------------------+----------------------+-----------------------------------------------+
(...)
| Compute                                       | <uuid1>  | OS::Heat::ResourceGroup                           | UPDATE_IN_PROGRESS | 2016-09-28T19:03:30Z |                                               |
(...)
| 29                                            | <uuid2>   | OS::TripleO::Compute                              | UPDATE_IN_PROGRESS | 2016-09-28T19:04:50Z | Compute                                       |
(...)
| NovaComputeDeployment                         | <uuid3>   | OS::TripleO::SoftwareDeployment                   | CREATE_IN_PROGRESS | 2016-09-28T19:05:36Z | 29                                            |
(...)
+-----------------------------------------------+----------------------------------------+---------------------------------------------------+--------------------+----------------------+-----------------------------------------------+
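
To pick out only the rows that are still in progress from a long listing, a filter along the following lines may help. It is a sketch that assumes the six-column table layout shown above; the function name in_progress_resources is hypothetical.

```shell
# Hypothetical filter: reads `heat resource-list` table output on stdin and
# prints "<resource_name> <resource_status>" for every row still IN_PROGRESS.
in_progress_resources() {
    awk -F'|' '
        # Field 1 is the empty text before the leading pipe, so the
        # resource name is field 2 and the status is field 5.
        $5 ~ /IN_PROGRESS/ {
            name = $2; status = $5
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            print name, status
        }'
}

# Usage (stack name "va" as in the listing above):
#   heat resource-list -n5 va | in_progress_resources
```

The numeric resource name under the Compute group (29 in the listing above) corresponds to the index in the node name, here overcloud-compute-29.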
  • The solution is to restart os-collect-config on the nodes in question:
systemctl restart os-collect-config
  • Verify that os-collect-config is logging again on the node:
journalctl -u os-collect-config -f
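
When several nodes are affected, the restart and a quick status check could be run from the undercloud in a loop. The following is a sketch under assumptions: it presumes SSH access to the nodes as the heat-admin user with sudo rights, and the function name restart_occ and the addresses in the usage line are placeholders.

```shell
# Hypothetical helper: restart os-collect-config on each given node and
# report whether the unit is active afterwards.
# Assumption: the nodes accept SSH logins as heat-admin with sudo rights.
restart_occ() {
    local node state
    for node in "$@"; do
        # -n keeps ssh from reading stdin, so the loop is safe in pipelines
        ssh -n "heat-admin@${node}" "sudo systemctl restart os-collect-config"
        state=$(ssh -n "heat-admin@${node}" "sudo systemctl is-active os-collect-config")
        echo "${node}: ${state:-unknown}"
    done
}

# Usage with placeholder addresses:
#   restart_occ 192.0.2.21 192.0.2.22
```

A unit that reports active can still be idle, so following the journal on at least one node remains the more reliable confirmation.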

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.