HAProxy does not restart during RHOSP upgrade when RHCS is director-deployed and Alertmanager is enabled

Solution Verified - Updated

Environment

  • Red Hat OpenStack Platform (RHOSP) 17.1
  • Red Hat Ceph Storage (RHCS) 5
  • Red Hat Ceph Storage (RHCS) 6
  • Red Hat Ceph Storage (RHCS) 7
  • Red Hat Ceph Storage (RHCS) 8

Issue

  • If you deployed Alertmanager in a director-deployed Red Hat Ceph Storage environment, the upgrade from Red Hat Ceph Storage version 4 to version 5 fails. The failure occurs because HAProxy does not restart after you run the following command to configure cephadm on the Red Hat Ceph Storage nodes:

    $ openstack overcloud external-upgrade run \
    --skip-tags ceph_ansible_remote_tmp \
    --stack <stack> \
    --tags cephadm_adopt 2>&1
    

    After you run the command, the Red Hat Ceph Storage cluster status is HEALTH_WARN.

Resolution

Apply the following workaround to address the issue.

  1. Log in to a Controller node and create the following alertmanager_spec file:

    $ cat <<EOF>alertmanager_spec
    ---
    service_type: alertmanager
    service_id: alertmanager
    service_name: alertmanager
    placement:
      count: 3
      label: monitoring
    networks:
    - {storage_network}
    EOF
    

    Replace {storage_network} with the subnet assigned to the storage network.

  2. As the root user on the same Controller node, start the cephadm shell with the spec file mounted. Inside the shell, remove the adopted Alertmanager daemons and apply the spec that you created in step 1:

    $ cephadm shell -m alertmanager_spec
    $ ceph orch rm alertmanager
    $ ceph orch apply -i /mnt/alertmanager_spec
    
  3. Verify that the Alertmanager instances are running by using ceph orch ps, and then exit the cephadm shell:

    $ ceph orch ps
    
  4. Finally, restart the HAProxy bundle:

    pcs resource enable haproxy-bundle
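
The spec file from step 1 can also be generated by a small script so that the storage subnet is passed as an argument instead of being edited by hand. The following is a minimal sketch, not part of the documented procedure: the function name render_alertmanager_spec and the example subnet 172.17.3.0/24 are assumptions for illustration only. Replace the subnet with the one assigned to your storage network.

```shell
#!/bin/sh
# Hypothetical helper: print the Alertmanager service spec from step 1
# with the storage subnet filled in. The spec fields are copied from step 1.
render_alertmanager_spec() {
    storage_network="$1"
    if [ -z "$storage_network" ]; then
        echo "usage: render_alertmanager_spec <storage_subnet>" >&2
        return 1
    fi
    cat <<EOF
---
service_type: alertmanager
service_id: alertmanager
service_name: alertmanager
placement:
  count: 3
  label: monitoring
networks:
- ${storage_network}
EOF
}

# Example (assumed subnet): write the spec file that step 2 mounts.
# render_alertmanager_spec 172.17.3.0/24 > alertmanager_spec
```

The function refuses to run without a subnet, which avoids applying a spec that still contains the {storage_network} placeholder.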
    

See also Reinitializing Ceph Alertmanager and/or RGW - Upgrading Ceph

Root Cause

The HAProxy bundle cannot start through Pacemaker because a failure occurs when it attempts to bind to the Alertmanager port (9093). The failure occurs because Alertmanager has not been redeployed on the storage network.
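
To see which local addresses currently hold port 9093 on a Controller node, the output of ss can be filtered. This is a minimal sketch under stated assumptions: the function name listeners_on_port is hypothetical, and the filter assumes the column layout of `ss -H -tln`, where the local address and port appear in the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -H -tln` output on stdin and print the
# local addresses that listen on the given port (default: 9093).
listeners_on_port() {
    port="${1:-9093}"
    awk -v port="$port" '{
        # Column 4 is "local_address:port"; the port follows the last colon.
        n = split($4, parts, ":")
        if (parts[n] == port) print $4
    }'
}

# Example, run on the Controller node:
# ss -H -tln | listeners_on_port 9093
```

If the output lists an address other than the storage network address of the node, that is consistent with Alertmanager not yet being redeployed on the storage network.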

See also Bugzilla 7041333: Cephadm attempts to bind Grafana to all (::) interfaces when IPv6 networks list is provided.

Artifacts

Product/Version  Related BZ/Jira   Errata                 Fixed Version
RHCS/8           Bugzilla 2274719  Errata RHSA-2025:9775  8.1 - 8.1
RHCS/8           Bugzilla 2356569  Errata TBD             8.0z4 - 8.0.4
RHCS/7           Bugzilla 2356570  Errata TBD             7.1z7 - 7.1.7
RHCS/6           Bugzilla 2356571  Errata TBD             6.1z10 - 6.1.10
RHCS/8           Bugzilla 2356355  Errata RHSA-2025:9775  8.1 - 8.1
RHCS/8           Bugzilla 2356550  Errata TBD             8.0z4 - 8.0.4
RHCS/7           Bugzilla 2356551  Errata TBD             7.1z7 - 7.1.7
RHCS/6           Bugzilla 2356553  Errata TBD             6.1z10 - 6.1.10
RHCS/5           Bugzilla 2356354  Errata TBD             5.3z9 - 5.3.9
RHCS/5           Bugzilla 2269009  Errata RHBA-2025:1478  5.3z8 - 5.3.8
RHCS/5           Bugzilla 2224351  Errata RHBA-2023:4760  5.3z5 - 5.3.5
RHOSP/17         Bugzilla 2229931  Errata RHBA-2024:0209  17.1

Diagnostic Steps

  • The /var/log/containers/stdouts/haproxy-bundle.log file shows that HAProxy fails to start because it cannot bind to {vip}:{Alertmanager_port}:

    2024-03-11T11:41:23.975991313+01:00 stderr F [ALERT] 070/114123 (7) : Starting proxy ceph_alertmanager: cannot bind socket [192.168.3.213:9093]
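
To pull every such bind failure out of the log, the alert lines can be filtered with sed. This is a minimal sketch: the function name bind_failures is hypothetical, and the pattern assumes the alert format shown in the log line above.

```shell
#!/bin/sh
# Hypothetical helper: read HAProxy log lines on stdin and print
# "<proxy> <address:port>" for every "cannot bind socket" alert.
bind_failures() {
    sed -n 's/.*Starting proxy \([^:]*\): cannot bind socket \[\(.*\)].*/\1 \2/p'
}

# Example, run on the Controller node:
# bind_failures < /var/log/containers/stdouts/haproxy-bundle.log
```

For the log line above, the function prints the proxy name ceph_alertmanager followed by 192.168.3.213:9093.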
    
