[RHV] The hosted-engine deploy (restore-from-file) fails if any non-management logical network is defined as required in the backup file.

Solution Verified - Updated

Environment

  • Red Hat Virtualization 4.x

Issue

  • The hosted-engine deployment fails with the following error:

      2019-03-07 20:33:50,711+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
    

Resolution

  • This issue is fixed in ovirt-ansible-hosted-engine-setup-1.0.21, which is included in RHV 4.3.5 hypervisors. Upgrade the RHV hypervisor so that the ovirt-ansible-hosted-engine-setup package is at version 1.0.21 or later. After the upgrade, the setup asks the question below; answer yes to pause the deployment once the host has been added to the engine.

      Pause the execution after adding this host to the 
      engine?
      You will be able to iteratively connect to
      the restored engine in order to manually 
      review and remediate its configuration before 
      proceeding with the deployment:\nplease ensure that 
      all the datacenter hosts and storage domain are 
      listed as up or in maintenance mode before 
      proceeding. This is normally not required when 
      restoring an up to date and coherent backup. 
    
  • While the deployment is paused, log in to the engine GUI and either configure the required networks on this host or mark those networks as not required.
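
  • To see whether a host already carries the fix, the installed package version can be compared against 1.0.21. The following is a minimal sketch, not an official procedure: the version_ge helper is a hypothetical name introduced here, and it assumes GNU sort with the -V (version sort) option. On hosts where the role ships under a different package name, rpm reports the package as not installed and this check does not apply.

```shell
# Minimum ovirt-ansible-hosted-engine-setup version that carries the fix
# (taken from this article).
required="1.0.21"

# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; assumes GNU sort -V is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# rpm -q exits non-zero when the package is missing (or rpm is unavailable).
if installed="$(rpm -q --queryformat '%{VERSION}' ovirt-ansible-hosted-engine-setup 2>/dev/null)"; then
    if version_ge "$installed" "$required"; then
        echo "Installed version $installed includes the fix."
    else
        echo "Installed version $installed is older than $required: upgrade or use the hook workaround."
    fi
else
    echo "ovirt-ansible-hosted-engine-setup was not found by rpm on this host."
fi
```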

  • If upgrading is not possible, work around the issue by creating a hook file named fix_network.yml in the enginevm_after_engine_setup hooks directory before deploying the Self-Hosted Engine environment from a backup file. The path depends on the RHV version:

  • On an RHV 4.2 host, the fix_network hook path is /usr/share/ovirt-hosted-engine-setup/ansible/hooks/enginevm_after_engine_setup/fix_network.yml.

  • On an RHV 4.3 host, the fix_network hook path is /usr/share/ansible/roles/ovirt.hosted-engine-setup/hooks/enginevm_after_engine_setup/fix_network.yml.

  • On an RHV 4.4 SP1 host, the fix_network hook path is /usr/share/ansible/collections/ansible_collections/redhat/rhv/roles/hosted_engine_setup/hooks/enginevm_after_engine_setup/fix_network.yml.

  • Add the content below to fix_network.yml. Replace require_network_1 and require_network_2 with the names of the required networks that are missing on the host and cause it to go non-operational. Also replace the data_center and cluster names (Default in this example) with the actual names used in the deployment.

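      # Authenticate against the restored engine.
      - include_tasks: auth_sso.yml
      # Pause for 300 seconds (5 minutes) so the engine can finish initializing.
      - name: Wait for the engine to reach a stable condition
        wait_for: timeout=300
      # Mark each listed network as not required in the cluster.
      - name: fix network
        ovirt_network:
          auth: "{{ ovirt_auth }}"
          name: "{{ item }}"
          data_center: Default   # replace with the actual data center name
          clusters:
            - name: Default      # replace with the actual cluster name
              required: False
        with_items:
          - "require_network_1"  # replace with the actual network names
          - "require_network_2"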
    
  • The play runs after engine-setup, waits 5 minutes for the engine to initialize, and then clears the required flag on the listed networks so that the host does not go non_operational.
    Note: enclose each network name in double quotes (") so that Ansible does not strip special characters such as _.
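
  • The hook location for a given host can be found by checking which of the three directories listed above exists. The following is a minimal sketch under stated assumptions: find_hook_dirs and its optional root prefix argument are hypothetical names introduced here (the prefix only makes the function easy to try against a scratch directory), and the directory list is taken from the paths in this article.

```shell
# find_hook_dirs [ROOT]: print every enginevm_after_engine_setup hook
# directory from this article that exists under ROOT (default: /).
# Returns non-zero when none of them exists.
find_hook_dirs() {
    root="${1:-}"
    found=1
    for base in \
        /usr/share/ovirt-hosted-engine-setup/ansible \
        /usr/share/ansible/roles/ovirt.hosted-engine-setup \
        /usr/share/ansible/collections/ansible_collections/redhat/rhv/roles/hosted_engine_setup
    do
        dir="${root}${base}/hooks/enginevm_after_engine_setup"
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    return "$found"
}

# Report the candidate directories on this host, if any.
find_hook_dirs || echo "No enginevm_after_engine_setup hook directory found on this host."
```

    Place fix_network.yml in the directory that matches the RHV version of the host.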

Root Cause

This issue is tracked in Bug 1686575.

Diagnostic Steps

  • The /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log file shows the following error:

      2019-03-07 20:33:50,711+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
    
  • The engine logs under /var/log/ovirt-hosted-engine-setup/engine-logs-* show the following errors:

      2019-03-07 20:33:42,342+05 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] Host '<hostname>' is set to Non-Operational, it is missing the following networks: '<network_name>'
      2019-03-07 20:33:42,397+05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] EVENT_ID: VDS_SET_NONOPERATIONAL_NETWORK(519), Host <hostname> does not comply with the cluster Default networks, the following networks are missing on host: '<network_name>'
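
  • The names of the missing networks can be pulled out of the engine logs and then used in fix_network.yml. The following is a minimal sketch: missing_networks is a hypothetical helper introduced here, it matches only the SetNonOperationalVdsCommand message format shown above, and splitting on commas is an assumption about how the engine lists more than one network.

```shell
# missing_networks: read engine log lines on stdin and print each distinct
# network name found in "is missing the following networks: '...'" messages.
# Splitting the quoted value on commas is an assumption about the log format.
missing_networks() {
    sed -n "s/.*is missing the following networks: '\([^']*\)'.*/\1/p" \
        | tr ',' '\n' \
        | sed 's/^ *//; s/ *$//' \
        | sort -u
}

# Search the collected engine logs; prints nothing if no such message exists.
grep -rh "missing the following networks" \
    /var/log/ovirt-hosted-engine-setup/engine-logs-* 2>/dev/null | missing_networks
```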
    