BareMetalHost reference is missing after adding a host to OpenShift Assisted Installer cluster

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.x in general. Verified with 4.19
  • Original cluster installation: Assisted Installer (AI) or Agent Based Installer (ABI)

Issue

  • In an Assisted Installer or Agent-Based Installer cluster Day-2 scenario, the BareMetalHost and machine information is not visible for nodes added using the Add Host option on console.redhat.com.
  • The machine information and the BareMetalHost reference are missing for the manually added node, and the node is not visible in the machine view of the OpenShift Web Console, even though all nodes in the cluster are physical bare-metal hosts.
  • The BareMetalHost (BMH) and Machine objects are not created when adding a new worker node.

Resolution

The missing objects are not necessary for cluster operations. However, if needed, they can be created manually with the following procedure:

Notes:

  • Because the new nodes were not created by the cluster metal3 internal feature, the BareMetalHost objects of these new nodes must contain the attribute externallyProvisioned: true.
  • If available, use an existing object as a reference, but clean it of runtime fields before re-adding it to the cluster.
  • The BMH spec.bmc.address depends on the server vendor and the network architecture. Possible protocols are the older IPMI or the newer Redfish. Refer to the documentation for examples.
  • Create an individual secret for each host. The secret is deleted if the BMH object is deleted.
  • Because the ISO installation does not perform the introspection step, the host hardware details are not collected, thus the Inventory section on the OpenShift Web Console will be empty. This is expected. If the inventory details are desired, install the node using the metal3 features.
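The note above suggests reusing an existing object as a reference. A minimal sketch of dumping an existing BMH and stripping its top-level status section before reuse (the `strip_status` helper name and the awk approach are illustrative, not part of the product tooling):

```shell
# strip_status: drop the top-level "status:" section from a dumped manifest.
# Other runtime fields (metadata.uid, metadata.resourceVersion,
# metadata.creationTimestamp, spec.consumerRef) still need manual removal.
strip_status() {
  awk '/^status:/ {drop=1; next} drop && /^[^[:space:]]/ {drop=0} !drop' "$@"
}

# Example against a live cluster (requires a logged-in oc session):
# oc -n openshift-machine-api get bmh master1 -o yaml | strip_status - > bmh-template.yaml
```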
  1. Create a BareMetalHost (BMH) object for each new node:

1.1. Save the following as bmh-worker1.yaml:

```
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  labels:
    installer.openshift.io/role: worker
  name: worker1
  namespace: openshift-machine-api
spec:
  bmc:
    address: <BMC Board access method>
    credentialsName: worker1-bmc-secret
  bootMACAddress: 52:54:00:0b:bb:88
  externallyProvisioned: true
  hardwareProfile: unknown
  online: true
  userData:
    name: worker-user-data-managed
    namespace: openshift-machine-api
```

1.2. Create the secret for the host:

```
$ oc -n openshift-machine-api create secret generic worker1-bmc-secret \
        --from-literal=username=<BMC_username> \
        --from-literal=password=<BMC_password>
```

1.3. Create the BMH object for the worker1:

```
$ oc create -f bmh-worker1.yaml 
```


The outcome should be:


```
$ oc -n openshift-machine-api get bmh
NAME                       STATE                    CONSUMER                ONLINE   ERROR   AGE
...
worker1                    externally provisioned                           true             5m40s
```
  2. When the new node's BareMetalHost object is registered with State=externally provisioned, set the providerID attribute of the corresponding node:

2.1. Get the current BMH uid (example):

```
$ oc -n openshift-machine-api get baremetalhosts worker1 -o jsonpath='{.metadata.uid}{"\n"}'
bccbc400-f7c9-40ff-89a6-5c9d8a29d1ee
```

2.2. Patch the node:

```
$ oc patch node worker1 --type merge \
    --patch '{"spec":{"providerID":"baremetalhost:///openshift-machine-api/worker1/<uid from previous command>"}}'
```
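Steps 2.1 and 2.2 can also be combined. The sketch below wraps the providerID construction in a small helper (`build_provider_id` is an illustrative name, not an oc feature):

```shell
# build_provider_id <namespace> <bmh-name> <uid>
# Assemble the providerID string expected in the node spec.
build_provider_id() {
  printf 'baremetalhost:///%s/%s/%s' "$1" "$2" "$3"
}

# Example against a live cluster (requires a logged-in oc session):
# UID=$(oc -n openshift-machine-api get bmh worker1 -o jsonpath='{.metadata.uid}')
# oc patch node worker1 --type merge \
#     --patch "{\"spec\":{\"providerID\":\"$(build_provider_id openshift-machine-api worker1 "$UID")\"}}"
```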
  3. Create the Machine object for the new node. An existing Machine object may be used as a reference:

3.1. Save the following as machine-worker1.yaml:

**Please note the new `cluster-api-machineset=external-workers` label. More details on that in step 4.**


```
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    metal3.io/BareMetalHost: openshift-machine-api/worker1
  labels:
    machine.openshift.io/cluster-api-cluster: c41911-2hghd
    machine.openshift.io/cluster-api-machineset: external-workers
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: external-worker1
  namespace: openshift-machine-api
spec:
  providerID: baremetalhost:///openshift-machine-api/worker1/<bmh uid>
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      hostSelector: {}
      kind: BareMetalMachineProviderSpec
      userData:
        name: worker-user-data-managed
```
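The `<bmh uid>` placeholder can be filled in with the value collected in step 2.1. A hedged sketch (`fill_bmh_uid` is an illustrative helper, not a standard tool):

```shell
# fill_bmh_uid <manifest> <uid>: replace the "<bmh uid>" placeholder in the
# saved Machine manifest with the real BareMetalHost uid.
fill_bmh_uid() {
  sed "s|<bmh uid>|$2|" "$1"
}

# Example with a live cluster (requires a logged-in oc session):
# UID=$(oc -n openshift-machine-api get bmh worker1 -o jsonpath='{.metadata.uid}')
# fill_bmh_uid machine-worker1.yaml "$UID" | oc create -f -
```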

3.2. Create the machine:

```
$ oc create -f machine-worker1.yaml 
```

3.3. Patch the BareMetalHost object to set its consumerRef attribute to reference the newly created Machine:

```
$ oc -n openshift-machine-api patch bmh worker1 --type merge --patch '{"spec":{"consumerRef":{"apiVersion":"machine.openshift.io/v1beta1","kind":"Machine","name":"external-worker1","namespace":"openshift-machine-api"}}}'
```

3.4. Add the machine resource annotation to the node:

```
$ oc annotate node worker1 machine.openshift.io/machine='openshift-machine-api/external-worker1'
```

3.5. Patch the machine phase status to be provisioned:

```
$ oc -n openshift-machine-api patch --subresource status machines external-worker1 --type json -p '[{"op": "replace", "path": "/status/phase", "value":"Provisioned"}]'
```
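When several hosts are added at once, steps 3.3 to 3.5 repeat per host. The sketch below only prints the per-host commands for review rather than running them (`print_wireup_cmds` and the `external-<host>` Machine naming follow this article's examples and are otherwise assumptions):

```shell
# print_wireup_cmds <host>: print the per-host wiring commands from steps
# 3.3-3.5 for review; pipe to "sh" only after checking the output.
print_wireup_cmds() {
  HOST="$1"
  MACHINE="external-${HOST}"
  cat <<EOF
oc -n openshift-machine-api patch bmh ${HOST} --type merge --patch '{"spec":{"consumerRef":{"apiVersion":"machine.openshift.io/v1beta1","kind":"Machine","name":"${MACHINE}","namespace":"openshift-machine-api"}}}'
oc annotate node ${HOST} machine.openshift.io/machine='openshift-machine-api/${MACHINE}'
oc -n openshift-machine-api patch --subresource status machines ${MACHINE} --type json -p '[{"op": "replace", "path": "/status/phase", "value":"Provisioned"}]'
EOF
}
```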

IMPORTANT: Initially, the cluster has only one MachineSet object, used by the machines that are automatically provisioned by metal3/Ironic. This MachineSet includes installation instructions that are not relevant for external nodes and may cause conflicts.

A MachineSet is not needed for the external nodes' Machines. However, in some use cases a MachineSet may be needed, for example to create a MachineHealthCheck object that provides high availability for virtual machines running on OpenShift Virtualization.
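The machine.openshift.io/cluster-api-cluster label used in the manifests (c41911-2hghd in this article's examples) must match the cluster's infrastructure name, which can be read from the cluster as in this sketch:

```shell
# Read the infrastructure name from a live cluster (requires a logged-in oc session):
# INFRA_ID=$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')
INFRA_ID="c41911-2hghd"   # example value used throughout this article
CLUSTER_LABEL="machine.openshift.io/cluster-api-cluster=${INFRA_ID}"
echo "${CLUSTER_LABEL}"
```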

  4. Optionally, create a new MachineSet and MachineHealthCheck:

4.1. Save the following as machineset-external-workers.yaml:

```
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: c41911-2hghd
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: external-workers
  namespace: openshift-machine-api
spec:
  replicas: 1  # Adjust as needed; scale as needed after creation.
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: c41911-2hghd
      machine.openshift.io/cluster-api-machineset: external-workers
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: c41911-2hghd
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: external-workers
    spec:
      providerSpec:
        value:
          apiVersion: baremetal.cluster.k8s.io/v1alpha1
          hostSelector: {}
          image:
            checksum: ""
            url: ""
          kind: BareMetalMachineProviderSpec
          userData:
            name: worker-user-data-managed
```

4.2. Create the MachineSet:

```
$ oc create -f machineset-external-workers.yaml
```

4.3. Save the following as machinehealthcheck-external.yaml:

```
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  annotations:
    machine.openshift.io/remediation-strategy: external-baremetal
  name: machine-health-check-external
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: external-workers
  unhealthyConditions:
  - type: "Ready"
    timeout: "120s"
    status: "False"
  - type: "Ready"
    timeout: "120s"
    status: "Unknown"
  maxUnhealthy: "1"
  nodeStartupTimeout: "10m"  
```

4.4. Create the MachineHealthCheck object:

```
$ oc create -f machinehealthcheck-external.yaml
```

Root Cause

At the time of writing this article, in a Day-2 scenario, the Assisted Installer does not create Machine or BareMetalHost objects for newly added nodes. This is expected behavior when installing with an ISO and not using the host's Baseboard Management Controller (BMC).

There is an existing Request for Enhancement to add them automatically.

Diagnostic Steps

  • Verify that the node was successfully added to the cluster:

    $ oc get nodes
    NAME                     STATUS     ROLES                        AGE     VERSION
    master1                  Ready      control-plane,master,worker  19h     v1.30.4
    master2                  Ready      control-plane,master,worker  20h     v1.30.4
    master3                  Ready      control-plane,master,worker  20h     v1.30.4
    worker1                  Ready      worker                       54s     v1.30.4  <== New node.
    
  • Confirm that the baremetalhost and machine objects are missing for the new worker1 node:

    $ oc get baremetalhosts -n openshift-machine-api
    NAME      STATE       CONSUMER                        ONLINE   ERROR              AGE
    master1   unmanaged   master-1                        true                        20h
    master2   unmanaged   master-2                        true                        20h
    master3   unmanaged   master-3                        true                        20h
    
$ oc get machines -n openshift-machine-api
NAME                     PHASE     TYPE   REGION   ZONE                       AGE
master-1                 Running                                              20h
master-2                 Running                                              20h
master-3                 Running                                              20h
Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.