OVN fails to configure after reboot during OSP-13 -> OSP-16.1 FFU

Solution Unverified - Updated

Environment

  • FFU RHOSP13 -> RHOSP16

Issue

  • After the controller reboots into RHEL 8, the ovn-dbs container fails to start because of an SELinux denial, and the deploy fails:

    2021-08-24 21:55:39,572 p=18856 u=mistral n=ansible | TASK [Start containers for step 3 using paunch] ********************************
    2021-08-24 21:55:39,573 p=18856 u=mistral n=ansible | Tuesday 24 August 2021  21:55:39 +0530 (0:00:00.105)       0:26:52.938 ******** 
    2021-08-24 21:55:39,981 p=18856 u=mistral n=ansible | changed: [controller01] => {"ansible_job_id": "435645617350.231201", "changed": true, "finished": 0, "results_file": "/root/.ansible_async/435645617350.231201", "started": 1}
    2021-08-24 21:55:40,034 p=18856 u=mistral n=ansible | TASK [Wait for containers to start for step 3 using paunch] ********************
    2021-08-24 21:55:40,034 p=18856 u=mistral n=ansible | Tuesday 24 August 2021  21:55:40 +0530 (0:00:00.461)       0:26:53.400 ******** 
    2021-08-24 22:59:16,751 p=18856 u=mistral n=ansible | fatal: [controller01]: FAILED! => {"ansible_job_id": "435645617350.231201", "attempts": 1200, "changed": false, "finished": 0, "started": 1}
    2021-08-24 22:59:16,751 p=18856 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************
    2021-08-24 22:59:16,752 p=18856 u=mistral n=ansible | PLAY RECAP *********************************************************************
    2021-08-24 22:59:16,753 p=18856 u=mistral n=ansible | controller01        : ok=308  changed=168  unreachable=0    failed=1    skipped=145  rescued=0    ignored=0   
    2021-08-24 22:59:16,753 p=18856 u=mistral n=ansible | Tuesday 24 August 2021  22:59:16 +0530 (1:03:36.719)       1:30:30.119 ******** 
    2021-08-24 22:59:16,753 p=18856 u=mistral n=ansible | =============================================================================== 
    
  • The SELinux denial, as recorded in /var/log/audit/audit.log:

    type=AVC msg=audit(1629830593.634:35262): avc:  denied  { setattr } for  pid=154289 comm="chown" name=".ovnnb_db.db.tmp.lock" dev="sde2" ino=1170249788 scontext=system_u:system_r:container_t:s0:c402,c949 tcontext=system_u:object_r:openvswitch_var_lib_t:s0 tclass=file permissive=0
    
  • The SELinux labels on /var/lib/openvswitch/ovn and its contents can also be checked:

    [root@controller-0 ~]# ls -lZ  /var/lib/openvswitch/ovn
    total 38088
    -rw-r-----. 1 root root system_u:object_r:container_file_t:s0            21 Aug 24 20:17 ovnnb-active.conf
    -rw-r-----. 1 root root system_u:object_r:container_file_t:s0        883326 Aug 24 15:42 ovnnb_db.db
    -rw-r-----. 1 root root system_u:object_r:container_file_t:s0       8568547 Aug  4 16:26 ovnnb_db.db.backup5.10.1-64444197
    srwxr-x---. 1 root root system_u:object_r:openvswitch_var_lib_t:s0        0 Feb 22  2019 ovn-northd.205.ctl                     <-----
    srwxr-x---. 1 root root system_u:object_r:openvswitch_var_lib_t:s0        0 Mar  6  2019 ovn-northd.46182.ctl                   <-----
    -rw-r-----. 1 root root system_u:object_r:container_file_t:s0            21 Aug 24 20:17 ovnsb-active.conf
    -rw-r-----. 1 root root system_u:object_r:container_file_t:s0       6170532 Aug 24 19:33 ovnsb_db.db
    -rw-r-----. 1 root root system_u:object_r:container_file_t:s0      23367412 Aug  4 16:26 ovnsb_db.db.backup1.15.1-1164519396
    [root@controller-0 ~]# ls -dlZ  /var/lib/openvswitch/ovn
    drwxr-xr-x. 2 root root system_u:object_r:openvswitch_var_lib_t:s0 4096 Aug 27 11:53 /var/lib/openvswitch/ovn
    
  • In the listing above, note that *ovn-northd.205.ctl*, *ovn-northd.46182.ctl*, and the /var/lib/openvswitch/ovn directory itself carry the type openvswitch_var_lib_t, while the other files carry container_file_t.
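
  • The comparison above can be automated. The following is a minimal sketch, not part of the original solution: list_mislabeled is a hypothetical helper name, and it assumes the RHEL 8 `ls -lZ` layout shown in the listing, where the SELinux context is the fifth field and file names contain no spaces.

```shell
# Print "<type> <name>" for every entry whose SELinux type is not container_file_t.
# Reads `ls -lZ` output on stdin. Assumption: the context (user:role:type:level)
# is the 5th whitespace-separated field, as in the RHEL 8 listing in this article.
list_mislabeled() {
  awk '$1 != "total" && NF >= 6 {
         split($5, ctx, ":")                 # ctx[3] holds the SELinux type
         if (ctx[3] != "container_file_t") print ctx[3], $NF
       }'
}

# Usage on the controller:
#   ls -lZ /var/lib/openvswitch/ovn | list_mislabeled
```

    Any line printed by the helper names an entry that still carries a type other than container_file_t.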

Resolution

  • This issue will be addressed by BZ 1997351.

  • As a current workaround, run the following command on the controller being upgraded:

    $ sudo chcon -R -t container_file_t /var/lib/openvswitch/ovn
    
  • To avoid the failure, run the above command while the upgrade procedure is executing the following task:

    TASK [Wait for containers to start for step 3 using paunch]
    
  • This task retries about 1200 times. Run the chcon command on the controller node once the retry count is around 900.

  • Applying the workaround at this point prevents the upgrade from failing and avoids downtime.
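
  • The workaround can be combined with a quick check that the new label took effect. This is a minimal sketch under stated assumptions: selinux_type and relabel_and_verify are hypothetical helper names, and the check relies on GNU stat's `%C` format, which prints a file's SELinux context.

```shell
# Extract the SELinux type (third colon-separated field) from a context string.
selinux_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Apply the workaround to a directory, then confirm the directory's new type.
# $1: directory to relabel (defaults to the path used in this article).
relabel_and_verify() {
  dir="${1:-/var/lib/openvswitch/ovn}"
  sudo chcon -R -t container_file_t "$dir" || return 1
  # stat's %C format prints the SELinux context of the file.
  [ "$(selinux_type "$(stat -c '%C' "$dir")")" = "container_file_t" ]
}

# Usage on the controller:
#   relabel_and_verify && echo "relabel OK"
```

    The function returns non-zero if chcon fails or if the directory does not end up with the container_file_t type.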

Root Cause

  • The root cause is still under investigation and is being discussed in the Bugzilla report.

Diagnostic Steps

  • The following denial is shown in /var/log/audit/audit.log:

    type=AVC msg=audit(1629830593.634:35262): avc:  denied  { setattr } for  pid=154289 comm="chown" name=".ovnnb_db.db.tmp.lock" dev="sde2" ino=1170249788 scontext=system_u:system_r:container_t:s0:c402,c949 tcontext=system_u:object_r:openvswitch_var_lib_t:s0 tclass=file permissive=0
    
  • The following SELinux type is set on the /var/lib/openvswitch/ovn directory:

    drwxr-xr-x. 2 root root system_u:object_r:openvswitch_var_lib_t:s0 4096 Aug 27 11:53 /var/lib/openvswitch/ovn
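
  • To locate the denial without reading the whole audit log, the log can be filtered for AVC records whose target type is openvswitch_var_lib_t. This is a minimal sketch using plain grep; find_ovn_denials is a hypothetical helper name, not a tool shipped with the product.

```shell
# Print AVC denials whose target context has the type openvswitch_var_lib_t.
# $1: path to an audit log (defaults to the path named in this article).
find_ovn_denials() {
  grep 'type=AVC' "${1:-/var/log/audit/audit.log}" \
    | grep 'denied' \
    | grep 'tcontext=[^ ]*:openvswitch_var_lib_t:'
}

# Usage on the controller (reading the audit log normally requires root):
#   sudo sh -c '...'   or run from a root shell:  find_ovn_denials
```

    A match with comm="chown" and scontext type container_t corresponds to the denial described in this article.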
    
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.