4.18 Release Notes

Red Hat OpenShift Data Foundation 4.18

Release notes for features and enhancements, known issues, and other important release information.

Red Hat Storage Documentation Team

Abstract

The release notes for Red Hat OpenShift Data Foundation 4.18 summarize all new features and enhancements, notable technical changes, and any known bugs upon general availability.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.

Chapter 1. Overview

Red Hat OpenShift Data Foundation is software-defined storage that is optimized for container environments. It runs as an operator on OpenShift Container Platform to provide highly integrated and simplified persistent storage management for containers.

Red Hat OpenShift Data Foundation is integrated into the latest Red Hat OpenShift Container Platform to address platform services, application portability, and persistence challenges. It provides a highly scalable backend for the next generation of cloud-native applications, built on a technology stack that includes Red Hat Ceph Storage, the Rook.io Operator, and NooBaa’s Multicloud Object Gateway technology.

Red Hat OpenShift Data Foundation is designed for FIPS. When running on RHEL or RHEL CoreOS booted in FIPS mode, OpenShift Container Platform core components use the RHEL cryptographic libraries submitted to NIST for FIPS validation on only the x86_64, ppc64le, and s390x architectures. For more information about the NIST validation program, see Cryptographic Module Validation Program. For the latest NIST status for the individual versions of the RHEL cryptographic libraries submitted for validation, see Compliance Activities and Government Standards.

Red Hat OpenShift Data Foundation provides a trusted, enterprise-grade application development environment that simplifies and enhances the user experience across the application lifecycle in a number of ways:

  • Provides block storage for databases.
  • Provides shared file storage for continuous integration, messaging, and data aggregation.
  • Provides object storage for cloud-first development, archival, backup, and media storage.
  • Scales applications and data exponentially.
  • Attaches and detaches persistent data volumes at an accelerated rate.
  • Stretches clusters across multiple data centers or availability zones.
  • Establishes a comprehensive application container registry.
  • Supports the next generation of OpenShift workloads such as Data Analytics, Artificial Intelligence, Machine Learning, Deep Learning, and Internet of Things (IoT).
  • Dynamically provisions not only application containers, but also data service volumes and containers, as well as additional OpenShift Container Platform nodes, Elastic Block Store (EBS) volumes, and other infrastructure services.

1.1. About this release

Red Hat OpenShift Data Foundation 4.18 (RHSA-2025:2652) is now available. New enhancements, features, and known issues that pertain to OpenShift Data Foundation 4.18 are included in this topic.

Red Hat OpenShift Data Foundation 4.18 is supported on the Red Hat OpenShift Container Platform version 4.18. For more information, see the Red Hat OpenShift Data Foundation Supportability and Interoperability Checker.

For Red Hat OpenShift Data Foundation life cycle information, refer to Product Life Cycles.

Chapter 2. New features

This section describes new features introduced in Red Hat OpenShift Data Foundation 4.18.

2.1. Deploying OpenShift Data Foundation with Red Hat OpenShift Service on AWS with hosted control planes

OpenShift Data Foundation is now available for deployment with Red Hat OpenShift Service on AWS (ROSA) hosted control planes (HCP).

For more information, see Deploying OpenShift Data Foundation using Red Hat OpenShift Service on AWS with hosted control planes.

2.2. Disaster recovery solution

2.2.1. Awareness of replication delays ahead of failover or relocation

Non-blocking warnings are displayed for synchronization delays during failover or relocation operations for both discovered and managed applications. This helps users stay aware of replication delays caused by replication issues, as well as the delay until the initial synchronization is complete.

For more information, see Subscription-based application failover between managed clusters.

2.2.2. More recipe capabilities for RBD-based applications

The capabilities of recipes are enhanced to support many more RBD-based applications.

2.3. Scaling storage using multiple device classes in the same cluster for local storage deployments

Red Hat OpenShift Data Foundation supports the use and segregation of different disk types. These disk types can be segregated into different device classes and exposed as separate storage classes.

This way, it is possible to control which workloads receive which local storage with their specific disk performance.

Note

Red Hat OpenShift Data Foundation only supports flash disks.

Different sets of disks can be used in the same cluster (local storage), which provides the flexibility to use multiple device classes in the same cluster.

For more information, see Scaling storage using multiple device classes in the same cluster for local storage deployments.

2.4. Reduction in the time required to upgrade Red Hat OpenShift Data Foundation

Optimizations were made to reduce the time required to upgrade Red Hat OpenShift Data Foundation in OpenShift clusters with a significant number of nodes. The optimization takes the number of nodes and the cluster configuration into account to behave better during the upgrade. This allows more than one CSI RBD or CephFS plugin pod to be upgraded at a time.

For more information, see the Prerequisites section in Updating Red Hat OpenShift Data Foundation 4.17 to 4.18.

2.5. Versioning for Multicloud Object Gateway Bucket Replication

Versioning in Multicloud Object Gateway (MCG) enables object data collaboration between two different locations, or replication of object data between NooBaa deployments on any platform, for example from on-premises NooBaa to NooBaa on AWS. Bucket replication can now optionally synchronize object versions.

For more information, see Synchronizing versions in Multicloud Object Gateway bucket replication.

2.6. Bucket notification in Multicloud Object Gateway

Bucket notifications allow for the creation of performant data pipelines. With bucket notifications, it becomes easy to create a data flow where newly ingested data is immediately detected and processed further. This is especially important in AI/ML use cases.

For more information, see Bucket notification in Multicloud Object Gateway.

2.7. Multicloud Object Gateway object browser

Bucket content can be quickly browsed, uploaded, and downloaded using the MCG object browser inside the OpenShift console. This provides a simplified way to browse MCG object storage and avoids the need for a third-party tool.

For more information, see Creating and managing buckets using MCG object browser.

2.8. Support for RADOS namespace for external mode

OpenShift Data Foundation supports RADOS namespaces for its external mode clusters. This helps improve performance in multi-tenant scenarios. Creating RADOS namespaces with restricted per-tenant access on CephBlockPools provides an efficient way of serving RBD storage.

For more information, see Creating an OpenShift Data Foundation Cluster for external Ceph storage system.

2.9. Ceph commands from the CLI tool

Arbitrary ceph CLI commands can now be run through the ODF CLI tool, mainly for guided troubleshooting using the Red Hat OpenShift Data Foundation documentation.

Note

Generally, configuration changes using ceph commands directly are not supported without explicit instructions from Red Hat support.

Chapter 3. Enhancements

This section describes the major enhancements introduced in Red Hat OpenShift Data Foundation 4.18.

3.1. Specifying Multus network address ranges manually

When using Multus with piecewise CIDRs, the multiple address ranges that were added to Rook can now be specified manually. This overcomes a limitation of the auto-detection, which finds only a single CIDR; in environments with piecewise CIDRs, this caused clusters to fail to start or fail to connect to the network.

For more information, see Multus network address space sizing.

3.2. Key Rotation for encryption with KMS

Enabling key rotation for the encryption keys of cluster-wide KMS-based encryption is now supported. This helps meet common security practice requirements.

For more information, see Cluster-wide encryption.

3.3. Option to disable Key Rotation for PV encryption

Key Rotation for PV encryption, which is enabled by default, can be disabled for certain persistent volume claims (PVCs).

For more information, see Disabling key rotation.
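As an illustrative sketch only: per-PVC key rotation opt-out is applied through a PVC annotation. The annotation key and value below follow the CSI-Addons naming convention but are assumptions; consult the linked documentation for the exact supported mechanism.

```yaml
# Hypothetical sketch: opting a single PVC out of key rotation.
# The annotation key/value are assumptions based on the CSI-Addons
# convention; see "Disabling key rotation" for the supported procedure.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-app-pvc          # hypothetical name
  namespace: my-app                # hypothetical namespace
  annotations:
    keyrotation.csiaddons.openshift.io/enable: "false"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-ceph-rbd
```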

3.4. Option to disable default ReclaimSpace

ReclaimSpace is enabled by default through a StorageClass or Namespace annotation. Reclaim space can now be disabled for certain persistent volume claims (PVCs), as the process of reclaiming space (fstrim) can impact performance.

For more information, see Disabling reclaim space for a specific PersistentVolumeClaim.
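As a sketch of the mechanism described above, ReclaimSpace is driven by a CSI-Addons annotation at the StorageClass or Namespace level; the cron-style schedule value here is illustrative, and the per-PVC opt-out is described in the linked documentation.

```yaml
# Sketch: ReclaimSpace enabled for all PVCs of a StorageClass via the
# CSI-Addons annotation; the schedule value shown is illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-storagecluster-ceph-rbd
  annotations:
    reclaimspace.csiaddons.openshift.io/schedule: "@weekly"
provisioner: openshift-storage.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```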

3.5. In-transit encryption after deployment

In-transit encryption can be enabled or disabled for existing clusters after the deployment. This encrypts the communication within the cluster for existing clusters.

For more information, see Enabling and disabling encryption in-transit post deployment.

3.6. Encryption configuration on OpenShift Data Foundation dashboard

The OpenShift Data Foundation dashboard provides information about the encryption configuration, such as the status of encryption at rest and encryption in transit.

3.7. Updates to automatic alert for the type of MDS pod scaling

The MDSCPUUsageHigh alert is updated to recommend either vertical or horizontal scaling based on the CPU usage.

For more information, see CephMdsCpuUsageHigh.

3.8. A new log level parameter for the StorageCluster CR

A new parameter, spec.nfs.LogLevel, is added to the StorageCluster CR. This parameter enables configuring the log level for NFS, which provides greater flexibility and control over logging behavior. This helps set precise log settings for debugging and monitoring purposes.
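A minimal StorageCluster fragment, as a sketch of the parameter described above; the log level value shown is illustrative, and the field casing should be verified against the CRD.

```yaml
# Sketch: configuring the NFS log level in the StorageCluster CR.
# The value NIV_DEBUG is illustrative; valid values are documented
# with the NFS service configuration.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  nfs:
    enable: true
    logLevel: NIV_DEBUG   # field casing per the CRD; verify before use
```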

3.9. Multicloud Object Gateway operator support for the 'ap-southeast-5' AWS region

The MCG operator now supports a new AWS region, ap-southeast-5.

Chapter 4. Removed functionality

This chapter lists functionalities that were supported in Red Hat OpenShift Data Foundation but are no longer available in OpenShift Data Foundation 4.18.

4.1. MDSCacheUsageHigh alert

The MDSCacheUsageHigh alert has been removed. This alert queried rss to warn about high MDS cache usage. However, rss is not the right metric for this; mds_co_bytes is the correct metric, but Ceph does not expose it. As a result, incorrect alerts were being triggered. The alert has been removed until a better solution is identified.

Chapter 5. Technology previews

This section describes the technology preview features introduced in Red Hat OpenShift Data Foundation 4.18 under Technology Preview support limitations.

Important

Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

Technology Preview features are provided with a limited support scope, as detailed on the Customer Portal: Technology Preview Features Support Scope.

5.1. Multi-Volume Consistency for Backup - CephFS and block

Multi-volume consistency, which provides crash-consistent multi-volume consistency groups for backup solutions, can be used by applications that are deployed over multiple volumes. This provides support for OpenShift Virtualization and helps better support such applications.

Red Hat OpenShift Data Foundation is the first storage vendor to implement this new and important CSI feature.

For more information, see the knowledgebase article CephFS VolumeGroupSnapshot in OpenShift Data Foundation.

5.2. More disaster recovery recipe capabilities for CephFS-based applications

The capabilities of disaster recovery recipes are enhanced to support more applications. Support for CephFS-based applications is in Technology Preview status for this release.

Chapter 6. Developer previews

This section describes the developer preview features introduced in Red Hat OpenShift Data Foundation 4.18.

Important

Developer preview features are subject to Developer Preview support limitations. Developer preview releases are not intended to be run in production environments. Clusters deployed with developer preview features are considered development clusters and are not supported through the Red Hat Customer Portal case management system. If you need assistance with developer preview features, reach out to the ocs-devpreview@redhat.com mailing list, and a member of the Red Hat Development Team will assist you as quickly as possible based on availability and work schedules.

6.1. Consistent RADOS block device (RBD) group disaster recovery

The OpenShift Data Foundation Disaster Recovery solution provides a way to consistently mirror multiple ReadWriteOnce (RWO) persistent volumes (PVs) with regional disaster recovery.

For more information, see the knowledgebase article, Enabling and Managing Consistency Groups in OpenShift 4.18.

Chapter 7. Bug fixes

This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.18.

7.1. Disaster recovery

  • Volsync in DR dashboard reports operator degraded

    Previously, Red Hat Advanced Cluster Management for Kubernetes (RHACM) 2.13 deployed the Volsync operator on a managed cluster without creating the ClusterServiceVersion (CSV) custom resource (CR). As a result, OpenShift did not generate csv_succeeded metrics for Volsync and hence the ODF-DR dashboard did not display the health status of the Volsync operator.

    With this fix, for Volsync, the csv_succeeded metric is replaced with kube_running_pod_ready. Therefore, the RHACM metrics whitelisting ConfigMap is updated and the ODF-DR dashboard is able to monitor the health of the Volsync operator effectively.

    (DFBUGS-1293)

  • Replication using Volsync requires PVC to be mounted before PVC is synchronized

    Previously, a PVC which was not mounted would not be synced to the secondary cluster.

    With this fix, ODF-DR syncs the PVC even when it is not part of the PVCLabelSelector.

    (DFBUGS-580)

7.2. Multicloud Object Gateway

  • Attempting to delete a bucketclass or OBC that does not exist does not result in an error in MCG CLI

    Previously, an attempt to delete a bucketclass or object bucket claim (OBC) that does not exist using the MCG CLI did not result in an error.

    With this fix, error messages on CLI deletion of bucketclasses and OBCs are improved.

    (DFBUGS-201)

  • 502 Bad Gateway observed on s3 get operation: noobaa is throwing error at 'MapClient.read_chunks: chunk ERROR Error: had chunk errors chunk

    Previously, the object was corrupted due to a race condition within MCG between a canceled part of an upload and the dedup flow finding a match. The canceled part would be flagged as a duplicate, then canceled and reclaimed, leaving the second duplicated part pointing to reclaimed data that is no longer valid.

    With this fix, deduping with chunks that are not yet marked as finished uploads is avoided and a time buffer of an hour is added after completion to ensure chunks are alive and can be deduped into. This behavior may impact performance for highly deduplicated workloads or benchmark testing during this buffer period. Additionally, increased storage utilization may be observed due to the buffer window, during which some potential objects have not yet been deduplicated.

    (DFBUGS-216)

  • Namespace store stuck in rejected state

    Previously, during monitoring of the namespace store, when MCG tried to verify access to and the existence of the target bucket, certain errors that should have been ignored were not ignored.

    With this fix, an issue is no longer reported on read-object_md when the object does not exist.

    (DFBUGS-700)

  • Updating bucket quota always results in a 1PB quota limit

    Previously, MCG bucket quota resulted in a 1PB quota limit regardless of the desired value.

    With this fix, the correct value is set on the bucket quota limit.

    (DFBUGS-1173)

  • Using PutObject via boto3 >= 1.36.0 results in InvalidDigest error

    Previously, PUT requests with clients that used the upgraded AWS SDK or CLI resulted in error because AWS SDK or CLI changed the default S3 client behavior to always calculate a checksum by default for operations that support it.

    With this fix, the PUT requests from S3 clients are allowed with the changed behavior.

    (DFBUGS-1513)

7.3. Ceph

  • with panic_on_warn set, the kernel ceph fs module panicked in ceph_fill_file_size

    Previously, a kernel panic with the note "not syncing: panic_on_warn set" occurred due to a specific hard-to-reproduce CephFS scenario.

    With this fix, the RHEL kernel was fixed and as a result, the specific CephFS scenario no longer occurs.

    (DFBUGS-551)

7.4. Ceph container storage interface (CSI) operator

  • ceph-csi-controller-manager pods OOMKilled

    Previously, ceph-csi-controller-manager pods were OOMKilled because these pods tried to cache all configmaps in the cluster when installing OpenShift Data Foundation.

    With this fix, the cache is scoped only to the namespace where ceph-csi-controller-manager pod is running. As a result, memory usage by pods is stable and pods are not OOMKilled.

    (DFBUGS-938)

7.5. OCS Operator

  • rook-ceph-mds pods scheduled on the same node as placement anti-affinity is preferred, not required

    Previously, MDS pods for an active MDS daemon could be scheduled in the same failure domain, as MDS pods had preferred pod anti-affinity.

    With this fix, when activeMDS = 1, required anti-affinity is applied, so the two MDS pods of the active daemon are never scheduled in the same failure domain. When activeMDS > 1, preferred anti-affinity remains, so an active and standby MDS pair can be scheduled on the same nodes.

    (DFBUGS-1509)

7.6. OpenShift Data Foundation console

  • Tooltip rendered behind other components

    Previously, when hovering over graphs or charts on the dashboards, tooltips were hidden behind them and the values were not visible. This was due to a PatternFly v5 library issue.

    With this fix, PatternFly is updated to a newer minor version, and as a result, tooltips are clearly visible.

    (DFBUGS-156)

  • BackingStore details shows incorrect provider

    Previously, the BackingStore details page showed incorrect provider due to the incorrect mapping of the provider name.

    With this fix, the UI logic was updated to display the provider name correctly.

    (DFBUGS-353)

  • Error message that Popup fail to alert on rule

    Previously, OBCs could be created with the same name in different namespaces without being notified, which led to potential conflicts or unintended behavior. This was because the user interface did not track object bucket claims (OBCs) across namespaces. This allowed duplicate OBC names without a proper warning.

    With this fix, the validation logic is updated to properly check and notify when you attempt to create an OBC with a duplicate name. A clear warning is displayed if an OBC with the same name exists, preventing confusion and ensuring correct behavior.

    (DFBUGS-410)

  • A 404: Not Found message is briefly displayed for a few seconds when clicking on the ‘Enable Encryption’ checkbox during StorageClass creation

    Previously, a "404: Not Found" message was briefly displayed for a few seconds while enabling encryption by using the ‘Enable Encryption’ checkbox during new StorageClass creation.

    With this fix, the conditions that caused the issue were fixed. As a result, the "404: Not Found" message no longer appears, and the configuration form is displayed directly after a brief loading state.

    (DFBUGS-489)

  • Existing warning alert "Inconsistent data on target cluster" does not go away

    Previously, when an incorrect target cluster was selected for failover or relocate operations, the existing warning alert "Inconsistent data on target cluster" did not disappear.

    With this fix, the warning alert is refreshed correctly when the target cluster is changed for subscription apps. As a result, the alert no longer persists unnecessarily when failover or relocation is triggered for discovered applications.

    (DFBUGS-866)

7.7. Rook

  • rook-ceph-osd-prepare-ocs-deviceset pods produce duplicate metrics

    Previously, alerts were raised from kube-state-metrics because of the duplicate tolerations in the OSD prepare pods.

    With this fix, the completed OSD prepare pods that had duplicate tolerations are removed. As a result, duplicate alerts with upgrades are no longer raised.

    (DFBUGS-839)

7.8. Ceph monitoring

  • Prometheus rule evaluation errors

    Previously, many PrometheusRuleFailures errors were logged and the affected alerts were not triggered, because many alert and rule queries that included the metric ceph_disk_occupation had a wrong or invalid label.

    With this fix, the erroneous label was corrected and the queries of the affected alerts were updated. As a result, Prometheus rule evaluation succeeds and all alerts are deployed successfully.

    (DFBUGS-789)

  • Alert "CephMdsCPUUsageHighNeedsVerticalScaling" not triggered when MDS usage is high

    Previously, ocs-operator was unable to read or deploy the malformed rule file and the alerts associated with this file were not visible. This was due to the wrong indentation of the PrometheusRule file, prometheus-ocs-rule.yaml.

    With this fix, the indentation is corrected and as a result, the PrometheusRule file is deployed successfully.

    (DFBUGS-951)

Chapter 8. Known issues

This section describes the known issues in Red Hat OpenShift Data Foundation 4.18.

8.1. Disaster recovery

  • Regional-DR upgrade with multipath devices or partitioned disks from v4.17 to v4.18 fails

    Regional-DR environments with multipath devices or partitioned disks should not upgrade from v4.17 to v4.18 due to known issues with Ceph. The issue will be fixed in 4.18 z-streams or a future release.

    (DFBUGS-1801)

  • Disaster Recovery is misconfigured after upgrade from v4.17.z to v4.18

    When ODF Multicluster Orchestrator and Openshift DR Hub Operator are upgraded from 4.17.z to 4.18, some of the Disaster Recovery resources are misconfigured in internal mode deployments. This impacts Disaster Recovery of workloads using ocs-storagecluster-ceph-rbd and ocs-storagecluster-ceph-rbd-virtualization StorageClasses.

    To work around this issue, follow the instructions in this knowledgebase article.

    (DFBUGS-1804)

  • ceph df reports an invalid MAX AVAIL value when the cluster is in stretch mode

    When a crush rule for a Red Hat Ceph Storage cluster has multiple "take" steps, the ceph df report shows the wrong maximum available size for the map. The issue will be fixed in an upcoming release.

    (DFBUGS-1748)

  • Both the DRPCs protect all the persistent volume claims created in the same namespace

    In namespaces that host multiple disaster recovery (DR) protected workloads, each DRPlacementControl resource on the hub cluster that does not specify and isolate PVCs by workload using its spec.pvcSelector field protects all the persistent volume claims (PVCs) within the namespace.

    This results in PVCs that match the spec.pvcSelector of multiple DRPlacementControl resources or, if the selector is missing across all workloads, in replication management potentially managing each PVC multiple times, causing data corruption or invalid operations based on individual DRPlacementControl actions.

    Workaround: Label PVCs that belong to a workload uniquely, and use the selected label as the DRPlacementControl spec.pvcSelector to disambiguate which DRPlacementControl protects and manages which subset of PVCs within a namespace. It is not possible to specify the spec.pvcSelector field for the DRPlacementControl using the user interface, hence the DRPlacementControl for such applications must be deleted and created using the command line.

    Result: PVCs are no longer managed by multiple DRPlacementControl resources and do not cause any operation and data inconsistencies.

    (DFBUGS-1749)
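    The labeling workaround above can be sketched as follows. The resource names and label are hypothetical examples, and required fields not relevant to PVC selection are omitted; as noted, such a DRPlacementControl must be created from the command line.

```yaml
# Sketch: isolating one workload's PVCs with spec.pvcSelector.
# Names and the label are hypothetical examples.
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  name: app-one-drpc
  namespace: app-one
spec:
  drPolicyRef:
    name: my-dr-policy
  pvcSelector:
    matchLabels:
      app: app-one   # only PVCs labeled app=app-one are protected
  # other required fields (for example, placementRef) omitted for brevity
```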

  • MongoDB pod is in CrashLoopBackoff because of permission errors reading data in cephrbd volume

    The OpenShift projects across different managed clusters have different security context constraints (SCC), which specifically differ in the specified UID range and/or FSGroups. This leads to certain workload pods and containers failing to start post failover or relocate operations within these projects, due to filesystem access errors in their logs.

    Workaround: Ensure workload projects are created on all managed clusters with the same project-level SCC labels, allowing them to use the same filesystem context when failed over or relocated. Pods will no longer fail post-DR actions on filesystem-related access errors.

    (DFBUGS-1750)

  • Disaster recovery workloads remain stuck when deleted

    When deleting a workload from a cluster, the corresponding pods might not terminate with events such as FailedKillPod. This might cause delay or failure in garbage collecting dependent DR resources such as the PVC, VolumeReplication, and VolumeReplicationGroup. It would also prevent a future deployment of the same workload to the cluster as the stale resources are not yet garbage collected.

    Workaround: Reboot the worker node on which the pod is currently running and stuck in a terminating state. This results in successful pod termination and subsequently related DR API resources are also garbage collected.

    (DFBUGS-325)

  • Regional DR CephFS based application failover show warning about subscription

    After the application is failed over or relocated, the hub subscriptions show errors stating, "Some resources failed to deploy. Use View status YAML link to view the details." This is because the application persistent volume claims (PVCs) that use CephFS as the backing storage provisioner, are deployed using Red Hat Advanced Cluster Management for Kubernetes (RHACM) subscriptions, and are DR protected, are owned by the respective DR controllers.

    Workaround: There are no workarounds to rectify the errors in the subscription status. However, the subscription resources that failed to deploy can be checked to make sure they are PVCs. This ensures that the other resources do not have problems. If the only resources in the subscription that fail to deploy are the ones that are DR protected, the error can be ignored.

    (DFBUGS-253)

  • Disabled PeerReady flag prevents changing the action to Failover

    The DR controller executes full reconciliation as and when needed. When a cluster becomes inaccessible, the DR controller performs a sanity check. If the workload is already relocated, this sanity check causes the PeerReady flag associated with the workload to be disabled, and the sanity check does not complete due to the cluster being offline. As a result, the disabled PeerReady flag prevents you from changing the action to Failover.

    Workaround: Use the command-line interface to change the DR action to Failover despite the disabled PeerReady flag.

    (DFBUGS-665)

  • Ceph becomes inaccessible and IO is paused when connection is lost between the two data centers in stretch cluster

    When two data centers lose connection with each other but are still connected to the Arbiter node, there is a flaw in the election logic that causes an infinite election between the monitors. As a result, the monitors are unable to elect a leader and the Ceph cluster becomes unavailable. Also, IO is paused during the connection loss.

    Workaround: Shut down the monitors of any one data zone by bringing down the zone nodes. Additionally, you can reset the connection scores of the surviving mon pods.

    As a result, the monitors can form a quorum, Ceph becomes available again, and IOs resume.

    (DFBUGS-425)

  • RBD applications fail to Relocate when using stale Ceph pool IDs from replacement cluster

    For applications created before the new peer cluster is created, it is not possible to mount the RBD PVC, because when a peer cluster is replaced, the CephBlockPoolID mapping in the CSI configmap cannot be updated.

    Workaround: Update the rook-ceph-csi-mapping-config configmap with cephBlockPoolID’s mapping on the peer cluster that is not replaced. This enables mounting the RBD PVC for the application.

    (DFBUGS-527)

  • Information about lastGroupSyncTime is lost after hub recovery for the workloads which are primary on the unavailable managed cluster

    Applications that were previously failed over to a managed cluster do not report a lastGroupSyncTime, which causes the VolumeSynchronizationDelay alert to be triggered. This happens because, when the ACM hub and a managed cluster that are part of the DRPolicy become unavailable, a new ACM hub cluster is reconstructed from the backup.

    Workaround: If the managed cluster to which the workload was failed over is unavailable, you can still failover to a surviving managed cluster.

    (DFBUGS-376)

  • MCO operator reconciles the veleroNamespaceSecretKeyRef and CACertificates fields

    When the OpenShift Data Foundation operator is upgraded, the CACertificates and veleroNamespaceSecretKeyRef fields under s3StoreProfiles in the Ramen config are lost.

    Workaround: If the Ramen config has the custom values for the CACertificates and veleroNamespaceSecretKeyRef fields, then set those custom values after the upgrade is performed.

    (DFBUGS-440)

  • virtualmachines.kubevirt.io resource fails restore due to mac allocation failure on relocate

    When a virtual machine is relocated to the preferred cluster, it might fail to complete relocation due to unavailability of the mac address. This happens if the virtual machine is not fully cleaned up on the preferred cluster when it is failed over to the failover cluster.

    Workaround: Ensure that the workload is completely removed from the preferred cluster before relocating it.

    (This content is not included.BZ#2295404)

  • Failover process fails when the ReplicationDestination resource has not been created yet

    If the user initiates a failover before the LastGroupSyncTime is updated, the failover process might fail. This failure is accompanied by an error message indicating that the ReplicationDestination does not exist.

    Workaround:

    Edit the ManifestWork for the VRG on the hub cluster.

    Delete the following section from the manifest:

    /spec/workload/manifests/0/spec/volsync

    Save the changes.

    Applying this workaround correctly ensures that the VRG skips attempting to restore the PVC using the ReplicationDestination resource. If the PVC already exists, the application uses it as is. If the PVC does not exist, a new PVC is created.
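
    The edit described above can also be applied with a JSON patch. This is a sketch only: the ManifestWork name and namespace are placeholders that depend on your workload and managed cluster.

    ```shell
    # Sketch only: <vrg-manifestwork-name> and <managed-cluster-namespace>
    # are placeholders; list the ManifestWork resources on the hub to find them.
    oc get manifestwork --all-namespaces | grep -i vrg

    # Remove the volsync section, equivalent to deleting
    # /spec/workload/manifests/0/spec/volsync in an editor:
    oc patch manifestwork <vrg-manifestwork-name> -n <managed-cluster-namespace> \
      --type=json \
      -p='[{"op": "remove", "path": "/spec/workload/manifests/0/spec/volsync"}]'
    ```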

    (This content is not included.DFBUGS-632)

  • Ceph in warning state after adding capacity to cluster

    After a device replacement or add capacity procedure, Ceph might be in HEALTH_WARN state with the mon reporting slow ops. However, there is no impact on the usability of the cluster.

    (This content is not included.DFBUGS-1273)

  • OSD pods restart during add capacity

    OSD pods restart after cluster expansion by adding capacity to the cluster. However, no impact to the cluster is observed apart from the pod restarts.

    (This content is not included.DFBUGS-1426)

8.2. Multicloud Object Gateway

  • NooBaa Core cannot assume role with web identity due to a missing entry in the role’s trust policy

    For OpenShift Data Foundation deployments on AWS using AWS Security Token Service (STS), you need to add another entry in the trust policy for the noobaa-core account. This is because with the release of OpenShift Data Foundation 4.17, the service account changed from noobaa to noobaa-core.

    For instructions on adding an entry in the trust policy for the noobaa-core account, see the final bullet in the prerequisites section of Updating Red Hat OpenShift Data Foundation 4.16 to 4.17.

    (This content is not included.DFBUGS-172)

8.3. Ceph

  • Poor performance of the stretch clusters on CephFS

    Workloads with many small metadata operations might exhibit poor performance because of the arbitrary placement of metadata server (MDS) on multi-site Data Foundation clusters.

    (This content is not included.DFBUGS-1753)

  • SELinux relabelling issue with a very high number of files

    When attaching volumes to pods in Red Hat OpenShift Container Platform, the pods sometimes do not start or take an excessive amount of time to start. This behavior is generic, and it is tied to how SELinux relabelling is handled by the Kubelet. The issue is observed with any filesystem-based volumes that have very high file counts. In OpenShift Data Foundation, it is seen when using CephFS-based volumes with a very high number of files. There are different ways to work around this issue. Depending on your business needs, you can choose one of the workarounds from the knowledgebase solution https://access.redhat.com/solutions/6221251.

    (This content is not included.Jira#3327)

8.4. CSI Driver

  • Automatic flattening of snapshots is not working

    When there is a single common parent RBD PVC, if volume snapshot, restore, and delete snapshot are performed in a sequence more than 450 times, it is no longer possible to take a volume snapshot or clone of the common parent RBD PVC.

    To work around this issue, use PVC-to-PVC clone instead of performing volume snapshot, restore, and delete snapshot in a sequence, which avoids the issue entirely.

    If you hit this issue, contact customer support to perform manual flattening of the final restored PVCs so that you can take volume snapshots or clones of the common parent PVC again.

    (This content is not included.DFBUGS-1752)

8.5. OpenShift Data Foundation console

  • Optimize DRPC creation when multiple workloads are deployed in a single namespace

    When multiple applications refer to the same placement, then enabling DR for any of the applications enables it for all the applications that refer to the placement.

    If the applications are created after the creation of the DRPC, the PVC label selector in the DRPC might not match the labels of the newer applications.

    Workaround: In such cases, disable DR and enable it again with the correct label selector.

    (This content is not included.DFBUGS-120)

8.6. OCS operator

  • Increasing MDS memory is erasing CPU values when pods are in CLBO state

    When the metadata server (MDS) memory is increased while the MDS pods are in a crash loop back off (CLBO) state, the CPU request or limit for the MDS pods is removed. As a result, the CPU request or limit values that were set for the MDS are lost.

    Workaround: Run the oc patch command to adjust the CPU limits.

    For example:

    $ oc patch -n openshift-storage storagecluster ocs-storagecluster \
        --type merge \
        --patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "3"},
        "requests": {"cpu": "3"}}}}}'

    (This content is not included.DFBUGS-426)

  • Error while reconciling: Service "ocs-provider-server" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated

    From OpenShift Data Foundation 4.18, the ocs-operator deploys a service with nodePort 31659, which might conflict with an existing service's nodePort. Because no other service can use a port that is already allocated, the ocs-operator errors out while deploying the service if the port is in use. This causes the upgrade reconciliation to be stuck.

    Workaround: Change the service type from NodePort to ClusterIP to avoid the collision:

    $ oc patch -n openshift-storage storagecluster ocs-storagecluster \
        --type merge \
        --patch '{"spec": {"providerAPIServerServiceType": "ClusterIP"}}'

    (This content is not included.DFBUGS-1831)

  • prometheus-operator pod is missing toleration in Red Hat OpenShift Service on AWS (ROSA) with hosted control planes (HCP) deployments

    Due to a known issue during Red Hat OpenShift Data Foundation on ROSA HCP deployment, toleration needs to be manually applied for prometheus-operator after pod creation. To apply the toleration, run the following patch command:

    $ oc patch csv odf-prometheus-operator.v4.18.0-rhodf -n odf-storage --type=json \
        -p='[{"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/tolerations", "value": [{"key": "node.ocs.openshift.io/storage", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]}]'

    (This content is not included.DFBUGS-1272)

8.7. ODF-CLI

  • ODF-CLI tools misidentify stale volumes

    The stale subvolume CLI tool misidentifies valid CephFS persistent volume claims (PVCs) as stale due to an issue in the stale subvolume identification tool. As a result, the stale subvolume identification functionality is not available until the issue is fixed.

    (This content is not included.DFBUGS-3778)

Chapter 9. Deprecated features

This section describes the deprecated features introduced in Red Hat OpenShift Data Foundation 4.18.

9.1. Holder pods in OpenShift Data Foundation Multus

Due to the recurring maintenance impact of holder pods during upgrade (holder pods are present when Multus is enabled), holder pods are deprecated. Because of this, holder pods must be removed before upgrading the cluster to 4.18, or PVCs risk not working correctly. Complete the procedure documented in the article Disabling Multus holder pods to disable and remove holder pods. Be aware that this procedure is time consuming, and it is critical to complete it before OpenShift Data Foundation is upgraded to 4.18.