4.21 Release Notes
Release notes for features and enhancements, known issues, and other important information.
Abstract
Chapter 1. Overview
Red Hat OpenShift Data Foundation is software-defined storage that is optimized for container environments. It runs as an operator on OpenShift Container Platform to provide highly integrated and simplified persistent storage management for containers.
Red Hat OpenShift Data Foundation is integrated into the latest Red Hat OpenShift Container Platform to address platform services, application portability, and persistence challenges. It provides a highly scalable backend for the next generation of cloud-native applications, built on a technology stack that includes Red Hat Ceph Storage, the Rook.io Operator, and NooBaa’s Multicloud Object Gateway technology.
Red Hat OpenShift Data Foundation is designed for FIPS. When running on RHEL or RHEL CoreOS booted in FIPS mode, OpenShift Container Platform core components use the RHEL cryptographic libraries submitted to NIST for FIPS validation on only the x86_64, ppc64le, and s390x architectures. For more information about the NIST validation program, see Cryptographic Module Validation Program. For the latest NIST status of the individual versions of the RHEL cryptographic libraries submitted for validation, see Compliance Activities and Government Standards.
Red Hat OpenShift Data Foundation provides a trusted, enterprise-grade application development environment that simplifies and enhances the user experience across the application lifecycle in a number of ways:
- Provides block storage for databases.
- Provides shared file storage for continuous integration, messaging, and data aggregation.
- Provides object storage for cloud-first development, archival, backup, and media storage.
- Scales applications and data exponentially.
- Attaches and detaches persistent data volumes at an accelerated rate.
- Stretches clusters across multiple data centers or availability zones.
- Establishes a comprehensive application container registry.
- Supports the next generation of OpenShift workloads such as Data Analytics, Artificial Intelligence, Machine Learning, Deep Learning, and Internet of Things (IoT).
- Dynamically provisions not only application containers, but also data service volumes and containers, as well as additional OpenShift Container Platform nodes, Elastic Block Store (EBS) volumes, and other infrastructure services.
1.1. About this release
Red Hat OpenShift Data Foundation 4.21 is now available. New enhancements, features, and known issues that pertain to OpenShift Data Foundation 4.21 are included in this topic.
Red Hat OpenShift Data Foundation 4.21 is supported on the Red Hat OpenShift Container Platform version 4.21. For more information, see the Red Hat OpenShift Data Foundation Supportability and Interoperability Checker.
For Red Hat OpenShift Data Foundation life cycle information, refer to Red Hat OpenShift Data Foundation Life Cycle.
Chapter 2. New features
This section describes new features introduced in Red Hat OpenShift Data Foundation 4.21.
2.1. Disaster Recovery
2.1.1. Multi-volume consistency for Disaster Recovery
Multi-volume consistency group support is now fully supported again for both RBD and CephFS volumes. Consistency groups are used by default for all newly protected applications. Already protected non-CG applications on an upgraded cluster should be unprotected and protected again to migrate to consistency groups.
2.1.2. Support for CephFS Volumes Using Non‑Default StorageClasses
OpenShift Data Foundation DR solution supports data replication for CephFS volumes that use multiple or non‑default StorageClasses, such as replica‑2 file storage. This enhancement enables disaster recovery for workloads backed by customized CephFS configurations, including cases where different pools or secrets require separate StorageClass, VolumeSnapshotClass, and VolumeReplicationClass resources.
2.1.3. Enhanced Disaster Recovery Capabilities in the RHACM KubeVirt UI
Disaster Recovery Integration for Virtual Machines
The Red Hat Advanced Cluster Management for Kubernetes (RHACM) KubeVirt UI now integrates OpenShift Data Foundation disaster recovery, allowing users to view VM protection status and initiate DR protection directly from the VM view. This capability is available for both GitOps‑managed and discovered VMs, with disabled actions providing guidance when prerequisites are not met.
Improved Visibility for Failover and Relocate Operations
The RHACM UI now surfaces detailed, step‑by‑step progression for Disaster Recovery failover and relocate workflows using progression data from Ramen. Enhanced modals show each operation phase, status indicators, and any associated errors, improving transparency, troubleshooting speed, and overall observability of DR processes.
2.2. Networking
2.2.1. IPv6 Support for Multus Networks
Multus supports IPv6, allowing customers to configure their Multus networks as IPv4‑only or IPv6‑only. This enhancement improves network flexibility and supports modern dual‑stack deployment needs. When multiple Multus networks are configured, they must all use the same IP family.
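For example, a minimal sketch of a NetworkAttachmentDefinition for an IPv6-only public storage network, using the macvlan CNI plug-in with whereabouts IPAM; the interface name, namespace, and address range shown here are illustrative:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ocs-public              # illustrative name
  namespace: openshift-storage
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "fd00:10:1::/64"
      }
    }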
2.2.2. IPv6 Support for External Mode
OpenShift Data Foundation clusters deployed in external mode now support single‑stack IPv6, in addition to IPv4, enabling deployments in environments where IPv6 is required with limited IPv4 availability. This enhancement aligns OpenShift Data Foundation external mode with OpenShift’s IPv6 capabilities and expands deployment flexibility for customers operating in IPv6‑centric infrastructures.
2.3. ARM architecture support in OpenShift Data Foundation
OpenShift Data Foundation components can now run on ARM-based clusters.
2.4. Configurable MON timeout for improved node live migration stability
OpenShift Data Foundation now supports configuring the Ceph MON timeout through a YAML update, enabling clusters to accommodate longer node live‑migration times in environments such as bare metal with OpenShift Virtualization. This enhancement helps prevent unnecessary MON failovers during upgrades or maintenance operations where live migrations may exceed the previous fixed 10‑minute threshold, improving cluster stability and operational reliability.
For more information, see Configuring Ceph monitor failover timeout.
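As an illustration of the underlying setting, the monitor failover timeout is exposed by Rook under healthCheck.daemonHealth.mon in the CephCluster resource; in OpenShift Data Foundation the value is managed as described in the linked procedure, so the snippet below is a sketch of the Rook-level configuration rather than the exact OpenShift Data Foundation steps:

spec:
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        timeout: 15m   # extend beyond the default 10 minutes to tolerate longer live migrations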
2.5. Health Overview and scoring
A new Health Overview provides a single, summarized health score based on infrastructure checks that impact OpenShift Data Foundation performance and stability. The score starts at 100% and is reduced based on detected issues, with severity‑based deductions for minor, medium, and critical conditions. Automated checks include node‑to‑node latency, MTU validation, disk and NIC utilization, and pod restarts observed within the last 24 hours. The UI displays the overall score, lists failed checks, raises alerts when thresholds are exceeded, and allows users to acknowledge issues to exclude them from the score calculation.
For more information, see Viewing OpenShift Data Foundation infrastructure health.
2.6. Automated detection of stale CephFS subvolumes
A new automated scan helps identify stale CephFS subvolumes that may cause OpenShift Data Foundation clusters to appear full even after volumes are deleted. When stale subvolumes are detected, the system now generates alerts and provides runbook guidance, reducing the need for manual troubleshooting and lowering overall maintenance effort for customers.
Chapter 3. Enhancements
This section describes the major enhancements introduced in Red Hat OpenShift Data Foundation 4.21.
3.1. Disaster Recovery
3.1.1. Enhanced protected applications list view for disaster recovery
The Protected Applications page shows managed (ApplicationSet and Subscription) and discovered applications together, providing a complete view of all workloads protected by Disaster Recovery. Multi‑selection is enabled in the list view, laying the groundwork for grouped failover, relocate, and disable‑DR operations for managed applications.
3.1.2. Automatic resource cleanup for discovered VMs during disaster recovery operations
Disaster recovery workflows now automatically remove resources associated with DR‑protected virtual machines during failover or relocate operations. This automation applies only to the discovered virtual machines enrolled for protection through Red Hat Advanced Cluster Management Fleet Virtualization and helps reduce manual steps while improving transition efficiency.
3.2. Multus support for existing OpenShift Data Foundation clusters
OpenShift Data Foundation now supports configuring Multus on existing clusters, enabling customers to introduce public and private storage networks after initial deployment. This enhancement allows administrators to add or expand Network Attachment Definitions (NADs), such as separating client‑facing and replication traffic even when a cluster is already using a single Multus network. This provides greater flexibility for network isolation, performance tuning, and aligning storage traffic with organizational requirements.
For more information, see Enabling Multus support on an existing OpenShift Data Foundation cluster.
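For illustration, a sketch of the StorageCluster network section after Multus is enabled, assuming NetworkAttachmentDefinitions named ocs-public and ocs-cluster already exist in the openshift-storage namespace (the names are illustrative):

spec:
  network:
    provider: multus
    selectors:
      public: openshift-storage/ocs-public     # client-facing storage traffic
      cluster: openshift-storage/ocs-cluster   # OSD replication traffic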
3.3. Support for multiple device classes, Ceph pools and StorageClasses for identical disk types
Multiple device classes, Ceph pools, and StorageClasses can now be configured on the same disk type and shared nodes, enabling improved workload isolation, reduced noisy‑neighbor impact, and better support for multi‑tenant cost management. The enhancement also includes corrected OSD device‑class labels and UI updates to simplify configuration.
For more information, see Attaching storage for a new device set by using disks of same type on the same nodes.
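For illustration, a sketch of an additional device set that carries its own device class on the same disk type; the device set name, device class, backing StorageClass, and size are illustrative, and the exact fields are described in the linked procedure:

spec:
  storageDeviceSets:
  - name: ocs-deviceset-gold      # illustrative additional device set
    count: 1
    replica: 3
    deviceClass: gold             # separate device class for workload isolation
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: gp3-csi # illustrative backing StorageClass
        volumeMode: Block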
3.4. Multicloud Object Gateway
3.4.1. Minimal IAM Support for Multicloud Object Gateway
Minimal IAM support is now available in the Multicloud Object Gateway to provide improved access control for multi‑tenant environments. This enhancement enables administrators to define and manage user‑level permissions more effectively, supporting scenarios where data scientists and other user groups require isolated and governed access to object storage resources. The feature lays the groundwork for more secure, flexible multi‑tenant deployments by ensuring that users interact only with the buckets and operations they are authorized to access.
For more information, see Creating and managing IAM user.
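For illustration, a sketch of creating an IAM user and access key with the standard AWS CLI pointed at the Multicloud Object Gateway IAM endpoint; the user name and endpoint placeholder are illustrative, and administrator credentials are assumed to be exported in the environment:

$ aws iam create-user --user-name data-scientist-1 \
    --endpoint-url https://<mcg-iam-endpoint>
$ aws iam create-access-key --user-name data-scientist-1 \
    --endpoint-url https://<mcg-iam-endpoint>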
3.4.2. Enhanced Multicloud Object Gateway database backup
Multicloud Object Gateway (MCG) database backup is enhanced to provide more reliable protection against data loss and to complement the high‑availability capabilities. The update adds support for automated scheduled backups with configurable frequency (daily, weekly, or monthly) and retention settings along with on‑demand backup options through the MCG CLI or CRD. The UI now displays the time of the most recent backup and offers advanced configuration for Multicloud Object Gateway backup management. The system supports internal, external, and standalone MCG deployments by automatically detecting or allowing selection of snapshot StorageClasses.
To configure MCG database backup, refer to the deployment procedures for your specific platform.
3.4.3. Developer access to the OpenShift Data Foundation Object Browser
Developer‑level users can now be granted access to the OpenShift Data Foundation Object Browser, enabling them to view and manage only their own object storage resources with restricted permissions. This enhancement supports flexible, role‑appropriate access control and allows development teams to work more independently while preserving administrative boundaries.
For more information, see Accessing the object browser.
3.4.4. Improved Multicloud Object Gateway performance using the secondary database instance
Multicloud Object Gateway (MCG) performance has been improved by directing read‑only database operations to the existing secondary database instance, reducing load on the primary and improving overall responsiveness. By separating read and write activity, the system minimizes the impact of heavy read queries, such as lifecycle and background worker operations on write performance and object I/O. This enhancement leverages the high‑availability secondary instance already deployed in recent versions to deliver more consistent and efficient database behavior.
For more information, see the DB pods section within Responsibilities and resources.
3.4.5. Support for internal mode RGW with Object Browser
The Object Browser now supports browsing, uploading, downloading, and managing RGW objects. Operations include listing and creating buckets, navigating objects in a folder‑like view, uploading files or directories, downloading objects, generating presigned URLs, viewing metadata, filtering and sorting results, previewing content, and creating prefixes. Bucket‑level capabilities such as configuring expiration and access policies are also available.
For more information, see Creating and managing buckets using the object browser.
3.5. Performance Plus enabled for Azure deployments
New Azure-based deployments now automatically enable Performance Plus for eligible Standard SSD and Premium SSD disks (≥513 GiB) when using the Azure Disk CSI driver. This provides higher IOPS and throughput for OpenShift Data Foundation clusters on Azure and helps achieve the best possible storage performance.
For more information about how to use this feature, see the Prerequisites in the section Creating OpenShift Data Foundation cluster.
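For illustration, a sketch of an Azure Disk CSI StorageClass with Performance Plus enabled; the class name is illustrative, and the setting only takes effect on eligible disks of 513 GiB or larger:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-perfplus      # illustrative name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  enablePerformancePlus: "true"   # requests higher IOPS and throughput on eligible disks
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true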
3.6. Monitoring enhancements
Additional NooBaa bucket and replication metrics
Introduces bucket_last_cycle_total_objects_num, bucket_last_cycle_replicated_objects_num, and bucket_last_cycle_error_objects_num, along with expanded bucket‑usage gauges for object and size quotas.
Clearer NooBaa capacity alerting
Adds concise runbooks for the NooBaaSystemCapacityWarning85, 95, and 100 alerts, outlining how capacity is calculated and the required mitigation steps.
For more information, see NooBaaSystemCapacityWarning85, NooBaaSystemCapacityWarning95, and NooBaaSystemCapacityWarning100.
New MDS xattr‑latency alert
Adds the CephXattrSetLatency alert for elevated setxattr latency, with guidance for evaluating and adjusting MDS resource allocation.
For more information, see CephXattrSetLatency.
3.7. Enhanced vCPU requirements for IBM Z and LinuxONE
OpenShift Data Foundation now automatically detects IBM Z and LinuxONE architectures and applies optimized vCPU requirements specific to these platforms. Deployments on IBM Z and LinuxONE no longer require the same number of vCPUs as other architectures, reducing resource overhead and improving deployment efficiency.
3.8. Taint Handling in OpenShift Data Foundation
Customers using taints and tolerations need to update their configuration using the guidance in the knowledgebase article How to add toleration for the "non-ocs" taints to the OpenShift Data Foundation pods.
Chapter 4. Technology previews
This section describes the technology preview features introduced in Red Hat OpenShift Data Foundation 4.21 under Technology Preview support limitations.
Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
Technology Preview features are provided with a limited support scope, as detailed on the Customer Portal: Technology Preview Features Support Scope.
4.1. Integration of OpenShift Data Foundation knowledge into OpenShift Lightspeed via BYOK
OpenShift Data Foundation knowledge is incorporated directly into OpenShift Lightspeed by providing a container image that embeds Data Foundation documentation as retrieval‑augmented generation (RAG) content through the Bring Your Own Key (BYOK) workflow. This integration improves the accuracy and relevance of responses to Data Foundation–related questions by using the correct in‑cluster documentation version, reducing reliance on external search engines and minimizing the need to switch away from the console when seeking operational guidance.
However, this feature is available only in .z releases where the corresponding CSV includes the updated image. Documentation for this feature will consistently lag by one .z release.
For information about how to configure OpenShift Lightspeed, see the Red Hat OpenShift Lightspeed documentation.
Chapter 5. Developer previews
This section describes the developer preview features introduced in Red Hat OpenShift Data Foundation 4.21.
Developer preview features are subject to Developer Preview support limitations. Developer preview releases are not intended to be run in production environments. Clusters deployed with developer preview features are considered development clusters and are not supported through the Red Hat Customer Portal case management system. If you need assistance with developer preview features, reach out to the ocs-devpreview@redhat.com mailing list and a member of the Red Hat Development Team will assist you as quickly as possible based on availability and work schedules.
5.1. Two Nodes Fencing (TNF) Support
Adds initial support for Two Nodes Fencing (TNF), enabling deployment of highly available two‑node OpenShift clusters. This enhancement removes a key adoption barrier by providing a storage path that aligns with OpenShift’s upcoming two‑node capabilities, helping customers benefit from an integrated Red Hat compute‑and‑storage solution.
For more information, see Deploying OpenShift Data Foundation on a Two‑Node OpenShift Cluster with Fencing and DRBD.
5.2. Automatic network fencing for non‑graceful shutdowns with Ceph CSI
This update introduces automatic storage‑level fencing for nodes marked with the Kubernetes out‑of‑service taint during non‑graceful shutdowns. When such a taint is applied, OpenShift Data Foundation ensures volumes attached through Ceph‑CSI are safely fenced from the unreachable node, allowing pods to be rescheduled on healthy nodes without risk of data corruption. This improves high availability and significantly reduces recovery time during unexpected node or zone outages.
For more information, see Configuring Non-Graceful Node Shutdown Handling in OpenShift Data Foundation.
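For reference, the Kubernetes out-of-service taint that triggers this fencing is applied to a node that is confirmed to be shut down, for example:

$ oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute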
Chapter 6. Bug fixes
This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.21.
6.1. Multicloud Object Gateway
Multicloud Object Gateway core pod restarting due to non‑deterministic StorageClass selection
Previously, the Multicloud Object Gateway reconciler selected a StorageClass by listing all StorageClasses and choosing the first match for the Ceph RBD provisioner. Because the Kubernetes List API does not guarantee ordering, this resulted in non‑deterministic and sometimes incorrect StorageClass selection, causing the noobaa-core-0 pod to restart after a fresh deployment.
With this fix, the controller now uses a static, deterministic StorageClass name instead of relying on the unordered list.
As a result, StorageClass selection is consistent, preventing configuration issues and pod restart loops.
Incorrect free‑space calculation for PV Pool
Previously, the PV Pool free‑space calculation was performed per pod instead of averaging usage across all nodes. As a result, the Multicloud Object Gateway could report high usage even when only a single pod exceeded 80%.
With this fix, free space is now calculated as the average used space across all pods in the PV Pool.
Multicloud Object Gateway database (NooBaa DB) high availability failure during upgrade
Previously, the MCG database (NooBaa DB) did not have a default value defined for the PostgreSQL shared_buffers parameter. As a result, if no value was provided, the database was configured with shared_buffers=0, causing high availability (HA) to fail during upgrade.
With this fix, a default value of shared_buffers=1G is applied to ensure proper database behavior and prevent HA failures.
Database corruption caused by concurrent PostgreSQL instances during DB pod replacement
Previously, when the database pod was force‑deleted, a new pod started before the old pod’s container had fully stopped. This race condition resulted in two PostgreSQL instances running concurrently on the same data directory and caused database corruption. The HA controller (HAC) in the Multicloud Object Gateway operator was identified as the component triggering the force deletion. In addition, the database PVC used the ReadWriteOnce access mode, which did not prevent concurrent mounts by two containers on the same node.
This fix disables the HAC by default. For new deployments, the database PVC now uses the ReadWriteOncePod access mode to prevent concurrent mounting.
As a result, the DB pod is no longer force‑deleted by internal components, and new deployments benefit from stronger volume protection to prevent corruption even if the pod is force‑deleted manually.
PVCs in PV Pool backingstores inherited irrelevant CPU and memory limits
Previously, when provisioning a PVC for a PV Pool backingstore, the provisioning logic copied all fields into the PVC template. This caused the PVC to incorrectly inherit CPU and memory limits.
With this fix, only the relevant fields are copied into the PVC template, preventing unintended resource settings.
ThanosRuleHighRuleEvaluationWarnings firing due to incorrect Multicloud Object Gateway metric naming
Previously, some Multicloud Object Gateway metrics did not end with the required _total, _sum, _count, or _bucket suffix. As a result, the ThanosRuleHighRuleEvaluationWarnings info alert continued to fire in the Red Hat OpenShift Container Platform web console.
With this fix, the affected metrics now use the appropriate suffixes, preventing this alert from firing for this issue.
6.2. Disaster recovery
Stuck PVs after final sync due to Retain reclaim policy
Previously, after the final sync, temporary PVs/PVCs were deleted, but some PVs remained because their persistentVolumeReclaimPolicy was set to Retain, preventing cleanup.
With this fix, Ramen now properly resolves conflicts during resource updates, ensuring that cleanup is not skipped.
As a result, no PVs remain stuck after failover or final sync.
Certificate rotation race condition caused empty certificate files
Previously, Kubernetes nodes could fail to start after a reboot because certificate files were empty, even though the files and symlinks existed. This occurred due to a race condition during kubelet certificate rotation.
The issue happened because when new certificates were written to disk, the data was not explicitly forced to persist to the physical disk. If a reboot occurred before the OS flushed the buffered data, the certificate files ended up empty.
With this fix, certificate data is now immediately and explicitly written to disk after being generated.
As a result, certificate files remain valid and non‑empty even if certificate rotation occurs during node reboot.
Incorrect MAX AVAIL value shown in ceph df for stretch‑mode clusters
Previously, Red Hat Ceph Storage clusters operating in stretch mode displayed an incorrect MAX AVAIL value in the output of the ceph df command. This resulted in inaccurate reporting of available storage capacity.
With this update, OpenShift Data Foundation now correctly computes and reports the MAX AVAIL metric, ensuring accurate capacity visualization across Ceph pools in stretch‑mode deployments. This fix improves cluster observability and prevents misinterpretation of storage utilization.
6.3. Ceph
OSD crashes caused by BlueStore Elastic Shared Blob extent‑resharding logic
Previously, a bug in BlueStore’s Elastic Shared Blob extent‑resharding logic caused incorrect allocation‑unit (AU) boundary calculations. This triggered the assertion ceph_assert(diff <= bytes_per_au[pos]) during resharding, and resulted in OSD crashes. The issue was fixed upstream and included in the Ceph 8.1z5 downstream branch.
With this fix, OSDs now handle BlueStore resharding and object deletion operations without crashing.
As a result, OSDs no longer enter crash loops, and placement groups (PGs) avoid degraded or incomplete states caused by this issue. This bug affected OSDs created with bluestore_elastic_shared_blobs=1 in Squid (19.2.x / Ceph 8.x).
6.4. OCS Operator
Improved Ceph PG autoscaling by removing default target size ratios from DF‑created pools
Previously, Data Foundation set a default target_size_ratio of 0.49 on the data pools it created. Over time, it was observed that having target size ratios on pools led to poor PG autoscaling and balancing, causing delays in rebalancing across pools.
With this fix, Data Foundation‑created pools no longer use a default target size ratio.
As a result, PG autoscaling is faster and overall pool balancing is improved.
6.5. CSI Addons
Orphaned CSIAddonsNode resources causing errant sidecar connection attempts
Previously, deleting a worker node left behind stale CSIAddonsNode resources owned by the DaemonSet on that node. These orphaned resources caused the csi-addons-controller-manager pod to make repeated, incorrect connection attempts.
With this fix, the controller manager now watches for and removes stale CSIAddonsNode resources automatically.
As a result, no orphaned CSIAddonsNode resources remain in the cluster.
6.6. Ceph monitoring
ocs-metrics-exporter and ocs-provider-server pods stuck in Pending state
Previously, the ocs-metrics-exporter and ocs-provider-server pods did not inherit custom tolerations defined under spec.placement.all in the StorageCluster custom resource, causing the pods to remain in a Pending state.
With this fix, tolerations configured under the pod-specific placement keys are correctly applied. Administrators must configure extra tolerations under spec.placement.metrics-exporter for the ocs-metrics-exporter pod and under spec.placement.api-server for the ocs-provider-server pod. The all placement key remains reserved for rook-ceph resources.
A bug that previously overwrote existing tolerations when adding new ones to metrics-exporter or api-server has also been resolved.
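For example, a sketch of the placement configuration described above in the StorageCluster custom resource; the taint key and value are illustrative:

spec:
  placement:
    metrics-exporter:
      tolerations:
      - key: infra-only        # illustrative custom taint key
        operator: Equal
        value: "true"
        effect: NoSchedule
    api-server:
      tolerations:
      - key: infra-only
        operator: Equal
        value: "true"
        effect: NoSchedule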
Incorrect pool statistics used for pool‑quota monitoring
Previously, the CephPoolQuotaBytesCriticallyExhausted and CephPoolQuotaBytesNearExhaustion alerts evaluated quota status using incorrect pool statistics, causing the UI to display false warnings.
With this fix, these alerts now use the correct pool‑quota values for evaluation.
Chapter 7. Known issues
This section describes the known issues in Red Hat OpenShift Data Foundation 4.21.
7.1. Disaster recovery
Regional-DR is not supported in environments deployed on IBM Power
Regional-DR is not supported in OpenShift Data Foundation environments deployed on IBM Power because ACM 2.15 is not supported on this platform for this release. This impacts both new and upgraded deployments on IBM Power.
CIDR range does not persist in csiaddonsnode object when the respective node is down
When a node is down, the Classless Inter-Domain Routing (CIDR) information disappears from the csiaddonsnode object. This impacts the fencing mechanism when it is required to fence the impacted nodes.
Workaround: Collect the CIDR information immediately after the NetworkFenceClass object is created.
DRPCs protect all persistent volume claims created on the same namespace
In namespaces that host multiple disaster recovery (DR) protected workloads, each DRPlacementControl resource on the hub cluster that does not isolate PVCs by workload using its spec.pvcSelector field protects all persistent volume claims (PVCs) in that namespace.
This results in PVCs matching the DRPlacementControl spec.pvcSelector across multiple workloads, or, if the selector is missing in all workloads, replication management potentially managing each PVC multiple times, which can cause data corruption or invalid operations based on individual DRPlacementControl actions.
Workaround: Label the PVCs that belong to a workload uniquely, and use the selected label as the DRPlacementControl spec.pvcSelector to disambiguate which DRPlacementControl protects and manages which subset of PVCs within a namespace. It is not possible to specify the spec.pvcSelector field for the DRPlacementControl using the user interface, so the DRPlacementControl for such applications must be deleted and created using the command line.
Result: PVCs are no longer managed by multiple DRPlacementControl resources and do not cause any operation or data inconsistencies.
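For example, a minimal sketch of the workaround, assuming a workload label of app: mysql; the label, PVC name, and namespace are illustrative:

$ oc label pvc mysql-data app=mysql -n <workload-namespace>

The DRPlacementControl created from the command line then selects only those PVCs:

spec:
  pvcSelector:
    matchLabels:
      app: mysql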
Disabled PeerReady flag prevents changing the action to Failover
The DR controller executes full reconciliation as and when needed. When a cluster becomes inaccessible, the DR controller performs a sanity check. If the workload is already relocated, this sanity check causes the PeerReady flag associated with the workload to be disabled, and the sanity check does not complete due to the cluster being offline. As a result, the disabled PeerReady flag prevents you from changing the action to Failover.
Workaround: Use the command-line interface to change the DR action to Failover despite the disabled PeerReady flag.
Information about lastGroupSyncTime is lost after hub recovery for the workloads which are primary on the unavailable managed cluster
Applications that were previously failed over to a managed cluster do not report a lastGroupSyncTime, thereby triggering the alert VolumeSynchronizationDelay. This is because when the ACM hub and a managed cluster that are part of the DRPolicy are unavailable, a new ACM hub cluster is reconstructed from the backup.
Workaround: If the managed cluster to which the workload was failed over is unavailable, you can still fail over to a surviving managed cluster.
MCO operator reconciles the veleroNamespaceSecretKeyRef and CACertificates fields
When the OpenShift Data Foundation operator is upgraded, the CACertificates and veleroNamespaceSecretKeyRef fields under s3StoreProfiles in the Ramen config are lost.
Workaround: If the Ramen config has custom values for the CACertificates and veleroNamespaceSecretKeyRef fields, then set those custom values after the upgrade is performed.
For discovered apps with CephFS, sync stops after failover
For CephFS-based workloads, synchronization of discovered applications may stop at some point after a failover or relocation. This can occur with a Permission Denied error reported in the ReplicationSource status.
Workaround:
For Non-Discovered Applications
Delete the VolumeSnapshot:
$ oc delete volumesnapshot -n <vrg-namespace> <volumesnapshot-name>
The snapshot name usually starts with the PVC name followed by a timestamp.
Delete the VolSync Job:
$ oc delete job -n <vrg-namespace> <pvc-name>
The job name matches the PVC name.
For Discovered Applications
Use the same steps as above, except <namespace> refers to the application workload namespace, not the VRG namespace.
For Workloads Using Consistency Groups
Delete the ReplicationGroupSource:
$ oc delete replicationgroupsource -n <namespace> <name>
Delete All VolSync Jobs in that Namespace:
$ oc delete jobs --all -n <namespace>
In this case, <namespace> refers to the namespace of the workload (either discovered or not), and <name> refers to the name of the ReplicationGroupSource resource.
Remove DR option is not available for discovered apps on the Virtual machines page
The Remove DR option is not available for discovered applications listed on the Virtual machines page.
Workaround:
Add the missing label to the DRPlacementControl:
$ oc label drplacementcontrol <drpcname> \
    odf.console.selector/resourcetype=virtualmachine \
    -n openshift-dr-ops
Add the PROTECTED_VMS recipe parameter with the virtual machine name as its value:
$ oc patch drplacementcontrol <drpcname> \
    -n openshift-dr-ops \
    --type='merge' \
    -p '{"spec":{"kubeObjectProtection":{"recipeParameters":{"PROTECTED_VMS":["<vm-name>"]}}}}'
DR Status is not displayed for discovered apps on the Virtual machines page
DR Status is not displayed for discovered applications listed on the Virtual machines page.
Workaround:
Add the missing label to the DRPlacementControl:
$ oc label drplacementcontrol <drpcname> \
    odf.console.selector/resourcetype=virtualmachine \
    -n openshift-dr-ops
Add the PROTECTED_VMS recipe parameter with the virtual machine name as its value:
$ oc patch drplacementcontrol <drpcname> \
    -n openshift-dr-ops \
    --type='merge' \
    -p '{"spec":{"kubeObjectProtection":{"recipeParameters":{"PROTECTED_VMS":["<vm-name>"]}}}}'
Secondary PVCs are not removed when DR protection is removed for discovered apps
On the secondary cluster, CephFS PVCs linked to a workload are usually managed by the VolumeReplicationGroup (VRG). However, when a workload is discovered using the Discovered Applications feature, the associated CephFS PVCs are not marked as VRG-owned. As a result, when the workload is disabled, these PVCs are not automatically cleaned up and become orphaned.
Workaround: To clean up the orphaned CephFS PVCs after disabling DR protection for a discovered workload, manually delete them using the following command:
$ oc delete pvc <pvc-name> -n <pvc-namespace>
7.2. Multicloud Object Gateway
Unable to create new OBCs using Multicloud Object Gateway
When provisioning an NSFS bucket via ObjectBucketClaim (OBC), the default filesystem path is expected to use the bucket name. However, if path is set in OBC.Spec.AdditionalConfig, it should take precedence. This behavior is currently inconsistent, resulting in failures when creating new OBCs.
7.3. Ceph
Poor CephFS performance on stretch clusters
Workloads with many small metadata operations might exhibit poor performance because of the arbitrary placement of metadata server pods (MDS) on multi-site Data Foundation clusters.
OSD pods restart during add capacity
OSD pods restart after performing cluster expansion by adding capacity to the cluster. However, no impact to the cluster is observed apart from pod restarting.
Ceph becomes inaccessible and IO is paused when connection is lost between the two data centers in stretch cluster
When two data centers lose connection with each other but are still connected to the Arbiter node, there is a flaw in the election logic that causes an infinite election among Ceph Monitors. As a result, the Monitors are unable to elect a leader and the Ceph cluster becomes unavailable. Also, IO is paused during the connection loss.
Workaround: Shut down the monitors of any one data zone by bringing down the zone nodes. Additionally, you can reset the connection scores of the surviving Monitor pods.
As a result, Monitors can form a quorum, Ceph becomes available again, and IO resumes.
SELinux relabelling issue with a very high number of files
When attaching volumes to pods in Red Hat OpenShift Container Platform, the pods sometimes do not start or take an excessive amount of time to start. This behavior is generic and is tied to how SELinux relabelling is handled by the kubelet. The issue is observed with any filesystem-based volume that has a very high file count. In OpenShift Data Foundation, the issue is seen when using CephFS-based volumes with a very high number of files. There are multiple ways to work around this issue. Depending on your business needs, you can choose one of the workarounds from the knowledgebase solution https://access.redhat.com/solutions/6221251.
7.4. CSI driver
Sync stops after PVC deselection
When a PersistentVolumeClaim (PVC) is added to or removed from a group by modifying its label to match or unmatch the group criteria, sync operations may unexpectedly stop. This occurs due to stale protected PVC entries remaining in the VolumeReplicationGroup (VRG) status.
Workaround: Manually edit the VRG’s status field to remove the stale protected PVC:
$ oc edit vrg <vrg-name> -n <vrg-namespace> --subresource=status
7.5. OpenShift Data Foundation console
UI shows WaitOnUserCleanUp even when automatic cleanup is enabled
The UI incorrectly displays the WaitOnUserCleanUp status even when automatic cleanup is enabled for VMs. This occurs because the UI relies only on the phase and progression fields of the DRPlacementControl to determine cleanup behavior and does not evaluate the more granular AutoCleanup condition that explicitly indicates automatic cleanup.
Workaround: No manual workaround is required. This state is transient and clears automatically once the progression field advances to Completed. Manual cleanup should be avoided unless the AutoCleanup condition and its corresponding reason in the DRPlacementControl or VRG status indicate otherwise.
During automatic cleanup, the UI may briefly present a misleading status, which can cause temporary confusion until the cleanup completes.
DRPlacementControl shows ProtectionError even after successful relocation
When a relocation completes, the DRPlacementControl may continue to display a ProtectionError status. This occurs because the Protected condition in the DRPlacementControl status incorrectly reports an Error state, even though the relocation has finished (phase: Relocated, progression: Completed).
Workaround: No direct workaround is available. Wait and retry until the NoClusterDataConflict condition is met.
The DR status in the UI remains in the ProtectionError state until the data conflict is resolved.
UI shows "Unauthorized" error and Blank screen with loading temporarily during OpenShift Data Foundation operator installation
During OpenShift Data Foundation operator installation, sometimes the
InstallPlantransiently goes missing which causes the page to show unknown status. This does not happen regularly. As a result, the messages and title go missing for a few seconds.