Server-side encryption with RGW multisite configuration might lead to data corruption of multipart objects

Issue:

During internal testing, we observed an MD5 mismatch on replicated objects when testing RGW's server-side encryption in multisite deployments. This data corruption is specific to S3 multipart uploads that use server-side encryption (SSE), and it affects only the replicated copy; the original object remains intact.

Issue in detail:

Encryption of multipart uploads requires special handling around the part boundaries because each part is uploaded and encrypted separately. In multisite, objects are replicated in their encrypted form, and multipart uploads are replicated as a single part. As a result, the replicated copy loses the information about the original part boundaries that is required to decrypt the data correctly, which causes the corruption. This corruption is irrecoverable on the replicated copy.

Affected versions:

The bug affects Ceph releases all the way back to Luminous, where server-side encryption was first introduced. RHCS-6.X, 5.X, 4.X, and 3.X are therefore all affected.

How to find the affected objects:

You can identify affected objects with an S3 HeadObject request. The response includes an x-amz-server-side-encryption header, and the ETag header value (with the " characters removed) is longer than 32 characters (multipart ETags are in the special form "<md5sum>-<num parts>").
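The check described above can be sketched as a small Python helper. This is a minimal illustration, not an official tool: the function name is_possibly_affected is hypothetical, and matching every header that starts with x-amz-server-side-encryption (which would also cover customer-key variants) is an assumption that is slightly broader than the single header named above. The helper only inspects HeadObject response headers; it does not contact a cluster.

```python
def is_possibly_affected(headers):
    """Return True if HeadObject response headers match the pattern above.

    `headers` is a mapping of HTTP response header names to values.
    Hypothetical helper: it flags encrypted multipart objects, which are
    candidates for corruption on the replicated copy.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {str(k).lower(): str(v) for k, v in headers.items()}

    # Assumption: any header starting with this prefix indicates SSE.
    encrypted = any(
        name.startswith("x-amz-server-side-encryption") for name in normalized
    )

    # Strip the surrounding double quotes from the ETag value.
    etag = normalized.get("etag", "").replace('"', "")

    # A plain MD5 ETag is 32 hex characters; multipart ETags are longer.
    return encrypted and len(etag) > 32
```

If you use boto3, the raw headers of a head_object response should be available under response["ResponseMetadata"]["HTTPHeaders"] and can be passed to this helper directly. Treat a positive result as a candidate only, and compare checksums against the original object before drawing conclusions.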

Do not delete the corrupted replicas. In an active-active multisite configuration, the deletion would be replicated and would delete the original copy as well, which leads to data loss.

Mitigation:

We are working on a fix. It will only modify the replication logic so that newly replicated objects are not corrupted; it will not repair objects that have already been replicated incorrectly.

As an immediate workaround, multisite users should not use server-side encryption for multipart uploads.

Fix availability:

The team is working on the fix and progress is tracked via

The fix is expected to land in one of the upcoming RHCS-6.X releases.
