How to copy or migrate a Ceph pool?
Environment
Red Hat Ceph Storage 2 and above
Issue
Sometimes it is necessary to migrate all objects from one pool to another, especially when you need to change parameters that cannot be modified on an existing pool. For example, you may need to migrate from a replicated pool to an erasure-coded (EC) pool, change the EC profile of a pool, or reduce the number of placement groups (the pg_num/pgp_num parameters of the pool).
Resolution
Migrating pools with RBD images:
- Starting with RHCS 4, RBD images can be moved between pools within the same cluster. For details, see the Block Device Guide.
- Another option is to use rbd export and rbd import. This is recommended over rados cppool when the pool's workload consists only of RBD images:
# rbd export volumes/volume-3c4c63e3-3208-436f-9585-fee4e2a3de16 <path of export file>
# rbd import --image-format 2 <path> volumes_new/volume-3c4c63e3-3208-436f-9585-fee4e2a3de16 <-- when importing into the new pool
- To export and import in a single step and avoid writing to local disk, a pipe can be used:
# rbd export volumes/volume-3c4c63e3-3208-436f-9585-fee4e2a3de16 - | rbd import --image-format 2 - volumes_new/volume-3c4c63e3-3208-436f-9585-fee4e2a3de16
- Important note: While running rbd export and rbd import there must be no active IO on the RBD images being migrated. It is therefore best to take the affected production workloads down for the duration of the pool migration when using rbd export/import.
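For a pool with many images, the export/import pipe above can be wrapped in a loop. This is only a sketch: the helper name migrate_rbd_pool and the pool names are examples, not Ceph commands or names from this article.

```shell
# Sketch: migrate every RBD image from one pool to another using the
# export/import pipe, so nothing is written to local disk.
migrate_rbd_pool() {
    local src="$1" dst="$2" image
    # List every image in the source pool and stream it into the new pool.
    for image in $(rbd ls "$src"); do
        echo "migrating ${src}/${image} -> ${dst}/${image}"
        rbd export "${src}/${image}" - | rbd import --image-format 2 - "${dst}/${image}"
    done
}

# Example invocation against a real cluster (IO to the images must be stopped):
#   migrate_rbd_pool volumes volumes_new
```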
Other non-recommended methods:
- The simplest method is to copy all objects with the rados cppool command. However, it requires the pool to be read-only during the copy, and it comes with some serious warnings for pools that are used for RBD images:
Warning: Problems with the cppool command:
- The main problem is that cppool does not make a faithful copy of all data because it does not preserve snapshots (and snapshot-related metadata). That means that if you copy an RBD pool, the images will be somewhat broken (snapshots will not be present and will not work properly). It also does not preserve the user_version field that some librados users may rely on. For more information, see the upstream ceph mailing list discussion.
- There were earlier plans to remove the cppool command because of its incompleteness, but this was later changed to "rados: refuse to cppool if there are snaps; warn about user_version", which will be part of future releases.
$ ceph osd pool create <new-pool> <pg_num> [ <other new pool parameters> ]
$ rados cppool <source-pool> <new-pool>
$ ceph osd pool rename <source-pool> <source-pool-another-name>
$ ceph osd pool rename <new-pool> <source-pool>
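Given the snapshot caveat above, it can help to check in advance whether any RBD image in the source pool has snapshots before choosing cppool. A minimal sketch (the helper name pool_has_snaps is an assumption, not a Ceph command):

```shell
# Sketch: report any RBD image in a pool that has snapshots, since
# rados cppool would not preserve them.
pool_has_snaps() {
    local pool="$1" image found=1
    for image in $(rbd ls "$pool"); do
        # `rbd snap ls` prints nothing for an image without snapshots.
        if [ -n "$(rbd snap ls "${pool}/${image}")" ]; then
            echo "${pool}/${image} has snapshots"
            found=0
        fi
    done
    return $found
}

# Example: refuse cppool if snapshots exist.
#   pool_has_snaps volumes && echo "do not use rados cppool"
```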
- You can use ceph osd pool delete <source-pool> to delete the source pool instead of renaming it.
- This method does not work in all cases. For example, copying EC pools returns the error: error copying pool srcpool => newpool: (95) Operation not supported.
- Use cppool only if the pool holds no rbd images and their snapshots, and no librados consumers rely on the user_version field.
- Another method is to use the rados export and rados import commands with a temporary local directory to hold the exported data. However, rados export and rados import were not fully functional until ceph upstream feature #9964, implemented by ceph upstream master branch pull request #4863. This means rados export and rados import are fully functional in the Jewel release (Red Hat Ceph Storage 2.0) but not in Hammer (Red Hat Ceph Storage 1.3.2).
$ ceph osd pool create <new-pool> <pg_num> [ <other new pool parameters> ]
$ rados export --create <source-pool> <temp-dir>
$ rados import <temp-dir> <new-pool>
After the initial export/import you must stop all IO to the source pool and re-synchronize any objects modified in the meantime:
$ rados export --workers 5 <source-pool> <temp-dir>
$ rados import --workers 5 <temp-dir> <new-pool>
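After the final synchronization pass, a rough sanity check is to compare object counts between the two pools. This is a sketch that assumes no IO is hitting either pool while it runs, and the helper name same_object_count is illustrative:

```shell
# Sketch: verify that the source and destination pools hold the same
# number of objects after the final rados import pass.
same_object_count() {
    local src="$1" dst="$2" src_n dst_n
    src_n=$(rados -p "$src" ls | wc -l | tr -d ' ')
    dst_n=$(rados -p "$dst" ls | wc -l | tr -d ' ')
    if [ "$src_n" -eq "$dst_n" ]; then
        echo "OK: both pools contain ${src_n} objects"
    else
        echo "MISMATCH: ${src} has ${src_n} objects, ${dst} has ${dst_n}"
        return 1
    fi
}

# Example:
#   same_object_count srcpool newpool
```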
- WARNING: Before creating the new pool, check that the cluster has sufficient free space, because the new pool will also store 2 or 3 (or more) copies of the data according to its replication factor. You can use the rados df or ceph df commands, as they include the size of all copies.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.