Ceph - Steps to convert OSD omap backend from leveldb to rocksdb.
Environment
- Red Hat Ceph Storage 2.4 - 10.2.7-48.el7cp
Issue
- Steps to convert OSD omap backend from leveldb to rocksdb.
Resolution
Steps to convert the OSD filestore omap backend from the default leveldb to rocksdb. These steps should be applied to one OSD at a time, not proceeding to the next OSD until the cluster is back in an active+clean state.
The following procedure should be used only on Red Hat Ceph Storage 2.4 - 10.2.7-48.el7cp or later versions.
- Set the 'noout' flag on the cluster. This prevents the monitor from marking the OSD out and avoids unnecessary backfilling in the cluster.
# ceph osd set noout
- Stop the OSD process.
# systemctl stop ceph-osd@<id>
- Move the current omap directory from the current OSD data directory.
# mv /var/lib/ceph/osd/ceph-<id>/current/omap /var/lib/ceph/osd/ceph-<id>/omap.orig
- Increase the open-file limit in preparation for the copy process, as leveldb and rocksdb use large numbers of files.
# ulimit -n 65535
- The ceph-kvstore-tool utility is provided by the ceph-test package, available in the Red Hat Ceph Storage repositories; please install it before proceeding. The ceph-test package version should be the same as the ceph-osd package version.
- Copy the existing leveldb store into the omap directory as a rocksdb store. Please note this step will take some time, from tens of minutes up to hours, depending on the db size.
# ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-<id>/omap.orig store-copy /var/lib/ceph/osd/ceph-<id>/current/omap 10000 rocksdb
- Verify the success of the conversion.
# ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-<id>/current/omap --command check
- Update the OSD superblock to reflect the new rocksdb.
# sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-<id>/superblock
- Change the owner and group of the new omap directory to ceph:ceph if the OSD daemon runs as the ceph user.
# chown -R ceph:ceph /var/lib/ceph/osd/ceph-<id>/current/omap
- Remove the old leveldb omap directory from the OSD data directory.
# cd /var/lib/ceph/osd/ceph-<id>
# rm -rf omap.orig
- Start the OSD.
# systemctl start ceph-osd@<id>
- Proceed to the next OSD and repeat the process until all OSDs in the list have been converted to the rocksdb filestore omap backend.
- Remove the noout flag from the cluster once the final OSD is completed.
# ceph osd unset noout
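For convenience, the per-OSD steps above can be sketched as a single shell function. This is an illustrative sketch, not a Red Hat-provided script: it assumes the default /var/lib/ceph/osd/ceph-<id> data directory layout and that the ceph-test package is already installed. Setting DRY_RUN=1 prints the commands instead of executing them, so the sequence can be reviewed before running it for real.

```shell
#!/usr/bin/env bash
# Sketch of the leveldb-to-rocksdb conversion steps for one OSD.
# Assumption: default /var/lib/ceph/osd/ceph-<id> layout, ceph-test installed.

# Helper: with DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

convert_osd() {
    local id="$1"
    local dir="/var/lib/ceph/osd/ceph-${id}"

    run ceph osd set noout
    run systemctl stop "ceph-osd@${id}"
    run mv "${dir}/current/omap" "${dir}/omap.orig"
    run ulimit -n 65535
    run ceph-kvstore-tool leveldb "${dir}/omap.orig" \
        store-copy "${dir}/current/omap" 10000 rocksdb
    run ceph-osdomap-tool --omap-path "${dir}/current/omap" --command check
    run sed -i s/leveldb/rocksdb/g "${dir}/superblock"
    run chown -R ceph:ceph "${dir}/current/omap"
    run rm -rf "${dir}/omap.orig"
    run systemctl start "ceph-osd@${id}"
    # 'ceph osd unset noout' is deliberately left out: run it once,
    # after the final OSD is converted and the cluster is active+clean.
}
```

Usage example: `DRY_RUN=1 convert_osd 3` prints the commands that would be run for OSD 3; wait for the cluster to return to active+clean before calling it for the next OSD.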
Adding new OSDs in Red Hat Ceph Storage 2.4 with rocksdb as the omap backend.
- Please add the following option to the [global] or [osd] section of ceph.conf, then add the new OSDs.
filestore_omap_backend = "rocksdb"
- It is very important to tune rocksdb_cache_size according to your workload and OSD RAM; it defaults to 128MB. If the rocksdb cache is not tuned properly, OSDs can experience performance issues (latency, slow requests).
- Important Note: If using Red Hat Ceph Storage version 2.4 async - 10.2.7-48.el7cp, rocksdb_cache_size cannot be set to more than 2GB-1, as documented here - Ceph - RocksDB cache size is limited to 2GB-1 (2147483647 bytes) in Red Hat Ceph Storage 2.4 async - 10.2.7-48.el7cp and before versions, why?
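As an illustration, the two settings above can be combined in a ceph.conf fragment. The 512 MB cache value below is purely a hypothetical example, not a recommendation; it must be sized for your own workload and available OSD RAM, and kept below 2GB-1 on the affected versions.

```ini
[osd]
# Use rocksdb as the filestore omap backend for newly created OSDs.
filestore_omap_backend = "rocksdb"
# Hypothetical example value (512 MB, in bytes); the default is 128 MB.
# Tune to your workload and OSD RAM; see the cache-size note above.
rocksdb_cache_size = 536870912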
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.