Ceph Object Store Tool
A Ceph Storage cluster is generally self-repairing. Maintaining multiple replicas or k+m chunks ensures a very high probability that you will not lose data. A persistent incomplete state does not always mean that an OSD failed. Address networking and other peering issues before attempting to use the ceph-objectstore-tool.
In very rare cases, you may see the failure of multiple OSDs or a peculiar sequence of failures where one or more PG replicas are only on the failed OSDs. Hardware issues, power surges, operator errors or software bugs may cause problems that require you to examine and/or repair an OSD. For example, a pool makes two copies of an object, but allows write operations in a degraded state (e.g., osd pool default size = 2 and osd pool default min size = 1). Placement groups are located on OSDs 1 and 2.
- OSD 1 goes down.
- OSD 2 handles one or more write operations alone (e.g., osd pool default min size = 1).
- OSD 1 comes back up and re-peers with OSD 2.
- OSD 2 goes down before writing objects to OSD 1. OSD 2's disk is readable, but the OSD won't restart.
- OSD 1 knows there are new objects on OSD 2, but there are no extant copies online.
This typically manifests as a health error in which the storage cluster remains persistently in an incomplete state and the affected PGs are associated with failed OSDs. It may also involve lost or unfound objects. These are the types of scenarios where you may need to use the ceph-objectstore-tool.
NOTE: Linux copy tools such as cp, rsync, and others are not adequate to address placement groups on a failed OSD, because these tools do not take all of the necessary metadata into account. If you copied a placement group to an OSD using the Linux command line, you must delete the copy of the placement group; otherwise, the OSD will crash the next time you restart it.
If you know that an OSD has failed, the ceph-objectstore-tool lets you examine, modify or retrieve many aspects of the OSD's data. To find placement groups in an incomplete state on your Ceph Storage cluster, execute:
ceph health detail | grep incomplete
The output should list the placement groups that are incomplete and the Acting Set of OSDs for each placement group. For example:
pg 0.3a is incomplete, acting [18,25,9]
To locate the host for an associated OSD from the Acting Set, execute:
ceph osd find <osdnum>
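As a minimal sketch, the two lookups above can be chained by parsing the health output. The function name `incomplete_pgs` is illustrative and not part of Ceph, and the sketch assumes the output lines follow the format shown in the example; other states or releases may word the line differently.

```shell
#!/bin/sh
# incomplete_pgs: read `ceph health detail` output on stdin and print one line
# per incomplete placement group: the PG ID followed by the OSD numbers of its
# Acting Set. Hypothetical helper; assumes the line format shown above.
incomplete_pgs() {
    sed -n 's/^pg \([^ ]*\) is incomplete, acting \[\([0-9,]*\)\].*$/\1 \2/p' |
        tr ',' ' '
}

# Usage on a node with an admin keyring (not executed here):
#   ceph health detail | incomplete_pgs
#   ceph health detail | incomplete_pgs | while read -r pg osds; do
#       for osd in $osds; do ceph osd find "$osd"; done
#   done
```

Each output line can then be fed to `ceph osd find` one OSD at a time, as the commented loop suggests.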
WARNING: Using the ceph-objectstore-tool is a risky process. All ceph-objectstore-tool commands need to be run as root or with the sudo command. Do not attempt this on a production cluster without engaging Red Hat Ceph Storage support. You could cause irreversible data loss in your cluster. For all commands except import-rados the given ceph-osd daemon MUST NOT be running; otherwise you will receive an "OSD has the store locked" error.
Red Hat Ceph Storage v1.3 only supports the ObjectStore interface with FileStore and KeyValueStore. MemStore is not supported. The ceph-objectstore-tool defaults its --type <arg> value to filestore, so you will rarely have to use --type on a Red Hat supported Ceph Storage production cluster except when working with leveldb on monitors or OSDs.
Ceph maps objects to placement groups, and placement groups to OSDs. The ceph-objectstore-tool allows you to examine an OSD, a particular placement group or a specific object. Many of the ceph-objectstore-tool features are similar to ceph CLI commands, except that you will be retrieving data directly from an offline (down and out) OSD.
Additional Options
The following options address retrieving data in the case of certain data corruption scenarios. **DO NOT** attempt to use these features on a production cluster without engaging Red Hat Ceph Storage support. You could cause irreversible data loss in your cluster.
The --skip-journal-replay option provides a means of recovering data when the journal is corrupt and data written to the journal wasn't replayed. If Ceph OSDs or the ceph-objectstore-tool cannot access the corrupt journal, you may specify this flag to ignore journal data.
The --skip-mount-omap option provides a means of analyzing data if the leveldb store is corrupt and won't mount. Some options such as --op export will not work if you specify this option.
OSD and PG Operations
Once you've identified the problem OSD(s) and its host, you will need to navigate to the host and assess the PGs mapped to the troubled OSD(s). For example:
ssh <osd-host>
Before using the ceph-objectstore-tool, ensure you stop the OSD you intend to work on.
sudo /etc/init.d/ceph stop osd.<num>
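Because the tool refuses to work while the daemon holds the store lock, it can help to confirm that the daemon has really exited. The following is a sketch only: `osd_running` is a hypothetical helper, and it assumes the daemon's command line carries the OSD number as `-i <num>` or `--id <num>`.

```shell
#!/bin/sh
# osd_running: read a process listing (one command line per row) on stdin and
# succeed if it contains a ceph-osd daemon for the given OSD number.
# Hypothetical helper; assumes the daemon was started with "-i <num>" or
# "--id <num>" on its command line.
osd_running() {
    num=$1
    grep -Eq -e "(^|/)ceph-osd( .*)? (-i|--id)[ =]?${num}( |\$)"
}

# Usage (not executed here):
#   if ps -eo args | osd_running 3; then
#       echo "osd.3 is still running; stop it first" >&2
#   fi
```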
The following sections describe the OSD and placement group functions of the ceph-objectstore-tool.
List Placement Groups
To list the placement groups stored on an OSD, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op list-pgs
The tool should return a list of placement groups stored on the OSD.
Get PG Info
To retrieve information about a particular placement group, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pg-id> --op info
Get PG Log
To retrieve a log of operations on a placement group, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pg-id> --op log
Remove a PG
Removing a placement group may cause data loss. Exercise this feature with caution. If you have a corrupt placement group on an OSD that prevents peering or prevents the OSD from starting, ensure that you have a valid copy of the placement group on another OSD before removing the placement group. To be safe, you may also export the placement group (from a valid copy on another OSD, or from the copy on the failed OSD) before removing it.
To remove a placement group, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pg-id> --op remove
Export a PG
When you want to copy a placement group, you MUST use `ceph-objectstore-tool --op export`, because Linux `cp` will not copy all of the relevant metadata. To copy a placement group, export it from an OSD to a file. Then you can import it to another OSD from a file.
To export a placement group to a file, execute the following:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pg-id> --file /path/to/file --op export
Import a PG
To import a placement group, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --file </path/to/file> --op import
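The export and import steps form a pair, and it can be useful to review both commands before running either. The sketch below only prints them; `pg_copy_plan` is a hypothetical helper, and every path in the usage note is an assumption to be replaced with the values for your OSDs.

```shell
#!/bin/sh
# pg_copy_plan: print (without running) the export and import commands needed
# to copy one placement group between two stopped OSDs.
# Hypothetical helper; all paths and the PG ID are supplied by the caller.
pg_copy_plan() {
    pgid=$1 src_data=$2 src_journal=$3 dst_data=$4 dst_journal=$5 file=$6
    echo "ceph-objectstore-tool --data-path $src_data --journal-path $src_journal --pgid $pgid --file $file --op export"
    echo "ceph-objectstore-tool --data-path $dst_data --journal-path $dst_journal --file $file --op import"
}

# Usage (paths are illustrative assumptions):
#   pg_copy_plan 0.3a \
#       /var/lib/ceph/osd/ceph-18 /var/lib/ceph/osd/ceph-18/journal \
#       /var/lib/ceph/osd/ceph-25 /var/lib/ceph/osd/ceph-25/journal \
#       /tmp/pg-0.3a.export
```

Printing rather than executing keeps the sketch harmless; each printed command would still need to be run as root on the host of the corresponding OSD, with that OSD stopped.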
High-level Object Operations
Sometimes problems related to a particular placement group are related to objects within the placement group. The following sections describe the high-level object functions of the `ceph-objectstore-tool`. **IMPORTANT:** Fixing objects could cause unrecoverable data loss. **CONTACT RED HAT CEPH STORAGE SUPPORT BEFORE USING THESE FEATURES**.
List Objects
Each OSD may contain 0 to many placement groups, each with 0 to many objects. To identify the objects within an OSD, execute the following (notice it omits `--pgid`):
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op list
The tool will output all objects irrespective of their placement group. For example:
["0.1c",{"oid":"rbd_directory","key":"","snapid":-2,"hash":816417820,"max":0,"pool":0,"namespace":""}]
["11.6",{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}]
["11.0",{"oid":"default.region","key":"","snapid":-2,"hash":2589353992,"max":0,"pool":11,"namespace":""}]
["11.0",{"oid":"region_info.default","key":"","snapid":-2,"hash":1130435976,"max":0,"pool":11,"namespace":""}]
Each placement group may contain 0 to many objects. To identify the objects within a placement group, execute the following (notice it includes --pgid):
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pgid> --op list
The output will only include the objects of a single placement group. For example, if you specify --pgid 11.0, it will output only objects from placement group 11.0.
["11.0",{"oid":"default.region","key":"","snapid":-2,"hash":2589353992,"max":0,"pool":11,"namespace":""}]
["11.0",{"oid":"region_info.default","key":"","snapid":-2,"hash":1130435976,"max":0,"pool":11,"namespace":""}]
If you know the object ID you are looking for, and want to find the PG it belongs to, you can also specify the object ID.
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op list <object-id>
For example, if you are looking for a known object ID like default.region from your gateway on the OSD, simply replace <object-id> with default.region.
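When the list output has been saved to a file, the PG ID and the JSON description of an object can be pulled out with standard text tools. This is a sketch under the assumption that each entry occupies one line in the `["<pgid>",{...}]` form shown above; `find_object` is a hypothetical helper, not a tool feature.

```shell
#!/bin/sh
# find_object: read `--op list` output on stdin and, for every entry whose
# "oid" equals the given object ID, print the PG ID, a tab, and the JSON
# object description. Hypothetical helper; assumes one entry per line in the
# ["<pgid>",{...}] form shown above.
find_object() {
    awk -v needle="\"oid\":\"$1\"," '
        index($0, needle) {
            pg = $0
            sub(/^\["/, "", pg)        # drop the leading ["
            sub(/",.*$/, "", pg)       # drop everything after the PG ID
            json = $0
            sub(/^[^{]*/, "", json)    # drop everything before the first {
            sub(/\]$/, "", json)       # drop the closing ]
            printf "%s\t%s\n", pg, json
        }'
}

# Usage (not executed here; objects.txt holds saved --op list output):
#   find_object default.region < objects.txt
```

The match is a literal comparison on the whole object ID, so `default.region` does not also select `region_info.default`.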
List Lost Objects
An OSD may have objects marked "lost." To list the "lost" or "unfound" objects, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op list-lost
To find objects marked lost for a single placement group, specify --pgid. For example:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pgid> --op list-lost
If you know the identity of the lost object, specify the object ID. For example:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op list-lost <object-id>
Fix a PG's Lost Objects
An OSD may have objects marked "lost." To remove the "lost" setting for all lost objects on an OSD, execute:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op fix-lost
To fix lost objects for a particular placement group, specify the --pgid. For example:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --pgid <pg-id> --op fix-lost
If you know the identity of the lost object you want to fix, specify the object ID. For example:
sudo ceph-objectstore-tool --data-path </path/to/osd> \
--journal-path </path/to/journal> --op fix-lost <object-id>
Low-level Object Operations
Sometimes problems related to a particular placement group are related more specifically to a particular object. The following sections describe the low-level object functions of the `ceph-objectstore-tool`. **IMPORTANT:** Modifying objects could cause unrecoverable data loss. **CONTACT RED HAT CEPH STORAGE SUPPORT BEFORE USING THESE FEATURES**.
Get/Set Bytes
If you are interested in the contents of a specific object, you may get or set bytes on a particular object. Setting bytes on an object could result in unrecoverable data loss. One approach to trying to prevent unrecoverable data loss is to get an object's contents twice. The first get operation is a back-up of the object's contents. The second get operation is the copy you intend to modify and then set.
To get an object's contents, you must first identify the object and its placement group by listing the objects of the OSD or placement group and identifying the specific object that interests you. The ceph-objectstore-tool requires the JSON payload from the list operation, and you must specify --pgid and the placement group ID that contains the object. For example, if you are looking at the zone_info.default object for your gateway, you might specify something like this to get a backup copy:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
get-bytes > zone_info.default.backup
And this to get a working copy.
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
get-bytes > zone_info.default.working-copy
Once you have your working copy, you may modify the file as needed. To set the bytes of an object, execute the following:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
set-bytes < zone_info.default.working-copy
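One way to make the two-copy approach described above a little safer is to confirm that the backup and the working copy are identical before editing the working copy. The sketch below uses `cmp`; the name `copies_match` is hypothetical, and the file names in the usage note are the ones from the examples above.

```shell
#!/bin/sh
# copies_match: succeed only if two files are byte-for-byte identical.
# Intended as a sanity check that the backup and the working copy produced by
# two get-bytes runs agree before the working copy is edited.
copies_match() {
    cmp -s "$1" "$2"
}

# Usage (not executed here):
#   copies_match zone_info.default.backup zone_info.default.working-copy ||
#       echo "copies differ; do not modify the object" >&2
```

The same comparison could be repeated after set-bytes by running get-bytes once more and comparing the result with the working copy.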
Remove an Object
If you remove an object, its contents and references to it will be removed from the placement group. **IMPORTANT:** You cannot re-create the object once you remove it! To remove an object, execute:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
remove
List OMAP
To modify an object map (if present), you will need to list the object map to get its keys first. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
list-omap
Get/Set OMAP Header
Getting the object map header will give you insight into any values associated with keys. To get or set an object map header, you will need the data and journal path, the PG ID and the object. You may also specify an output file. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
get-omaphdr > zone_info.default.omaphdr.txt
To set an object map header, simply reverse the process.
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
set-omaphdr < zone_info.default.omaphdr.txt
Get/Set/Remove OMAP Key
To get or set an object map key, you will need the data and journal path, the PG ID, the object and the key in the object map you wish to get or set. You may also specify an output file. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
get-omap <key> > zone_info.default.omap.txt
To set an object map key, reverse the process and specify the key you wish to modify.
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
set-omap <key> < zone_info.default.omap.txt
To remove an object map key, specify the key you wish to remove. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
rm-omap <key>
List Attributes
To modify attributes (if present), you will need to list the object attributes to get their keys first. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
list-attrs
Get/Set/Remove Attribute Key
To get or set an object attribute, you will need the data and journal path, the PG ID, the object and the key of the attribute you wish to get or set. You may also specify an output file. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
get-attr <key> > zone_info.default.attr.txt
To set an object attribute, reverse the process and specify the key you wish to modify.
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
set-attr <key> < zone_info.default.attr.txt
To remove an object attribute, specify the key you wish to remove. For example:
sudo ceph-objectstore-tool --data-path </path/to/data> \
--journal-path </path/to/journal> --pgid <pgid> \
'{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' \
rm-attr <key>