Building a Ceph Storage Cluster Manually


Manual Deployment

All Ceph clusters require at least one monitor, and at least as many OSDs as copies of an object stored on the cluster. Bootstrapping the initial monitor(s) is the first step in deploying a Ceph Storage Cluster. Monitor deployment also sets important criteria for the entire cluster, such as the number of replicas for pools, the number of placement groups per OSD, the heartbeat intervals, whether authentication is required, and so on. Most of these values are set by default, so it is useful to know about them when setting up your cluster for production.

(Figure: Ceph manual installation diagram)

Monitor Bootstrapping

Bootstrapping a monitor (a Ceph Storage Cluster, in theory) requires a number of things :

  • Unique Identifier : The fsid is a unique identifier for the cluster. It stands for File System ID, from the days when the Ceph Storage Cluster was principally used for the Ceph Filesystem. Ceph now supports native interfaces, block devices, and object storage gateway interfaces too, so fsid is a bit of a misnomer.

  • Cluster Name : Ceph clusters have a cluster name, which is a simple string without spaces. The default cluster name is ceph, but you may specify a different cluster name. Overriding the default cluster name is especially useful when you are working with multiple clusters and you need to clearly understand which cluster you are working with.

    For example, when you run multiple clusters in a federated architecture, the cluster name (for example, us-west, us-east) identifies the cluster for the current CLI session.
    Note : The cluster name also determines the name of the Ceph configuration file (for example, ceph.conf, us-west.conf, us-east.conf, and so on). To select a cluster on the command line interface, use ceph --cluster {cluster-name}.

  • Monitor Name : Each monitor instance within a cluster has a unique name. In common practice, the Ceph Monitor name is the host name (we recommend one Ceph Monitor per host, and no commingling of Ceph OSD daemons with Ceph Monitors). You may retrieve the short hostname with hostname -s.

  • Monitor Map : Bootstrapping the initial monitor(s) requires you to generate a monitor map. The monitor map requires the fsid, the cluster name (or uses the default), and at least one host name and its IP address.

  • Monitor Keyring : Monitors communicate with each other via a secret key. You must generate a keyring with a monitor secret and provide it when bootstrapping the initial monitor(s).

  • Administrator Keyring : To use the ceph CLI tools, you must have a client.admin user. So you must generate the admin user and keyring, and you must also add the client.admin user to the monitor keyring.
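The naming conventions above can be sketched as a few shell helpers. This is a minimal illustration, not part of any Ceph tool: the function names valid_cluster_name, conf_path, and mon_data_dir are hypothetical, and the paths assume the default /etc/ceph and /var/lib/ceph locations used throughout this article.

```shell
# Hypothetical helpers illustrating Ceph naming conventions; they are not
# part of Ceph. Paths assume the default /etc/ceph and /var/lib/ceph layout.

# Succeed if the cluster name is non-empty and contains no spaces.
valid_cluster_name() {
    case "$1" in
        ''|*' '*) return 1 ;;
    esac
    return 0
}

# Print the configuration file path for a cluster (default name: ceph).
conf_path() {
    printf '/etc/ceph/%s.conf\n' "${1:-ceph}"
}

# Print the monitor data directory for a cluster name and monitor host name.
mon_data_dir() {
    printf '/var/lib/ceph/mon/%s-%s\n' "$1" "$2"
}
```

For example, mon_data_dir ceph "$(hostname -s)" prints /var/lib/ceph/mon/ceph-node1 on a host whose short host name is node1.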

The foregoing requirements do not imply the creation of a Ceph configuration file. However, as a best practice, we recommend creating a Ceph configuration file and populating it with, at a minimum, the fsid, mon initial members, and mon host settings.
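A minimal sketch of such a file, written in one step with a here-document. write_min_conf is a hypothetical helper and its argument order is an assumption; it writes to a path you choose, so you can review the file before copying it into /etc/ceph with sudo.

```shell
# Hypothetical helper: write a minimal Ceph configuration file containing
# the fsid, mon_initial_members, and mon_host settings.
# Arguments: <output-file> <fsid> <mon-initial-members> <mon-host>
write_min_conf() {
    cat > "$1" <<EOF
[global]
fsid = $2
mon_initial_members = $3
mon_host = $4
EOF
}
```

For example, run write_min_conf /tmp/ceph.conf "$(uuidgen)" node1 192.168.0.120 as your own user, then sudo cp /tmp/ceph.conf /etc/ceph/ceph.conf. Writing to a scratch file first avoids needing root privileges for the redirection itself.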

You can also get and set all of the monitor settings at runtime. However, a Ceph configuration file needs to contain only the settings that override the default values; any setting you add to it overrides the corresponding default. Keeping those overrides in a Ceph configuration file makes your cluster easier to maintain.

The bootstrapping procedure is as follows :

  1. On your initial monitor node, verify that you have installed all the packages for ceph-mon and that a directory for the Ceph configuration file exists. By default, Ceph uses /etc/ceph/, and the installer creates this directory automatically. For more details on how to install Ceph, please see the Ceph Installation Guide. To check the directory :

    $ ls /etc/ceph   
    

    Note : Deployment tools may remove this directory when purging a cluster (for example, ceph-deploy purgedata {node-name}, and ceph-deploy purge {node-name}).

  2. Create a Ceph configuration file. By default, Ceph uses ceph.conf, where ceph reflects the cluster name :

    $ sudo touch /etc/ceph/ceph.conf
    
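  3. Generate a unique ID (that is, fsid) for your cluster and add the unique ID to your Ceph configuration file. Because a redirection such as > or >> is performed by your own unprivileged shell rather than by sudo, pipe the text through sudo tee instead :

    $ echo "[global]" | sudo tee /etc/ceph/ceph.conf
    $ echo "fsid = $(uuidgen)" | sudo tee -a /etc/ceph/ceph.conf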
    
  4. View your current Ceph configuration file :

    For example :

    $ cat /etc/ceph/ceph.conf
    [global]
    fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
    
  5. Add the initial monitor(s) to your Ceph configuration file :

    $ echo "mon_initial_members = {hostname}[,{hostname}]" | sudo tee -a /etc/ceph/ceph.conf
    

    For example :

    $ echo "mon_initial_members = node1" | sudo tee -a /etc/ceph/ceph.conf
    
  6. Add the IP address(es) of the initial monitor(s) to your Ceph configuration file :

    $ echo "mon_host = {ip-address}[,{ip-address}]" | sudo tee -a /etc/ceph/ceph.conf
    

    For example :

    $ echo "mon_host = 192.168.0.120" | sudo tee -a /etc/ceph/ceph.conf
    

    Note : You may use IPv6 addresses too, but you must set the ms bind ipv6 option to true. Please see the Network Configuration Guide for more details.

  7. Create a keyring for your cluster and generate a monitor secret key :

    For example :

    $ sudo ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
    creating /tmp/ceph.mon.keyring
    
  8. Generate an administrator keyring, generate a client.admin user and add the user to the keyring :

    For example :

    $ sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
    creating /etc/ceph/ceph.client.admin.keyring
    
  9. Add the client.admin key to the ceph.mon.keyring :

    For example :

    $ sudo ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
    importing contents of /etc/ceph/ceph.client.admin.keyring into /tmp/ceph.mon.keyring
    
  10. Generate a monitor map using the hostname(s), host IP address(es) and the File System Identifier (FSID). Save it as /tmp/monmap :

    $ monmaptool --create --add {hostname} {ip-address} --fsid {uuid} /tmp/monmap
    

    For example :

    $ monmaptool --create --add node1 192.168.0.120 --fsid a7f64266-0894-4f1e-a635-d0aeaca0e993 /tmp/monmap
    monmaptool: monmap file /tmp/monmap
    monmaptool: set fsid to a7f64266-0894-4f1e-a635-d0aeaca0e993
    monmaptool: writing epoch 0 to /tmp/monmap (1 monitors)
    
  11. Create a default data directory (or directories) on the monitor host(s) :

    $ sudo mkdir /var/lib/ceph/mon/{cluster-name}-{hostname}
    

    For example :

    $ sudo mkdir /var/lib/ceph/mon/ceph-node1
    

    Please see the Monitor Configuration Reference section for more details.

  12. Populate the monitor daemon(s) with the monitor map and keyring :

    $ sudo ceph-mon [--cluster {cluster-name}] --mkfs -i {hostname} --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
    

    For example :

    $ sudo ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
    ceph-mon: set fsid to a7f64266-0894-4f1e-a635-d0aeaca0e993
    ceph-mon: created monfs at /var/lib/ceph/mon/ceph-node1 for mon.node1
    
  13. View your current /etc/ceph/ceph.conf file :

    $ cat /etc/ceph/ceph.conf
    [global]
    fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
    mon_initial_members = node1
    mon_host = 192.168.0.120
    

    Please see the Ceph Configuration Guide for more details on Ceph configuration settings. The following are some of the most common settings in the /etc/ceph/ceph.conf file; consider which of them apply to your cluster. Ceph accepts option names written with either spaces or underscores (for example, mon initial members and mon_initial_members are equivalent) :

    [global]
    fsid = {cluster-id}
    mon initial members = {hostname}[, {hostname}]
    mon host = {ip-address}[, {ip-address}]
    public network = {network}[, {network}]
    cluster network = {network}[, {network}]
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx
    osd journal size = {n}
    filestore xattr use omap = true
    osd pool default size = {n}  # Write an object n times.
    osd pool default min size = {n} # Allow writing n copies in a degraded state.
    osd pool default pg num = {n}
    osd pool default pgp num = {n}  
    osd crush chooseleaf type = {n}
    
  14. Create the done file :

    Mark that the monitor is created and ready to be started :

    $ sudo touch /var/lib/ceph/mon/ceph-node1/done
    
  15. Start the monitor(s) :

    For Ubuntu, use Upstart :

    $ sudo start ceph-mon id=node1 [cluster={cluster_name}]
    

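    In this case, to start the daemon automatically at each reboot, the monitor data directory must contain two empty files: the done file created in the previous step and an upstart file. Create the upstart file like this :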

    $ sudo touch /var/lib/ceph/mon/{cluster_name}-{hostname}/upstart
    

    For example :

    $ sudo touch /var/lib/ceph/mon/ceph-node1/upstart
    

    For Debian/CentOS/Red Hat Enterprise Linux, use the sysvinit script :

    $ sudo touch /var/lib/ceph/mon/{cluster_name}-{hostname}/sysvinit
    $ sudo /etc/init.d/ceph start mon.{hostname}
    

    For example :

    $ sudo touch /var/lib/ceph/mon/ceph-node1/sysvinit
    $ sudo /etc/init.d/ceph start mon.node1
    
  16. Verify that Ceph created the default pools :

    $ sudo ceph osd lspools
    

    You should see output like this :

    0 rbd,
    
  17. Verify that the monitor is running :

    $ sudo ceph -s
    

    You should see output indicating that the monitor you started is up and running, along with a health error indicating that placement groups are stuck inactive. It should look something like this :

    cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
    health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
    monmap e1: 1 mons at {node1=192.168.0.120:6789/0}, election epoch 1, quorum 0 node1
    osdmap e1: 0 osds: 0 up, 0 in
    pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
    0 kB used, 0 kB / 0 kB avail
    192 creating
    

    Note : Once you add OSDs and start them, the placement group health errors should disappear. See the next section for details.
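For repeated deployments, steps 7 through 14 above can be collected into one script. This is a sketch under stated assumptions, not an official tool: the variable names and the run and bootstrap_mon helpers are hypothetical, the defaults are the example values from this article, and the init-system-specific start step is left out. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them. Printed commands lose their quoting, so treat them as a readable preview rather than copy-and-paste input.

```shell
# Example values from this article; override them in the environment.
CLUSTER=${CLUSTER:-ceph}
MON_NAME=${MON_NAME:-node1}
MON_IP=${MON_IP:-192.168.0.120}
FSID=${FSID:-a7f64266-0894-4f1e-a635-d0aeaca0e993}

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 7-14 of the monitor bootstrap. The body runs in a subshell with
# set -e, so a real run stops at the first failing command.
bootstrap_mon() (
    set -e
    mon_keyring=/tmp/$CLUSTER.mon.keyring
    admin_keyring=/etc/ceph/$CLUSTER.client.admin.keyring
    mon_dir=/var/lib/ceph/mon/$CLUSTER-$MON_NAME

    run sudo ceph-authtool --create-keyring "$mon_keyring" \
        --gen-key -n mon. --cap mon 'allow *'
    run sudo ceph-authtool --create-keyring "$admin_keyring" \
        --gen-key -n client.admin --set-uid=0 \
        --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
    run sudo ceph-authtool "$mon_keyring" --import-keyring "$admin_keyring"
    run monmaptool --create --add "$MON_NAME" "$MON_IP" --fsid "$FSID" /tmp/monmap
    run sudo mkdir -p "$mon_dir"
    run sudo ceph-mon --cluster "$CLUSTER" --mkfs -i "$MON_NAME" \
        --monmap /tmp/monmap --keyring "$mon_keyring"
    run sudo touch "$mon_dir/done"
)
```

Calling bootstrap_mon prints the seven commands; DRY_RUN=0 bootstrap_mon runs them. You still start the monitor with Upstart or sysvinit as described in step 15.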

Adding OSDs

Once you have your initial monitor(s) running, you can add the OSDs. Your cluster cannot reach an active + clean state until you have enough OSDs to store the configured number of copies of each object. The default number of copies is 3. In this example, we use only 2 OSDs, and therefore only 2 copies of each object. After you bootstrap your monitor, your cluster has a default CRUSH map; however, the CRUSH map does not yet have any Ceph OSD daemons mapped to a Ceph node. Before you start, verify that you have installed all the required ceph-osd packages on your Ceph OSD node(s). Please see the OSD Configuration Reference for more details. Choose your preferred method of OSD installation: short form or long form.
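To check this requirement before adding OSDs, you can compare the number of OSDs you plan to create with the configured replica count. The sketch below is an illustration, not a Ceph tool: pool_size_from_conf is a hypothetical helper that reads the osd pool default size option from a configuration file, accepts both the spaced and the underscored spelling, ignores section headers and runtime changes, and falls back to the default of 3 when the option is absent. Also note that the default CRUSH rule places copies on different hosts (osd crush chooseleaf type defaults to 1, that is, host), which is why this example puts its two OSDs on two separate hosts.

```shell
# Hypothetical helper: print the "osd pool default size" value from a Ceph
# configuration file, or 3 (the default) if the option is not set. It
# ignores sections and any value changed at runtime.
pool_size_from_conf() {
    awk -F '=' '
        {
            key = $1
            gsub(/[ \t_]/, "", key)   # spaced and underscored spellings match
        }
        key == "osdpooldefaultsize" {
            val = $2
            sub(/[#;].*/, "", val)    # drop a trailing comment
            gsub(/[ \t]/, "", val)
            size = val
        }
        END { print (size == "" ? 3 : size) }
    ' "$1"
}
```

For example, after step 1 below, [ 2 -ge "$(pool_size_from_conf /etc/ceph/ceph.conf)" ] succeeds, because two OSDs satisfy a pool size of 2.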

  1. On the monitor node, update the Ceph configuration file :

    $ echo "osd_pool_default_size = 2" | sudo tee -a /etc/ceph/ceph.conf
    $ echo "osd_pool_default_min_size = 1" | sudo tee -a /etc/ceph/ceph.conf
    
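  2. Copy the /etc/ceph/ceph.conf file to each OSD node. If you follow the long form below, which runs ceph commands on the OSD node, that node also needs a keyring with sufficient capabilities, such as /etc/ceph/ceph.client.admin.keyring.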
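Running step 1 twice appends the same options twice. A small idempotent variant is sketched below. set_conf_opt is a hypothetical helper: it does not understand sections, it treats its key as a regular expression (fine for names made of letters and underscores), it only recognizes the underscored spelling, and it leaves an existing value unchanged rather than replacing it. It appends through sudo tee because a plain >> redirection would run without root privileges; set SUDO to an empty string when the file is writable by your user.

```shell
# Hypothetical helper: append "key = value" to a Ceph configuration file
# unless a line already sets that key (underscore spelling only).
# Arguments: <conf-file> <key> <value>
set_conf_opt() {
    if grep -q "^[[:space:]]*$2[[:space:]]*=" "$1"; then
        return 0    # already set; leave the existing value alone
    fi
    # ${SUDO-sudo}: use sudo unless SUDO is set (possibly to an empty string).
    printf '%s = %s\n' "$2" "$3" | ${SUDO-sudo} tee -a "$1" > /dev/null
}

# Example (step 1 above):
#   set_conf_opt /etc/ceph/ceph.conf osd_pool_default_size 2
#   set_conf_opt /etc/ceph/ceph.conf osd_pool_default_min_size 1
```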

Short Form

Ceph provides the ceph-disk utility, which prepares a disk, partition, or directory for use with Ceph. The ceph-disk utility creates the OSD ID by incrementing the index, and it adds the new OSD to the CRUSH map under the host for you. In short, ceph-disk automates the steps of the Long Form below; run ceph-disk -h for CLI details. The recommended minimum disk (or partition) size is 10 GB. To create the OSDs with the short form procedure, execute the following commands on each OSD node :

  1. On your OSD node(s), prepare the OSD disk :

    $ sudo ceph-disk prepare --cluster {cluster_name} --cluster-uuid {uuid} --fs-type {ext4|xfs|btrfs} {data-path} [{journal-path}]
    

    For example :

    $ sudo ceph-disk prepare --cluster ceph --cluster-uuid a7f64266-0894-4f1e-a635-d0aeaca0e993 --fs-type xfs /dev/sdb
    
  2. Activate the OSD :

    $ sudo ceph-disk activate {data-path} [--activate-key {path}]
    

    For example :

    $ sudo ceph-disk activate /dev/sdb1
    

    Note : Use the --activate-key argument if you do not have a copy of /var/lib/ceph/bootstrap-osd/{cluster}.keyring on the OSD node.
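When an OSD node has several data disks, the two short-form steps can be repeated in a loop. This is a sketch with assumptions: prepare_and_activate and run are hypothetical helpers, the FSID and file system type default to the example values above, and the data partition created by ceph-disk prepare is assumed to be the disk name followed by 1 (as in /dev/sdb1), which holds for /dev/sdX-style names but not for NVMe-style names such as /dev/nvme0n1. DRY_RUN defaults to 1, so the commands are only printed (without their original quoting) until you set DRY_RUN=0.

```shell
# Example values from this article; override them in the environment.
CLUSTER=${CLUSTER:-ceph}
FSID=${FSID:-a7f64266-0894-4f1e-a635-d0aeaca0e993}
FS_TYPE=${FS_TYPE:-xfs}

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Prepare and activate each whole disk given as an argument. Assumes the
# data partition is <disk>1 (true for /dev/sdX names, not for NVMe names).
prepare_and_activate() (
    set -e
    for disk in "$@"; do
        run sudo ceph-disk prepare --cluster "$CLUSTER" \
            --cluster-uuid "$FSID" --fs-type "$FS_TYPE" "$disk"
        run sudo ceph-disk activate "${disk}1"
    done
)
```

For example, prepare_and_activate /dev/sdb /dev/sdc prints four commands; DRY_RUN=0 prepare_and_activate /dev/sdb /dev/sdc runs them.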

Long Form

To create an OSD, add it to the cluster, and update the CRUSH map without the benefit of any helper utilities, follow the procedure below. To create the first two OSDs with the long form procedure, execute the following on each OSD node :

  1. On the OSD node(s), create the OSD. If no UUID is given, it will be set automatically when the OSD starts up. The following command outputs the OSD number needed for subsequent steps :

    $ sudo ceph osd create [{uuid} [{id}]]
    

    For example :

    $ uuidgen
    b367c360-b364-4b1d-8fc6-09408a9cda7a
    $ sudo ceph osd create b367c360-b364-4b1d-8fc6-09408a9cda7a
    0
    
  2. Create the default directory for your new OSD :

    $ sudo mkdir /var/lib/ceph/osd/{cluster_name}-{osd_number}
    

    For example :

    $ sudo mkdir /var/lib/ceph/osd/ceph-0
    
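  3. Prepare the drive for use as an OSD and mount it on the directory you just created. The commands below create a single partition for the Ceph data; unless you configure a separate journal, the journal is kept as a file in this data directory. This example uses a 10 GB disk :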

    $ sudo parted {path-to-disk} mklabel gpt
    $ sudo parted {path-to-disk} mkpart primary 1 10000
    $ sudo mkfs -t {fstype} {path-to-data-partition}
    $ sudo mount -o noatime {path-to-data-partition} /var/lib/ceph/osd/{cluster_name}-{osd_number}
    

    For example :

    $ sudo parted /dev/sdb mklabel gpt
    $ sudo parted /dev/sdb mkpart primary 1 10000
    $ sudo mkfs -t xfs /dev/sdb1
    $ sudo mount -o noatime /dev/sdb1 /var/lib/ceph/osd/ceph-0
    
  4. Initialize the OSD data directory :

    $ sudo ceph-osd -i {osd_number} --mkfs --mkkey --osd-uuid {uuid}
    

    For example :

    $ sudo ceph-osd -i 0 --mkfs --mkkey --osd-uuid b367c360-b364-4b1d-8fc6-09408a9cda7a
    ... auth: error reading file: /var/lib/ceph/osd/ceph-0/keyring: can't open /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
    ... created new key in keyring /var/lib/ceph/osd/ceph-0/keyring
    

    The directory must be empty before you can run ceph-osd with the --mkkey option. If you have a custom cluster name, the ceph-osd tool requires the --cluster option.

  5. Register the OSD authentication key. If your cluster name differs from ceph, substitute your cluster name in the keyring path :

    $ sudo ceph auth add osd.{osd_number} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/{cluster_name}-{osd_number}/keyring
    

    For example :

    $ sudo ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
    added key for osd.0
    
  6. Add your OSD node to the CRUSH map :

    $ sudo ceph [--cluster {cluster-name}] osd crush add-bucket {hostname} host
    

    For example :

    $ sudo ceph osd crush add-bucket node2 host
    
  7. Place the OSD node under the default CRUSH tree :

    $ sudo ceph osd crush move node2 root=default
    
  8. Add the OSD to the CRUSH map so that it can begin receiving data. Alternatively, you may decompile the CRUSH map and add the OSD to the device list by hand: if the host is not already in the CRUSH map, add the host as a bucket, add the device as an item in the host, assign it a weight, then recompile the map and set it. To add the OSD from the command line :

    $ sudo ceph [--cluster {cluster-name}] osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
    

    For example :

    $ sudo ceph osd crush add osd.0 1.0 host=node2
    add item id 0 name 'osd.0' weight 1 at location {host=node2} to crush map
    
  9. After you add an OSD to the Ceph cluster, the OSD is in your configuration. However, it is not yet running. The OSD is down and in. You must start your new OSD before it can begin receiving data :

    For Ubuntu, use Upstart :

    $ sudo start ceph-osd id={osd_number} [cluster={cluster_name}]
    

    For example :

    $ sudo start ceph-osd id=0
    

    For Debian/CentOS/Red Hat Enterprise Linux, use sysvinit :

    $ sudo touch /var/lib/ceph/osd/{cluster_name}-{osd_number}/sysvinit
    $ sudo /etc/init.d/ceph start osd.{osd_number} [--cluster {cluster_name}]
    

    For example :

    $ sudo touch /var/lib/ceph/osd/ceph-0/sysvinit
    $ sudo /etc/init.d/ceph start osd.0
    

    Once you start your OSD, it is up and in.
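Steps 2 through 9 of the long form can likewise be wrapped in a per-OSD script. This is a sketch only: add_osd and run are hypothetical helpers; the OSD number and UUID must come from step 1 (ceph osd create); the data partition is assumed to exist already, so the parted commands from step 3 are not repeated; the file system is assumed to be XFS; the cluster is assumed to use the default name ceph; and only the sysvinit start command is shown. The add-bucket and move commands only need to run once per host, so drop them when adding a second OSD to the same host. DRY_RUN defaults to 1, so the commands are printed (without their original quoting) until you set DRY_RUN=0.

```shell
# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Long-form steps 2-9 for one OSD in the default "ceph" cluster.
# Arguments: <osd-number> <osd-uuid> <crush-host> <data-partition>
add_osd() (
    set -e
    id=$1 uuid=$2 host=$3 part=$4
    dir=/var/lib/ceph/osd/ceph-$id

    run sudo mkdir -p "$dir"                                  # step 2
    run sudo mkfs -t xfs "$part"                              # step 3 (XFS assumed)
    run sudo mount -o noatime "$part" "$dir"
    run sudo ceph-osd -i "$id" --mkfs --mkkey --osd-uuid "$uuid"   # step 4
    run sudo ceph auth add "osd.$id" osd 'allow *' mon 'allow profile osd' \
        -i "$dir/keyring"                                     # step 5
    run sudo ceph osd crush add-bucket "$host" host           # step 6 (once per host)
    run sudo ceph osd crush move "$host" root=default         # step 7 (once per host)
    run sudo ceph osd crush add "osd.$id" 1.0 "host=$host"    # step 8
    run sudo touch "$dir/sysvinit"                            # step 9 (sysvinit)
    run sudo /etc/init.d/ceph start "osd.$id"
)
```

For example, add_osd 0 b367c360-b364-4b1d-8fc6-09408a9cda7a node2 /dev/sdb1 prints the commands for osd.0 on node2.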

Summary

Once you have your monitor and two OSDs up and running, you can watch the placement groups peer by executing the following command :

    $ ceph -w

To view the tree, execute the following :

    $ ceph osd tree

You should see output that looks something like this :

ID  WEIGHT    TYPE NAME        UP/DOWN  REWEIGHT  PRIMARY-AFFINITY 
-1       2    root default
-2       1        host node2
 0       1            osd.0         up         1                 1 
-3       1        host node3
 1       1            osd.1         up         1                 1
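To check the same information from a script, you can summarize the ceph osd tree output. This is a minimal sketch that assumes the column layout shown above (NAME in the third column and UP/DOWN in the fourth), which may differ between Ceph releases; osd_up_summary is a hypothetical helper. Where available, machine-readable output (for example, ceph osd tree --format json) is a sturdier basis for scripts.

```shell
# Hypothetical helper: read "ceph osd tree" text on stdin and print how many
# OSD rows are up out of the total, e.g. "2/2 OSDs up".
osd_up_summary() {
    awk '$3 ~ /^osd\./ { total++; if ($4 == "up") up++ }
         END { printf "%d/%d OSDs up\n", up, total }'
}
```

For example, ceph osd tree | osd_up_summary reports 2/2 OSDs up for the output shown above.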

Please see the Ceph Administration Guide to add or remove monitors. Also see the Ceph Administration Guide to add or remove OSDs.
