How do I configure Red Hat Gluster Storage (formerly Red Hat Storage Server) bricks for optimal volume functionality and performance?

Solution Verified

Environment

  • Red Hat Gluster Storage 3.0
  • Red Hat Gluster Storage 3.1
  • Red Hat Gluster Storage 3.2

Issue

  • What are the recommended configurations for RAID, LVM, and filesystems for the backend bricks in a Red Hat Gluster Storage (formerly Red Hat Storage Server) volume?
  • What are the recommended mount parameters for brick filesystems in a Red Hat Gluster Storage (formerly Red Hat Storage Server) volume?

Resolution

RAID configuration:

Because each RAID controller has its own configuration mechanism, defining specific configuration steps is out of scope here. However, the following parameters are supported and recommended for Red Hat Gluster Storage.

  • RAID 6 or RAID 10 (1+0) with 12 physical disks per LUN
  • Battery-backed write cache enabled for the RAID arrays
  • For RAID 6, a stripe element size of 128KiB is recommended [1]
  • For RAID 10, a stripe element size of 256KiB is recommended
  • 1 RAID LUN per brick [2]

The primary advantage of RAID 6 over RAID 10 is space efficiency. Since RAID 10 uses mirroring for data protection, available storage capacity with RAID 10 is only 50% of the disk space; with 12 disks, RAID 10 provides approximately 40% less storage capacity compared to RAID 6. RAID 6 also provides better performance for sequential writes to large files. The primary limitation of RAID 6 is its relatively poor performance for small writes, which makes it less suitable for workloads where writes to small files or random writes are predominant.
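The capacity arithmetic above can be sketched as follows (the per-disk size is a hypothetical value for illustration, not from this article):

```shell
# Usable capacity for 12 physical disks, assuming a hypothetical 4 TB per disk.
disks=12
disk_tb=4
raid10_tb=$(( disks / 2 * disk_tb ))   # RAID 10: half the disks hold mirror copies
raid6_tb=$(( (disks - 2) * disk_tb ))  # RAID 6: two disks' worth of double parity
echo "RAID 10 usable: ${raid10_tb} TB"  # 24 TB
echo "RAID 6 usable:  ${raid6_tb} TB"   # 40 TB
# (40 - 24) / 40 = 40% less usable capacity with RAID 10, as noted above.
```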

Logical Volume Management (LVM):

  • When creating a Physical Volume, the data alignment should be set to correspond with the RAID geometry. In this example, to align with an underlying RAID 6 stripe element size of 128k across 12 disks (10 data disks) [3], the --dataalignment value is the full stripe size: 128k x 10 = 1280k
# pvcreate --dataalignment 1280k /dev/sdb
  Writing physical volume data to disk "/dev/sdb"
  Physical volume "/dev/sdb" successfully created

Note: It is recommended to create the LVM physical volume on the whole block device, rather than on a partition. Using partitions can result in loss of data alignment and increased I/O contention, leading to reduced performance.
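As a quick sketch, the alignment value can be derived from the stripe element size and the number of data disks (the helper below is ours for illustration, not part of any Red Hat tool):

```shell
# Hypothetical helper: compute the full-stripe --dataalignment value in KiB
# as stripe element size x number of data disks.
data_alignment_k() {
    local stripe_unit_k=$1
    local data_disks=$2
    echo "$(( stripe_unit_k * data_disks ))k"
}

data_alignment_k 128 10   # RAID 6, 12 disks (10 data): prints 1280k
data_alignment_k 256 6    # RAID 10, 12 disks (6 data): prints 1536k
```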

  • Confirm the data alignment with the pvs command.
# pvs -o +pe_start --units k | egrep 'PV|sdb'
  PV         VG           Fmt  Attr PSize          PFree          1st PE
  /dev/sdb                lvm2 a--  9762242560.00k 9762242560.00k 1280.00k
  • In order to ensure that logical volumes created in the volume group are aligned with the underlying hardware RAID, it is important to use the --physicalextentsize (or -s) option. LVM currently supports only physical extent sizes that are a power of 2, whereas RAID full stripes are generally not a power of 2. Hence, getting proper alignment requires some extra work, as outlined in this sub-section and in the sub-section on thin pool creation.

  • Since a RAID full stripe may not be a power of 2, we will use the RAID stripe unit size, which should be a power of 2, as the physical extent size when creating the volume group.

Note: The larger extent size will result in a lower total number of LVM physical extents. With a very large LVM volume group, an excessive number of extents can lead to poor performance of the LVM tools. The LVM physical extent size represents the factor by which a logical volume can be grown or shrunk. This value will have no direct effect on filesystem efficiency or I/O performance.

  • For a RAID 6 LUN with a stripe unit size of 128K, and 12 disks (10 data disks), use:
# vgcreate -s 128K datavg /dev/sdb
  Volume group "datavg" successfully created
  Using volume group(s) on command line
  Finding volume group "datavg"
  • Confirm the Volume Group configuration with vgdisplay.
# vgdisplay -v datavg
  --- Volume group ---
  VG Name               datavg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur PV                1
  Act PV                1
  VG Size               9.09 TiB
  PE Size               64.00 MiB
  Total PE              148959
  Alloc PE / Size       0 / 0
  Free PE / Size        148959 / 9.09 TiB
  VG UUID               Onvn18-zupy-RT96-83yF-Ya0A-V15j-wtuGwg

  --- Physical volumes ---
  PV Name               /dev/sdb
  PV UUID               Ayrh0b-GRuw-28R5-bLv7-7g0r-ltSd-4z11CI
  PV Status             allocatable
  Total PE / Free PE    148959 / 148959
  • Create the logical volume.
# lvcreate -l 100%FREE -n rhsdata_lvol1 datavg
  Logical volume "rhsdata_lvol1" created
  • Confirm the logical volume with lvs :
# lvs | egrep 'LV|rhsdata'
  LV            VG     Attr     LSize Pool Origin Data%  Move Log Copy%
  rhsdata_lvol1 datavg -wi-ao-- 9.09t

Filesystem:

  • When creating the XFS filesystems for bricks, a minimum inode size of 512 bytes is required to accommodate GlusterFS metadata.

  • The stripe unit (su) and stripe width (sw) values should be set to match the underlying physical volume and RAID configurations. The stripe unit should be the same value as the RAID stripe element size, and the stripe width should be the count of data disks in the RAID 6 set.

  • In this example, su would be 128k and sw would be 10. The recommended default directory block size (aka naming area) is 8192 bytes.

# mkfs.xfs -i size=512 -n size=8192 -d su=128k,sw=10 /dev/mapper/datavg-rhsdata_lvol1
meta-data=/dev/mapper/datavg-rhsdata_lvol1  isize=512    agcount=16, agsize=1638368 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=26213888, imaxpct=25
         =                       sunit=32     swidth=320 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Note that the mkfs.xfs and xfs_info output can be confusing. The data sunit=32 means 32 4k blocks, or 32 x 4k = 128k, and the swidth is a multiple of the sunit, so 32 x 10 = 320.
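The conversion can be verified with simple shell arithmetic (a sketch; xfs_info reports these values in 4 KiB filesystem blocks):

```shell
bsize=4096         # filesystem block size in bytes, from xfs_info
sunit_blocks=32    # sunit as reported by xfs_info, in blocks
swidth_blocks=320  # swidth as reported by xfs_info, in blocks
sunit_k=$(( sunit_blocks * bsize / 1024 ))
data_disks=$(( swidth_blocks / sunit_blocks ))
echo "stripe unit: ${sunit_k}k"   # 128k, matching su=128k
echo "data disks:  ${data_disks}" # 10, matching sw=10
```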

  • When mounting the XFS brick filesystem, the inode64 and noatime options should be used.
# mkdir -p /rhs/storage1
# echo "/dev/mapper/datavg-rhsdata_lvol1  /rhs/storage1 xfs inode64,noatime 1 2" >> /etc/fstab
# mount /rhs/storage1
  • Verify the filesystem and mount options
# mount | grep storage
/dev/mapper/datavg-rhsdata_lvol1 on /rhs/storage1 type xfs (rw,noatime,inode64)

# df -h | egrep 'Filesystem|datavg'
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/datavg-rhsdata_lvol1 4.6T   46M  4.6T   1% /rhs/storage1
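A small sanity check along these lines can confirm the recommended options are active on a brick mount (a hypothetical script, not part of Red Hat Gluster Storage; the optional second argument exists only so the check can be exercised against a sample mount table):

```shell
# Check that a brick mount point carries the inode64 and noatime options.
check_brick_mount() {
    local mnt=$1
    local mounts_file=${2:-/proc/mounts}
    local opts
    # /proc/mounts fields: device mountpoint fstype options dump pass
    opts=$(awk -v m="$mnt" '$2 == m { print $4 }' "$mounts_file")
    for want in inode64 noatime; do
        case ",$opts," in
            *",$want,"*) ;;                                # option present
            *) echo "WARNING: $mnt missing $want"; return 1 ;;
        esac
    done
    echo "$mnt: OK"
}

# Demonstration against a sample mount table:
printf '%s\n' '/dev/mapper/datavg-rhsdata_lvol1 /rhs/storage1 xfs rw,noatime,inode64 0 0' > /tmp/sample_mounts
check_brick_mount /rhs/storage1 /tmp/sample_mounts   # prints "/rhs/storage1: OK"
```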

Additional Resources:

[1]: For RAID 6, the stripe unit size should be chosen so that the full stripe size (stripe unit x number of data disks) is between 1MiB and 2MiB, preferably at the low end of the range. Hardware RAID controllers usually allow stripe unit sizes that are a power of 2. For RAID 6 with 12 disks (10 data disks), this gives a recommended stripe unit size of 128KiB.

[2]: 1 RAID LUN per brick is needed to ensure the data alignment that is part of this recommended configuration. If multiple LUNs act as PVs in the same VG, the alignment cannot be guaranteed.

[3]: RAID 6 double parity with 12 physical disks means that each stripe will have 2 parity segments, and thus 10 data segments, representing the stripe width of 10.
