How do I configure Red Hat Gluster Storage (formerly Red Hat Storage Server) bricks for optimal volume functionality and performance?
Environment
- Red Hat Gluster Storage 3.0
- Red Hat Gluster Storage 3.1
- Red Hat Gluster Storage 3.2
Issue
- What are the recommended configurations for RAID, LVM, and filesystems for the backend bricks in a Red Hat Gluster Storage (formerly Red Hat Storage Server) volume?
- What are the recommended mount parameters for brick filesystems in a Red Hat Gluster Storage (formerly Red Hat Storage Server) volume?
Resolution
RAID configuration:
Because each RAID controller has its own configuration mechanism, defining the specific configuration steps is out of scope here. However, the following parameters are supported and recommended for Red Hat Gluster Storage.
The primary advantage of RAID 6 over RAID 10 is space efficiency. Since RAID 10 uses mirroring for data protection, available storage capacity with RAID 10 is only 50% of the disk space; with 12 disks, RAID 10 provides approximately 40% less storage capacity compared to RAID 6. RAID 6 also provides better performance for sequential writes to large files. The primary limitation of RAID 6 is its relatively poor performance for small writes, which makes it less suitable for workloads where writes to small files or random writes are predominant.
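The capacity arithmetic above can be checked with a quick shell calculation; the disk count and per-disk size are illustrative assumptions, not part of any recommendation:

```shell
# Usable capacity for 12 equal disks (assume 1 TiB each for illustration).
DISKS=12
DISK_TIB=1
RAID10_TIB=$(( DISKS / 2 * DISK_TIB ))     # mirroring: half the disks hold copies
RAID6_TIB=$(( (DISKS - 2) * DISK_TIB ))    # double parity: 2 disks' worth of parity
echo "RAID 10 usable: ${RAID10_TIB} TiB"   # 6 TiB
echo "RAID 6 usable:  ${RAID6_TIB} TiB"    # 10 TiB
# RAID 10 here yields (10 - 6) / 10 = 40% less usable capacity than RAID 6.
```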
Logical Volume Management (LVM):
- When creating a Physical Volume, the data alignment should be set to correspond with the RAID stripe size. In this example, to align with an underlying RAID 6 stripe element size of 128k across 12 disks (10 data disks) [3], the --dataalignment value will be a multiple of these: 128k x 10 = 1280k
# pvcreate --dataalignment 1280k /dev/sdb
Writing physical volume data to disk "/dev/sdb"
Physical volume "/dev/sdb" successfully created
Note: It is recommended to create the LVM physical volume on the whole block device, rather than on a partition. Using partitions can result in loss of data alignment and increased I/O contention, leading to reduced performance.
- Confirm the data alignment with the pvs command.
# pvs -o +pe_start --units k | egrep 'PV|sdb'
PV VG Fmt Attr PSize PFree 1st PE
/dev/sdb lvm2 a-- 9762242560.00k 9762242560.00k 1280.00k
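The alignment value used above can be derived in shell; the variable names are illustrative:

```shell
# --dataalignment = RAID stripe element size x number of data disks
STRIPE_UNIT_K=128   # RAID 6 stripe element size in KiB
DATA_DISKS=10       # 12-disk RAID 6 leaves 10 data disks
ALIGN_K=$(( STRIPE_UNIT_K * DATA_DISKS ))
echo "pvcreate --dataalignment ${ALIGN_K}k /dev/sdb"   # --dataalignment 1280k, as above
```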
- In order to ensure that logical volumes created in the volume group are aligned with the underlying hardware RAID, it is important to use the --physicalextentsize (or -s) option. LVM currently supports only physical extent sizes that are a power of 2, whereas RAID full stripes are in general not a power of 2. Hence, getting proper alignment requires some extra work, as outlined in this sub-section and in the sub-section on thin pool creation.
- Since a RAID full stripe may not be a power of 2, we will use the RAID stripe unit size, which should be a power of 2, as the physical extent size when creating the volume group.
Note: The larger extent size will result in a lower total number of LVM physical extents. With a very large LVM volume group, an excessive number of extents can lead to poor performance of the LVM tools. The LVM physical extent size represents the factor by which a logical volume can be grown or shrunk. This value will have no direct effect on filesystem efficiency or I/O performance.
- For a RAID 6 LUN with a stripe unit size of 128K, and 12 disks (10 data disks), use:
# vgcreate -s 128K datavg /dev/sdb
Volume group "datavg" successfully created
Using volume group(s) on command line
Finding volume group "datavg"
- Confirm the Volume Group configuration with vgdisplay.
# vgdisplay -v datavg
--- Volume group ---
VG Name datavg
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur PV 1
Act PV 1
VG Size 9.09 TiB
PE Size 64.00 MiB
Total PE 148959
Alloc PE / Size 0 / 0
Free PE / Size 148959 / 9.09 TiB
VG UUID Onvn18-zupy-RT96-83yF-Ya0A-V15j-wtuGwg
--- Physical volumes ---
PV Name /dev/sdb
PV UUID Ayrh0b-GRuw-28R5-bLv7-7g0r-ltSd-4z11CI
PV Status allocatable
Total PE / Free PE 148959 / 148959
- Create the logical volume.
# lvcreate -l 100%FREE -n rhsdata_lvol1 datavg
Logical volume "rhsdata_lvol1" created
- Confirm the logical volume with lvs:
# lvs | egrep 'LV|rhsdata'
LV VG Attr LSize Pool Origin Data% Move Log Copy%
rhsdata_lvol1 datavg -wi-ao-- 9.09t
Filesystem:
- When creating the XFS filesystems for bricks, a minimum inode size of 512 bytes is required to accommodate GlusterFS metadata.
- The stripe unit (su) and stripe width (sw) values should be set to match the underlying physical volume and RAID configurations. The stripe unit should be the same value as the RAID stripe element size, and the stripe width should be the count of data disks in the RAID 6 set.
- In this example, su would be 128k and sw would be 10. The recommended default directory block size (aka naming area) is 8192 bytes.
# mkfs.xfs -i size=512 -n size=8192 -d su=128k,sw=10 /dev/mapper/datavg-rhsdata_lvol1
meta-data=/dev/mapper/datavg-rhsdata_lvol1 isize=512 agcount=16, agsize=1638368 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=26213888, imaxpct=25
= sunit=32 swidth=320 blks
naming =version 2 bsize=8192 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=12800, version=2
= sectsz=512 sunit=32 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Note that the mkfs.xfs and xfs_info output can be confusing. The data sunit=32 means 32 4k blocks, or 32 x 4k = 128k, and the swidth is a multiple of the sunit, so 32 x 10 = 320.
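The block-to-byte conversion in that note can be sketched in shell (the 4096-byte block size is taken from the bsize value in the mkfs.xfs output above):

```shell
BSIZE=4096        # filesystem block size in bytes (bsize from xfs_info)
SUNIT_BLKS=32     # data sunit, in filesystem blocks
SWIDTH_BLKS=320   # data swidth, in filesystem blocks
echo "stripe unit: $(( SUNIT_BLKS * BSIZE / 1024 ))k"   # 128k, the RAID stripe element size
echo "data disks:  $(( SWIDTH_BLKS / SUNIT_BLKS ))"     # 10, the RAID 6 data disk count
```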
- When mounting the XFS brick filesystem, the inode64 and noatime options should be used.
# mkdir -p /rhs/storage1
# echo "/dev/mapper/datavg-rhsdata_lvol1 /rhs/storage1 xfs inode64,noatime 1 2" >> /etc/fstab
# mount /rhs/storage1
- Verify the filesystem and mount options
# mount | grep storage
/dev/mapper/datavg-rhsdata_lvol1 on /rhs/storage1 type xfs (rw,noatime,inode64)
# df -h | egrep 'Filesystem|datavg'
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/datavg-rhsdata_lvol1 4.6T 46M 4.6T 1% /rhs/storage1
Additional Resources:
1: For RAID 6, the stripe unit size should be chosen so that the full stripe size (stripe unit * number of data disks) is between 1MiB and 2MiB, preferably in the low end of the range. Hardware RAID controllers usually allow stripe unit sizes that are a power of 2. For RAID 6 with 12 disks (10 data disks), this gives a recommended stripe unit size of 128KiB.
2: One RAID LUN per brick is needed to ensure the data alignment that is part of this recommended configuration. If multiple LUNs act as PVs in the same VG, the alignment cannot be guaranteed.
3: RAID 6 double parity with 12 physical disks means that each stripe will have 2 parity segments, and thus 10 data segments, representing the stripe width of 10.
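The stripe-unit rule in the notes above can be sketched as a small search: starting from a small power-of-2 value (64k here, an illustrative assumption), double the stripe unit until the full stripe reaches the 1 MiB floor of the recommended range. The numbers match the 10-data-disk example:

```shell
DATA_DISKS=10
su=64   # starting stripe unit in KiB; doubling preserves the power-of-2 property
# Double until the full stripe (su x data disks) reaches 1 MiB (1024 KiB).
while [ $(( su * DATA_DISKS )) -lt 1024 ]; do
    su=$(( su * 2 ))
done
echo "stripe unit: ${su}k"                      # 128k
echo "full stripe: $(( su * DATA_DISKS ))k"     # 1280k, in the low end of 1-2 MiB
```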
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.