Azure Disk performance by region

Solution Unverified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.6, 4.7
  • Azure Platform

Issue

  • Azure Disk performance is known to vary by region.
  • Azure Disk performance issues manifest themselves as etcd performance issues such as leader election changes, high fsync latency durations, and API server failures.
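The symptoms above can be checked against the cluster's Prometheus metrics. As a sketch, this query counts etcd leader elections over the past hour; on a stable cluster the result should stay near zero:

```promql
# etcd leader changes seen in the last hour; a healthy cluster stays near 0
increase(etcd_server_leader_changes_seen_total[1h])
```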

Resolution

Choosing a region with better performance will result in better cluster stability. Refer to etcd backend performance requirements for OpenShift for additional information about the etcd requirements.

The Azure platform is currently working on reducing storage latency by rolling out updates that improve storage performance.

Microsoft region storage upgrade status:

Completed:   US West, US West Central, France Central, Canada Central, Brazil South, Asia Southeast (among others), and both Norway regions
In Progress: North Europe (~30% complete)

Root Cause

Azure Disk latency is high in some regions. etcd requires that the 99th percentile of its fsync latency be under 10 milliseconds.

Here is a breakdown of fsync times (in milliseconds) from a simple test with the fio command-line tool:

Region              Fsync 99   Fsync 99.99   Fsync Avg   Fsync Max
uaenorth            5.59       20.88         2.46        37.71
eastasia            6.42       18.58         2.60        44.49
southeastasia       6.75       16.75         2.48        24.32
canadacentral       6.78       26.75         3.09        36.58
norwayeast          7.00       22.18         2.91        29.20
koreasouth          7.07       21.99         3.24        31.38
southindia          7.25       20.07         3.10        31.27
japaneast           7.32       22.65         3.02        30.29
ukwest              7.41       21.45         3.03        29.58
switzerlandnorth    7.50       19.77         2.60        24.81
northcentralus      7.55       20.94         3.07        28.68
westindia           7.56       18.40         2.94        24.34
japanwest           7.63       20.86         3.27        40.93
centralindia        7.91       19.42         3.12        26.21
australiacentral    7.95       19.62         2.83        22.99
germanywestcentral  8.01       25.64         4.16        32.85
southcentralus      8.15       20.76         3.19        27.72
westcentralus       8.40       17.27         2.32        27.95
westus2             8.48       20.15         3.06        28.23
westeurope          8.78       22.65         3.21        28.30
southafricanorth    8.92       23.45         4.33        32.06
brazilsouth         9.21       31.23         3.26        42.38
koreacentral        9.32       24.96         4.25        54.67
canadaeast          9.34       23.70         3.29        49.34
uksouth             9.55       22.08         3.04        26.22
northeurope         9.80       25.31         3.46        38.60
australiaeast       9.83       28.70         4.07        56.37
eastus              10.49      25.46         3.48        37.32
francecentral       10.62      24.59         3.32        34.76
westus              11.45      27.22         4.19        34.92
centralus           15.18      59.21         5.14        112.97

Important: The fio test is a short test executed at a specific moment. It can show whether the disk is fast enough to meet the etcd requirements, but other load on the disk could still cause etcd to misbehave. Also review the etcd metrics to understand the actual etcd behavior, as shown in How to graph etcd metrics using Prometheus to gauge Etcd performance in OpenShift.
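As a sketch of that metrics-based check, the following Prometheus query reports the 99th percentile of etcd's WAL fsync duration in seconds; it should stay below 0.01 (10 ms) to meet the etcd requirement:

```promql
# 99th percentile WAL fsync latency over 5m windows, in seconds
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
```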

Diagnostic Steps

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

  1. Spin up an OpenShift cluster with default configuration:
Standard_D8s_v3 (8 vcpus, 32 GiB memory)
Data disk: 1 TB Premium SSD (P30) 5000 IOPS / 200 MB/s
Caching set to `ReadOnly`.
  2. Perform the following command on the cluster:
oc debug node/$NODE --as-root=true --image ljishen/fio -- sh -c '/bin/rm -f /host/var/lib/etcd/etcd && /usr/local/bin/fio --rw=write --fdatasync=1 --size=22m --bs=2300 --name=etcd1 --ioengine=sync --directory=/host/var/lib/etcd/ --filename=etcd'
  3. Look at the results and find the section labeled fsync/fdatasync/sync_file_range. This section includes a text version of a histogram. The values represent the percentage of completed synchronizations or flushes of file data to the storage device (see man fsync for more information). The values in this section are in microseconds (usec) and can be converted to milliseconds by dividing by 1000.
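As a quick worked example of that conversion, the 99.00th percentile reported in the westus output further below is 10945 usec; dividing by 1000 gives the value in milliseconds:

```shell
# fio reports sync latencies in microseconds; divide by 1000 for milliseconds
awk 'BEGIN { printf "%.3f ms\n", 10945 / 1000 }'
# prints "10.945 ms"
```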

In the example below, that section shows a 99.00th percentile of requests completing in 10.945 ms. This latency is higher than what the etcd documentation considers acceptable. Refer to Using Fio to Tell Whether Your Storage is Fast Enough for Etcd and etcd backend performance requirements for OpenShift for additional information.

This is example output from the westus region in Azure:

fio-3.6
Starting 1 process
etcd1: Laying out IO file (1 file / 22MiB)

etcd1: (groupid=0, jobs=1): err= 0: pid=48993: Mon Mar 22 16:21:00 2021
  write: IOPS=243, BW=546KiB/s (559kB/s)(21.0MiB/41246msec)
    clat (usec): min=5, max=943, avg=14.31, stdev=11.92
     lat (usec): min=6, max=944, avg=15.51, stdev=12.05
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   10], 10.00th=[   11], 20.00th=[   11],
     | 30.00th=[   12], 40.00th=[   13], 50.00th=[   13], 60.00th=[   14],
     | 70.00th=[   16], 80.00th=[   17], 90.00th=[   19], 95.00th=[   23],
     | 99.00th=[   39], 99.50th=[   46], 99.90th=[   86], 99.95th=[  108],
     | 99.99th=[  445]
   bw (  KiB/s): min=  422, max=  615, per=99.90%, avg=545.45, stdev=39.37, samples=82
   iops        : min=  188, max=  274, avg=243.01, stdev=17.55, samples=82
  lat (usec)   : 10=9.84%, 20=83.23%, 50=6.59%, 100=0.27%, 250=0.05%
  lat (usec)   : 500=0.01%, 1000=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=1240, max=23698, avg=4088.49, stdev=2321.18
    sync percentiles (usec):
     |  1.00th=[ 1385],  5.00th=[ 1500], 10.00th=[ 1565], 20.00th=[ 1680],
     | 30.00th=[ 1860], 40.00th=[ 2999], 50.00th=[ 4555], 60.00th=[ 4883],
     | 70.00th=[ 5145], 80.00th=[ 5669], 90.00th=[ 6783], 95.00th=[ 7963],
     | 99.00th=[10945], 99.50th=[12387], 99.90th=[15533], 99.95th=[19006],
     | 99.99th=[22414]
  cpu          : usr=0.51%, sys=2.07%, ctx=32364, majf=0, minf=12
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=546KiB/s (559kB/s), 546KiB/s-546KiB/s (559kB/s-559kB/s), io=21.0MiB (23.1MB), run=41246-41246msec

Disk stats (read/write):
    dm-0: ios=0/22754, merge=0/0, ticks=0/61604, in_queue=61604, util=55.19%, aggrios=0/22693, aggrmerge=0/118, aggrticks=0/60456, aggrin_queue=47888, aggrutil=55.13%
  sda: ios=0/22693, merge=0/118, ticks=0/60456, in_queue=47888, util=55.13%

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.