Azure Disk performance by region
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.6, 4.7
- Azure Platform
Issue
- Azure Disk performance is known to vary by region.
- Azure Disk performance issues manifest themselves as etcd performance issues such as leader election changes, high fsync latency durations, and API server failures.
Resolution
Choosing a region with better disk performance results in better cluster stability. Refer to etcd backend performance requirements for OpenShift for additional information about the etcd requirements.
The Azure platform is currently working on reducing storage latency by rolling out updates that improve storage performance.
Microsoft region storage upgrade status:
| Completed | In Progress |
|---|---|
| US West, US West Central, France Central, Canada Central, Brazil South, Southeast Asia, among others, plus both Norway regions | North Europe (~30% complete) |
Root Cause
Azure Disk storage exhibits high latency in some regions. etcd requires the 99th percentile of its fsync latency to be below 10 milliseconds.
Here is a breakdown of fsync times, in milliseconds, measured with a simple test using the fio command line tool:
| Region | Fsync 99th (ms) | Fsync 99.99th (ms) | Fsync Avg (ms) | Fsync Max (ms) |
|---|---|---|---|---|
| uaenorth | 5.59 | 20.88 | 2.46 | 37.71 |
| eastasia | 6.42 | 18.58 | 2.60 | 44.49 |
| southeastasia | 6.75 | 16.75 | 2.48 | 24.32 |
| canadacentral | 6.78 | 26.75 | 3.09 | 36.58 |
| norwayeast | 7.00 | 22.18 | 2.91 | 29.20 |
| koreasouth | 7.07 | 21.99 | 3.24 | 31.38 |
| southindia | 7.25 | 20.07 | 3.10 | 31.27 |
| japaneast | 7.32 | 22.65 | 3.02 | 30.29 |
| ukwest | 7.41 | 21.45 | 3.03 | 29.58 |
| switzerlandnorth | 7.50 | 19.77 | 2.60 | 24.81 |
| northcentralus | 7.55 | 20.94 | 3.07 | 28.68 |
| westindia | 7.56 | 18.40 | 2.94 | 24.34 |
| japanwest | 7.63 | 20.86 | 3.27 | 40.93 |
| centralindia | 7.91 | 19.42 | 3.12 | 26.21 |
| australiacentral | 7.95 | 19.62 | 2.83 | 22.99 |
| germanywestcentral | 8.01 | 25.64 | 4.16 | 32.85 |
| southcentralus | 8.15 | 20.76 | 3.19 | 27.72 |
| westcentralus | 8.40 | 17.27 | 2.32 | 27.95 |
| westus2 | 8.48 | 20.15 | 3.06 | 28.23 |
| westeurope | 8.78 | 22.65 | 3.21 | 28.30 |
| southafricanorth | 8.92 | 23.45 | 4.33 | 32.06 |
| brazilsouth | 9.21 | 31.23 | 3.26 | 42.38 |
| koreacentral | 9.32 | 24.96 | 4.25 | 54.67 |
| canadaeast | 9.34 | 23.70 | 3.29 | 49.34 |
| uksouth | 9.55 | 22.08 | 3.04 | 26.22 |
| northeurope | 9.80 | 25.31 | 3.46 | 38.60 |
| australiaeast | 9.83 | 28.70 | 4.07 | 56.37 |
| eastus | 10.49 | 25.46 | 3.48 | 37.32 |
| francecentral | 10.62 | 24.59 | 3.32 | 34.76 |
| westus | 11.45 | 27.22 | 4.19 | 34.92 |
| centralus | 15.18 | 59.21 | 5.14 | 112.97 |
Important: The fio test is a short test executed at a specific moment. It can show whether the disk is fast enough to meet the etcd requirements, but other load on the disk could still cause etcd to misbehave. Also review the etcd metrics to see the actual etcd behavior, as shown in How to graph etcd metrics using Prometheus to gauge etcd performance in OpenShift.
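As a live-cluster sanity check (an illustrative query, not quoted from this article), the 99th percentile of etcd's WAL fsync duration can be graphed in Prometheus with a query such as:

```
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
```

This metric reports seconds, so the sub-10-millisecond requirement corresponds to values below 0.01.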
Diagnostic Steps
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
- Spin up an OpenShift cluster with default configuration:
Standard_D8s_v3 (8 vcpus, 32 GiB memory)
Data disk: 1 TB Premium SSD (P30) 5000 IOPS / 200 MB/s
Caching set to `ReadOnly`.
- Perform the following command on the cluster:
oc debug node/$NODE --as-root=true --image ljishen/fio -- sh -c '/bin/rm -f /host/var/lib/etcd/etcd && /usr/local/bin/fio --rw=write --fdatasync=1 --size=22m --bs=2300 --name=etcd1 --ioengine=sync --directory=/host/var/lib/etcd/ --filename=etcd'
- Look at the results and find the section labeled fsync/fdatasync/sync_file_range. This section includes a text version of a histogram. The values represent the percentage of completed synchronizations or flushes of file data to the storage device (see man fsync for more information). The values in this section are in microseconds (usec) and can be converted to milliseconds by dividing by 1000.
In the example below, that section shows a 99.00th percentile of requests completing in 10.945 ms. This latency exceeds the acceptable threshold described in the etcd documentation. Refer to Using Fio to Tell Whether Your Storage is Fast Enough for Etcd and etcd backend performance requirements for OpenShift for additional information.
This is example output from the westus region in Azure:
fio-3.6
Starting 1 process
etcd1: Laying out IO file (1 file / 22MiB)
etcd1: (groupid=0, jobs=1): err= 0: pid=48993: Mon Mar 22 16:21:00 2021
write: IOPS=243, BW=546KiB/s (559kB/s)(21.0MiB/41246msec)
clat (usec): min=5, max=943, avg=14.31, stdev=11.92
lat (usec): min=6, max=944, avg=15.51, stdev=12.05
clat percentiles (usec):
| 1.00th=[ 8], 5.00th=[ 10], 10.00th=[ 11], 20.00th=[ 11],
| 30.00th=[ 12], 40.00th=[ 13], 50.00th=[ 13], 60.00th=[ 14],
| 70.00th=[ 16], 80.00th=[ 17], 90.00th=[ 19], 95.00th=[ 23],
| 99.00th=[ 39], 99.50th=[ 46], 99.90th=[ 86], 99.95th=[ 108],
| 99.99th=[ 445]
bw ( KiB/s): min= 422, max= 615, per=99.90%, avg=545.45, stdev=39.37, samples=82
iops : min= 188, max= 274, avg=243.01, stdev=17.55, samples=82
lat (usec) : 10=9.84%, 20=83.23%, 50=6.59%, 100=0.27%, 250=0.05%
lat (usec) : 500=0.01%, 1000=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=1240, max=23698, avg=4088.49, stdev=2321.18
sync percentiles (usec):
| 1.00th=[ 1385], 5.00th=[ 1500], 10.00th=[ 1565], 20.00th=[ 1680],
| 30.00th=[ 1860], 40.00th=[ 2999], 50.00th=[ 4555], 60.00th=[ 4883],
| 70.00th=[ 5145], 80.00th=[ 5669], 90.00th=[ 6783], 95.00th=[ 7963],
| 99.00th=[10945], 99.50th=[12387], 99.90th=[15533], 99.95th=[19006],
| 99.99th=[22414]
cpu : usr=0.51%, sys=2.07%, ctx=32364, majf=0, minf=12
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=546KiB/s (559kB/s), 546KiB/s-546KiB/s (559kB/s-559kB/s), io=21.0MiB (23.1MB), run=41246-41246msec
Disk stats (read/write):
dm-0: ios=0/22754, merge=0/0, ticks=0/61604, in_queue=61604, util=55.19%, aggrios=0/22693, aggrmerge=0/118, aggrticks=0/60456, aggrin_queue=47888, aggrutil=55.13%
sda: ios=0/22693, merge=0/118, ticks=0/60456, in_queue=47888, util=55.13%
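To avoid reading the histogram by hand, the relevant percentile can be pulled out of saved fio output with standard text tools. The following is a minimal sketch, not Red Hat tooling; the file name fio-output.txt is an assumption, and the heredoc reproduces the fsync section from the westus example above so the snippet is self-contained.

```shell
# Reproduce the fsync section of the westus fio output shown above
# (in practice, redirect the real fio run to fio-output.txt instead).
cat > fio-output.txt <<'EOF'
  fsync/fdatasync/sync_file_range:
    sync (usec): min=1240, max=23698, avg=4088.49, stdev=2321.18
    sync percentiles (usec):
     | 1.00th=[ 1385], 5.00th=[ 1500], 10.00th=[ 1565], 20.00th=[ 1680],
     | 30.00th=[ 1860], 40.00th=[ 2999], 50.00th=[ 4555], 60.00th=[ 4883],
     | 70.00th=[ 5145], 80.00th=[ 5669], 90.00th=[ 6783], 95.00th=[ 7963],
     | 99.00th=[10945], 99.50th=[12387], 99.90th=[15533], 99.95th=[19006],
EOF

# Extract the 99.00th percentile from the sync percentiles block (values in usec).
p99_usec=$(grep -A6 'fsync/fdatasync/sync_file_range' fio-output.txt \
  | sed -n 's/.*| 99\.00th=\[ *\([0-9]*\)\].*/\1/p' | head -1)

# Convert usec to ms by dividing by 1000; etcd wants this below 10 ms.
p99_ms=$(awk -v u="$p99_usec" 'BEGIN{printf "%.3f", u/1000}')
echo "fsync p99: ${p99_ms} ms"
if awk -v u="$p99_usec" 'BEGIN{exit !(u/1000 < 10)}'; then
  echo "within the etcd 10 ms target"
else
  echo "exceeds the etcd 10 ms target"
fi
```

For the westus sample this prints a p99 of 10.945 ms and flags it as exceeding the 10 ms target, matching the conclusion drawn from the histogram above.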
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.