We have two 24 x NVME storage backends and multiple frontends. I am testing different ways to bring storage to a frontend server, and I might as well share the results publicly here.
Using fio and hdparm, I have tested the configurations described below so far.
I am using an RDMA InfiniBand 56/40 Gbps network with ConnectX-3 cards. Each server has a single active connection. I could probably speed things up by using a separate connection per storage server, but that may not be required.
For testing I am using two commands:

hdparm -Tt <device>
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
First, the baseline: a Supermicro MegaRAID based two-SSD RAID used as the operating system disk.
hdparm -Tt /dev/sda:
 Timing cached reads:   21894 MB in 1.99 seconds = 11005.29 MB/sec
 Timing buffered disk reads: 3186 MB in 3.00 seconds = 1061.92 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3542: Mon Jun 28 18:39:00 2021
  write: IOPS=46.5k, BW=182MiB/s (191MB/s)(12.0GiB/67589msec); 0 zone resets
    slat (nsec): min=879, max=127715, avg=1779.53, stdev=650.57
    clat (nsec): min=202, max=1674.4k, avg=8290.97, stdev=2533.87
     lat (usec): min=7, max=1676, avg=10.07, stdev= 2.73
    clat percentiles (nsec):
     |  1.00th=[ 6688],  5.00th=[ 6944], 10.00th=[ 7136], 20.00th=[ 7328],
     | 30.00th=[ 7456], 40.00th=[ 7520], 50.00th=[ 7648], 60.00th=[ 7712],
     | 70.00th=[ 7904], 80.00th=[ 8512], 90.00th=[10816], 95.00th=[11456],
     | 99.00th=[16192], 99.50th=[24192], 99.90th=[28032], 99.95th=[36096],
     | 99.99th=[50432]
   bw (  KiB/s): min=78840, max=418144, per=100.00%, avg=363084.28, stdev=77272.87, samples=69
   iops        : min=19710, max=104536, avg=90771.12, stdev=19318.21, samples=69
  lat (nsec)   : 250=0.01%
  lat (usec)   : 10=85.06%, 20=13.99%, 50=0.94%, 100=0.01%, 250=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=15.36%, sys=14.12%, ctx=3195292, majf=0, minf=701
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3145729,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=182MiB/s (191MB/s), 182MiB/s-182MiB/s (191MB/s-191MB/s), io=12.0GiB (12.9GB), run=67589-67589msec

Disk stats (read/write):
    dm-0: ios=0/127661, merge=0/0, ticks=0/1201685, in_queue=1201685, util=50.69%, aggrios=0/161924, aggrmerge=0/12, aggrticks=0/2302922, aggrin_queue=2302922, aggrutil=56.08%
  sda: ios=0/161924, merge=0/12, ticks=0/2302922, in_queue=2302922, util=56.08%
Next, a local Intel Corporation SSDPE2KE016T8O PCIe NVME device.
hdparm -Tt /dev/nvme2n1:
 Timing cached reads:   17856 MB in 2.00 seconds = 8943.15 MB/sec
 Timing buffered disk reads: 6910 MB in 3.00 seconds = 2303.19 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=349MiB/s][w=89.2k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3782: Mon Jun 28 18:45:44 2021
  write: IOPS=88.6k, BW=346MiB/s (363MB/s)(20.3GiB/60194msec); 0 zone resets
    slat (nsec): min=580, max=81950, avg=1524.49, stdev=380.18
    clat (nsec): min=220, max=600600, avg=7236.48, stdev=1917.52
     lat (usec): min=4, max=602, avg= 8.76, stdev= 2.18
    clat percentiles (nsec):
     |  1.00th=[ 4320],  5.00th=[ 4576], 10.00th=[ 4768], 20.00th=[ 5344],
     | 30.00th=[ 6368], 40.00th=[ 7520], 50.00th=[ 7776], 60.00th=[ 7904],
     | 70.00th=[ 8096], 80.00th=[ 8256], 90.00th=[ 8768], 95.00th=[ 9152],
     | 99.00th=[11072], 99.50th=[12224], 99.90th=[16768], 99.95th=[19072],
     | 99.99th=[34560]
   bw (  KiB/s): min=20728, max=622960, per=100.00%, avg=411116.83, stdev=106584.98, samples=103
   iops        : min= 5182, max=155740, avg=102779.18, stdev=26646.25, samples=103
  lat (nsec)   : 250=0.01%, 750=0.01%
  lat (usec)   : 4=0.01%, 10=97.95%, 20=2.00%, 50=0.04%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=18.80%, sys=25.26%, ctx=5379783, majf=0, minf=780
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5333109,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=346MiB/s (363MB/s), 346MiB/s-346MiB/s (363MB/s-363MB/s), io=20.3GiB (21.8GB), run=60194-60194msec

Disk stats (read/write):
  nvme0n1: ios=1/1405100, merge=0/23, ticks=0/2932174, in_queue=2932174, util=24.13%
The previous Intel NVME exported with NVME-of, as seen from one frontend server.
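For reference, this is roughly how such an export is wired up: configfs on the target side and nvme-cli on the initiator. This is a minimal sketch, not my exact commands; the NQN, IP address, port and device path are placeholders.

  # On the storage backend (target side):
  modprobe nvmet nvmet-rdma
  mkdir /sys/kernel/config/nvmet/subsystems/testnqn
  echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/attr_allow_any_host
  mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
  echo -n /dev/nvme2n1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
  echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/enable
  mkdir /sys/kernel/config/nvmet/ports/1
  echo rdma > /sys/kernel/config/nvmet/ports/1/addr_trtype
  echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
  echo 10.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
  echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
  ln -s /sys/kernel/config/nvmet/subsystems/testnqn /sys/kernel/config/nvmet/ports/1/subsystems/testnqn

  # On the frontend (initiator side):
  modprobe nvme-rdma
  nvme connect -t rdma -n testnqn -a 10.0.0.1 -s 4420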
hdparm -Tt /dev/nvme0n3:
 Timing cached reads:   21334 MB in 1.99 seconds = 10722.82 MB/sec
 Timing buffered disk reads: 5940 MB in 3.00 seconds = 1979.68 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=4839KiB/s][w=1209 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3456: Mon Jun 28 18:37:08 2021
  write: IOPS=77.8k, BW=304MiB/s (319MB/s)(18.2GiB/61171msec); 0 zone resets
    slat (nsec): min=860, max=110739, avg=1750.52, stdev=316.16
    clat (usec): min=4, max=576, avg= 8.12, stdev= 2.13
     lat (usec): min=7, max=578, avg= 9.87, stdev= 2.26
    clat percentiles (nsec):
     |  1.00th=[ 6624],  5.00th=[ 6944], 10.00th=[ 7072], 20.00th=[ 7200],
     | 30.00th=[ 7328], 40.00th=[ 7392], 50.00th=[ 7456], 60.00th=[ 7584],
     | 70.00th=[ 7712], 80.00th=[ 8384], 90.00th=[10688], 95.00th=[11328],
     | 99.00th=[13632], 99.50th=[17280], 99.90th=[29824], 99.95th=[44288],
     | 99.99th=[61696]
   bw (  KiB/s): min=15976, max=430968, per=100.00%, avg=370133.55, stdev=90692.00, samples=102
   iops        : min= 3994, max=107742, avg=92533.38, stdev=22673.00, samples=102
  lat (usec)   : 10=85.87%, 20=13.78%, 50=0.32%, 100=0.02%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%
  cpu          : usr=23.38%, sys=20.65%, ctx=4824959, majf=0, minf=526
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4762087,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=304MiB/s (319MB/s), 304MiB/s-304MiB/s (319MB/s-319MB/s), io=18.2GiB (19.5GB), run=61171-61171msec

Disk stats (read/write):
  nvme0n1: ios=1/738871, merge=0/28, ticks=0/279727, in_queue=279727, util=25.48%
Two NVME-of exported devices combined on the frontend with MD RAID1 and LVM on top (/dev/datavault/testvolume, as seen in the disk stats below).

hdparm -Tt /dev/datavault/testvolume:
 Timing cached reads:   21458 MB in 1.99 seconds = 10784.51 MB/sec
 Timing buffered disk reads: 7640 MB in 3.00 seconds = 2546.47 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=2029KiB/s][w=507 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=4526: Tue Jun 29 08:05:20 2021
  write: IOPS=76.2k, BW=298MiB/s (312MB/s)(17.8GiB/61302msec); 0 zone resets
    slat (nsec): min=802, max=142478, avg=1831.73, stdev=521.29
    clat (usec): min=5, max=3873, avg= 8.61, stdev= 6.04
     lat (usec): min=7, max=3874, avg=10.44, stdev= 6.13
    clat percentiles (nsec):
     |  1.00th=[ 6624],  5.00th=[ 6880], 10.00th=[ 7072], 20.00th=[ 7264],
     | 30.00th=[ 7392], 40.00th=[ 7520], 50.00th=[ 7648], 60.00th=[ 7840],
     | 70.00th=[ 8768], 80.00th=[10432], 90.00th=[11072], 95.00th=[11584],
     | 99.00th=[14016], 99.50th=[22656], 99.90th=[29312], 99.95th=[43264],
     | 99.99th=[52480]
   bw (  KiB/s): min=  838, max=438626, per=100.00%, avg=349216.78, stdev=92392.10, samples=106
   iops        : min=  209, max=109656, avg=87303.93, stdev=23097.92, samples=106
  lat (usec)   : 10=74.54%, 20=24.87%, 50=0.57%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%
  cpu          : usr=23.60%, sys=22.23%, ctx=4868012, majf=0, minf=979
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4674124,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=298MiB/s (312MB/s), 298MiB/s-298MiB/s (312MB/s-312MB/s), io=17.8GiB (19.1GB), run=61302-61302msec

Disk stats (read/write):
    dm-3: ios=1/891284, merge=0/0, ticks=0/694354549, in_queue=694354549, util=25.73%, aggrios=1/904069, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  md0: ios=1/904069, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/903755, aggrmerge=0/188, aggrticks=0/333576, aggrin_queue=333577, aggrutil=23.45%
  nvme0n1: ios=1/903756, merge=0/188, ticks=1/338804, in_queue=338805, util=23.45%
  nvme1n1: ios=0/903755, merge=0/189, ticks=0/328349, in_queue=328349, util=23.27%
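For reference, that RAID1 + LVM stack can be assembled roughly like this; the device and volume names match the disk stats above, but the volume size is a placeholder:

  # Mirror the two NVME-of attached devices with MD RAID1
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
  # Layer LVM on top of the mirror and create the test volume
  pvcreate /dev/md0
  vgcreate datavault /dev/md0
  lvcreate -L 100G -n testvolume datavault
  mkfs.xfs /dev/datavault/testvolume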
I set up an iSER target on the storage backend and an iSER initiator on the frontend, then created an XFS filesystem, exactly as in the NVME-of cases. Write speed looks slightly lower, but latency is very solid.
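A rough outline of that setup, using LIO's targetcli on the backend and open-iscsi on the frontend. This is a sketch, not my exact configuration; the IQN and addresses are placeholders:

  # On the storage backend (LIO target):
  targetcli /backstores/block create name=nvmedisk dev=/dev/nvme2n1
  targetcli /iscsi create iqn.2021-06.example:storage1
  targetcli /iscsi/iqn.2021-06.example:storage1/tpg1/luns create /backstores/block/nvmedisk
  # Switch the portal from plain iSCSI to iSER (RDMA)
  targetcli /iscsi/iqn.2021-06.example:storage1/tpg1/portals/0.0.0.0:3260 enable_iser boolean=true
  targetcli saveconfig

  # On the frontend (initiator):
  iscsiadm -m discovery -t st -p 10.0.0.1
  iscsiadm -m node -T iqn.2021-06.example:storage1 -o update -n iface.transport_name -v iser
  iscsiadm -m node -T iqn.2021-06.example:storage1 --login
  mkfs.xfs /dev/sdd   # the iSER-attached disk, sdd in this case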
hdparm -Tt /dev/disk/by-path/ip-xxx-lun-1:
 Timing cached reads:   21130 MB in 1.99 seconds = 10618.36 MB/sec
 Timing buffered disk reads: 3704 MB in 3.00 seconds = 1234.42 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3116: Tue Jun 29 15:52:07 2021
  write: IOPS=70.8k, BW=277MiB/s (290MB/s)(17.3GiB/63904msec); 0 zone resets
    slat (nsec): min=836, max=324487, avg=1694.97, stdev=519.52
    clat (nsec): min=267, max=938988, avg=7613.02, stdev=1547.30
     lat (usec): min=6, max=940, avg= 9.31, stdev= 1.67
    clat percentiles (nsec):
     |  1.00th=[ 6688],  5.00th=[ 6880], 10.00th=[ 7008], 20.00th=[ 7136],
     | 30.00th=[ 7264], 40.00th=[ 7328], 50.00th=[ 7392], 60.00th=[ 7520],
     | 70.00th=[ 7584], 80.00th=[ 7776], 90.00th=[ 8256], 95.00th=[ 8512],
     | 99.00th=[10432], 99.50th=[22400], 99.90th=[25728], 99.95th=[26752],
     | 99.99th=[29568]
   bw (  KiB/s): min=33408, max=423456, per=100.00%, avg=394176.32, stdev=72408.07, samples=91
   iops        : min= 8352, max=105864, avg=98544.10, stdev=18102.02, samples=91
  lat (nsec)   : 500=0.01%
  lat (usec)   : 4=0.01%, 10=98.87%, 20=0.55%, 50=0.58%, 100=0.01%
  lat (usec)   : 250=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=21.22%, sys=20.55%, ctx=4836908, majf=0, minf=893
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4526608,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=277MiB/s (290MB/s), 277MiB/s-277MiB/s (290MB/s-290MB/s), io=17.3GiB (18.5GB), run=63904-63904msec

Disk stats (read/write):
  sdd: ios=1/620934, merge=0/5878, ticks=1/224797, in_queue=224798, util=34.91%
The same RAID1 + LVM setup as with NVME-of, but using iSER instead.
hdparm -Tt /dev/datavault/testvolume:
 Timing cached reads:   21368 MB in 1.99 seconds = 10739.75 MB/sec
 Timing buffered disk reads: 3898 MB in 3.00 seconds = 1298.67 MB/sec

random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=74.6MiB/s][w=19.1k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=10059: Mon Jul 5 12:01:40 2021
  write: IOPS=69.7k, BW=272MiB/s (285MB/s)(16.1GiB/60525msec); 0 zone resets
    slat (nsec): min=839, max=261971, avg=1676.40, stdev=620.25
    clat (nsec): min=205, max=1344.2k, avg=8360.25, stdev=8926.87
     lat (usec): min=7, max=1345, avg=10.04, stdev= 8.97
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    7], 10.00th=[    8], 20.00th=[    8],
     | 30.00th=[    8], 40.00th=[    8], 50.00th=[    8], 60.00th=[    8],
     | 70.00th=[    8], 80.00th=[    8], 90.00th=[    9], 95.00th=[   10],
     | 99.00th=[   25], 99.50th=[   35], 99.90th=[  155], 99.95th=[  186],
     | 99.99th=[  273]
   bw (  KiB/s): min=21752, max=424928, per=100.00%, avg=367624.85, stdev=96931.23, samples=91
   iops        : min= 5440, max=106232, avg=91906.21, stdev=24232.72, samples=91
  lat (nsec)   : 250=0.01%, 500=0.01%
  lat (usec)   : 4=0.01%, 10=95.68%, 20=2.66%, 50=1.27%, 100=0.14%
  lat (usec)   : 250=0.23%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=20.85%, sys=21.57%, ctx=4476176, majf=0, minf=871
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4215656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s), io=16.1GiB (17.3GB), run=60525-60525msec

Disk stats (read/write):
    dm-3: ios=1/548969, merge=0/0, ticks=1/830242240, in_queue=830242241, util=36.64%, aggrios=1/623957, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  md0: ios=1/623957, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/623048, aggrmerge=0/526, aggrticks=0/213754, aggrin_queue=213754, aggrutil=35.33%
  sdb: ios=1/623078, merge=0/497, ticks=1/211402, in_queue=211402, util=35.33%
  sdc: ios=0/623019, merge=0/556, ticks=0/216106, in_queue=216106, util=35.32%
A single iSER-attached device (/dev/sdb) tested directly; this time the full fio command line is included.

hdparm -Tt /dev/sdb:
 Timing cached reads:   21054 MB in 1.99 seconds = 10581.09 MB/sec
 Timing buffered disk reads: 3908 MB in 3.00 seconds = 1302.55 MB/sec

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.19
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=11449: Tue Jul 6 14:14:06 2021
  write: IOPS=62.5k, BW=244MiB/s (256MB/s)(15.7GiB/65801msec); 0 zone resets
    slat (nsec): min=913, max=134212, avg=1668.50, stdev=498.55
    clat (usec): min=3, max=1588, avg= 7.58, stdev= 2.04
     lat (usec): min=7, max=1590, avg= 9.25, stdev= 2.13
    clat percentiles (nsec):
     |  1.00th=[ 6560],  5.00th=[ 6752], 10.00th=[ 6880], 20.00th=[ 7072],
     | 30.00th=[ 7200], 40.00th=[ 7328], 50.00th=[ 7456], 60.00th=[ 7520],
     | 70.00th=[ 7648], 80.00th=[ 7776], 90.00th=[ 8032], 95.00th=[ 8512],
     | 99.00th=[10176], 99.50th=[22656], 99.90th=[25984], 99.95th=[27008],
     | 99.99th=[36096]
   bw (  KiB/s): min=67536, max=451744, per=100.00%, avg=396963.38, stdev=70142.99, samples=82
   iops        : min=16884, max=112936, avg=99240.79, stdev=17535.73, samples=82
  lat (usec)   : 4=0.01%, 10=98.97%, 20=0.42%, 50=0.61%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=20.62%, sys=16.00%, ctx=4239733, majf=0, minf=807
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4111551,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s), io=15.7GiB (16.8GB), run=65801-65801msec

Disk stats (read/write):
  sdb: ios=1/189398, merge=0/3781, ticks=1/229437, in_queue=229437, util=38.13%
An LVM volume (/dev/datavault/test) on top of the same iSER device.

hdparm -Tt /dev/datavault/test:
 Timing cached reads:   20898 MB in 1.99 seconds = 10501.52 MB/sec
 Timing buffered disk reads: 3812 MB in 3.00 seconds = 1270.48 MB/sec

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.19
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=11743: Tue Jul 6 14:25:35 2021
  write: IOPS=61.8k, BW=241MiB/s (253MB/s)(15.6GiB/66078msec); 0 zone resets
    slat (nsec): min=815, max=152844, avg=1670.57, stdev=552.98
    clat (usec): min=5, max=164, avg= 7.63, stdev= 1.64
     lat (usec): min=7, max=170, avg= 9.30, stdev= 1.76
    clat percentiles (nsec):
     |  1.00th=[ 6752],  5.00th=[ 6944], 10.00th=[ 7072], 20.00th=[ 7200],
     | 30.00th=[ 7264], 40.00th=[ 7392], 50.00th=[ 7456], 60.00th=[ 7520],
     | 70.00th=[ 7584], 80.00th=[ 7712], 90.00th=[ 8032], 95.00th=[ 8512],
     | 99.00th=[10944], 99.50th=[23168], 99.90th=[26240], 99.95th=[27264],
     | 99.99th=[40704]
   bw (  KiB/s): min= 1912, max=419584, per=100.00%, avg=399272.78, stdev=66461.85, samples=81
   iops        : min=  478, max=104896, avg=99818.16, stdev=16615.56, samples=81
  lat (usec)   : 10=98.81%, 20=0.42%, 50=0.76%, 100=0.01%, 250=0.01%
  cpu          : usr=18.56%, sys=16.97%, ctx=4211106, majf=0, minf=763
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4084062,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=241MiB/s (253MB/s), 241MiB/s-241MiB/s (253MB/s-253MB/s), io=15.6GiB (16.7GB), run=66078-66078msec

Disk stats (read/write):
    dm-3: ios=1/103363, merge=0/0, ticks=0/116893, in_queue=116893, util=35.07%, aggrios=1/204245, aggrmerge=0/1728, aggrticks=0/235478, aggrin_queue=235478, aggrutil=38.83%
  sdb: ios=1/204245, merge=0/1728, ticks=0/235478, in_queue=235478, util=38.83%
I still have not studied hdparm/fio analysis, or possible alternative analysis methods, well enough to draw big conclusions. But a quick look suggests the InfiniBand network is quite transparent and performance is good.
The latency seems essentially unchanged on NVME-of compared to local NVME, which is incredible. With RAID1 + LVM there is a slight increase, but we are only talking about a three microsecond increase, from roughly 10 to 13, unless I am reading that fio output wrong.
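To make that comparison concrete, these are the total-latency lines from the fio runs above. Note that fio prints some of these stats in nanoseconds, so a value like avg=8290.97 nsec is about 8.3 microseconds:

  local NVME:             lat (usec): min=4, max=602,  avg= 8.76
  NVME-of:                lat (usec): min=7, max=578,  avg= 9.87
  NVME-of + RAID1 + LVM:  lat (usec): min=7, max=3874, avg=10.44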
iSER surprises with its solid latency, but for some reason it gains no advantage from LVM striping.
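For completeness, an LVM striped volume across two physical volumes would be created roughly like this (a sketch, not necessarily my exact layout; -i is the number of stripes, -I the stripe size in KiB, and the size and device names are placeholders):

  pvcreate /dev/sdb /dev/sdc
  vgcreate datavault /dev/sdb /dev/sdc
  lvcreate -L 100G -i 2 -I 64 -n testvolume datavault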
My only worry is the CPU usage of the MD RAID1. I may try DRBD replication between the storage backends, combined with some mechanism on the frontend to swap block devices on the fly.
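If I go that route, a DRBD resource replicating a device between the two backends would look roughly like this (a DRBD 8.4-style config sketch; hostnames, addresses and the backing device are placeholders):

  # /etc/drbd.d/r0.res
  resource r0 {
    protocol C;               # fully synchronous replication
    device    /dev/drbd0;     # replicated device to export to the frontends
    disk      /dev/nvme2n1;   # local backing NVME
    meta-disk internal;
    on storage1 {
      address 10.0.0.1:7789;
    }
    on storage2 {
      address 10.0.0.2:7789;
    }
  }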
So far, judging from the numbers above, the best performing networked storage options (with rough rounding) would be:

- NVME-of, single device: ~300 MiB/s random 4k write, ~78k IOPS
- NVME-of + MD RAID1 + LVM: ~300 MiB/s, ~76k IOPS
- iSER, single device: ~280 MiB/s, ~71k IOPS
- iSER + MD RAID1 + LVM: ~270 MiB/s, ~70k IOPS
All comments, corrections and ideas are welcome; please leave a comment.