[Lustre-discuss] OST - low MB/s

dzogin Dmitri.Zoguine at Sun.COM
Fri Sep 11 07:20:00 PDT 2009


You might also want to look at ticket 20533 to see if it is related.
There are kernel patches for RHEL 5.3 which improve Lustre
performance on RAID, but there are no plans for a CentOS kernel.


Rafael David Tinoco wrote:
> I think I've discovered the problem.
> I was using multipathd on my "RAID" devices,
> getting around 200 MB/s in RAID6 with 10 disks.
>
> Now, testing without multipath:
>
> root@a02n00:~# mdadm --detail /dev/md20
> /dev/md20:
>         Version : 00.90.03
>   Creation Time : Thu Sep 10 18:27:28 2009
>      Raid Level : raid6
>      Array Size : 7814099968 (7452.11 GiB 8001.64 GB)
>   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 20
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Sep 10 18:27:28 2009
>           State : clean
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 128K
>
>            UUID : 9cf9dd02:d53bc608:62e867a4:1df781ca
>          Events : 0.1
>
>     Number   Major   Minor   RaidDevice State
>        0      66      144        0      active sync   /dev/sdap
>        1      66      160        1      active sync   /dev/sdaq
>        2      66      176        2      active sync   /dev/sdar
>        3      66      192        3      active sync   /dev/sdas
>        4      66      208        4      active sync   /dev/sdat
>        5      66      224        5      active sync   /dev/sdau
>        6      66      240        6      active sync   /dev/sdav
>        7      67        0        7      active sync   /dev/sdaw
>        8       8       16        8      active sync   /dev/sdb
>        9       8      112        9      active sync   /dev/sdh
>
> root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 21.0579 seconds, 498 MB/s
>
> root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=1024k count=99999
> 99999+0 records in
> 99999+0 records out
> 104856551424 bytes (105 GB) copied, 221.137 seconds, 474 MB/s
>
> Much better :D
>
> So basically, Linux + MPT Fusion + multipathd + mdadm is not such a good option for an OST!
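>
> (For reference, a minimal /etc/multipath.conf blacklist sketch to keep
> multipathd away from the md member disks -- the devnode pattern here is
> illustrative; blacklisting by WWID is safer on a real system:)
>
> blacklist {
>     devnode "^sd[a-z]+"   # illustrative: ignore all plain SCSI disks
>     # wwid <disk-wwid>    # safer: blacklist specific disks by WWID
> }
>
> # flush unused maps and pick up the new config:
> multipath -F
> service multipathd restart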
>
> -----Original Message-----
> From: Hung-Sheng.Tsao at Sun.COM [mailto:Hung-Sheng.Tsao at Sun.COM] 
> Sent: Thursday, September 10, 2009 6:25 PM
> To: Rafael David Tinoco
> Subject: Re: [Lustre-discuss] OST - low MB/s
>
> So what is the output if you use bs=128k*8?
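>
> (As a sketch: 128k*8 = 1024k = 1M, and adding oflag=direct makes dd bypass
> the page cache, so the figure reflects the array rather than RAM:)
>
> dd if=/dev/zero of=/dev/md20 bs=1M count=10000 oflag=direct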
>
>
> Rafael David Tinoco wrote:
>> My journal device is:
>>
>> root@a01n00:~# mdadm --detail /dev/md10
>> /dev/md10:
>>         Version : 00.90.03
>>   Creation Time : Thu Sep 10 17:49:07 2009
>>      Raid Level : raid1
>>      Array Size : 987840 (964.85 MiB 1011.55 MB)
>>   Used Dev Size : 987840 (964.85 MiB 1011.55 MB)
>>    Raid Devices : 2
>>   Total Devices : 2
>> Preferred Minor : 10
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Thu Sep 10 17:49:07 2009
>>           State : clean
>>  Active Devices : 2
>> Working Devices : 2
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>            UUID : e48152dd:adb1c505:137aa99c:1b3eece4
>>          Events : 0.1
>>
>>     Number   Major   Minor   RaidDevice State
>>        0     253       17        0      active sync   /dev/dm-17
>>        1     253       14        1      active sync   /dev/dm-14
>>
>> My OST device is:
>>
>> root@a01n00:~# mdadm --detail /dev/md20
>> /dev/md20:
>>         Version : 00.90.03
>>   Creation Time : Thu Sep 10 17:49:23 2009
>>      Raid Level : raid6
>>      Array Size : 7814099968 (7452.11 GiB 8001.64 GB)
>>   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>>    Raid Devices : 10
>>   Total Devices : 10
>> Preferred Minor : 20
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Thu Sep 10 18:06:20 2009
>>           State : clean
>>  Active Devices : 10
>> Working Devices : 10
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>      Chunk Size : 128K
>>
>>            UUID : b80fb16d:38c47a56:fdf2b5e9:9ff47af3
>>          Events : 0.2
>>
>>     Number   Major   Minor   RaidDevice State
>>        0     253       11        0      active sync   /dev/dm-11
>>        1     253       12        1      active sync   /dev/dm-12
>>        2     253       13        2      active sync   /dev/dm-13
>>        3     253       15        3      active sync   /dev/dm-15
>>        4     253       16        4      active sync   /dev/dm-16
>>        5     253       18        5      active sync   /dev/dm-18
>>        6     253       19        6      active sync   /dev/dm-19
>>        7     253       20        7      active sync   /dev/dm-20
>>        8     253        1        8      active sync   /dev/dm-1
>>        9     253       21        9      active sync   /dev/dm-21
>>
>> -----Original Message-----
>> From: Hung-Sheng.Tsao at Sun.COM [mailto:Hung-Sheng.Tsao at Sun.COM] 
>> Sent: Thursday, September 10, 2009 6:19 PM
>> To: Rafael David Tinoco
>> Cc: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] OST - low MB/s
>>
>> Not sure I understand your setup.
>> Which one is the RAID6 LUN?
>> Which are the individual disks?
>>
>>
>> Rafael David Tinoco wrote:
>>> 216 MB/s using 8*128k (1024k) as bs. Too low for 8 active disks, right? That's around 27 MB/s per disk, down from 50 MB/s on the raw disk.
>>>
>>> -----Original Message-----
>>> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Dr. Hung-Sheng Tsao
>>> (LaoTsao)
>>> Sent: Thursday, September 10, 2009 5:50 PM
>>> To: Rafael David Tinoco
>>> Cc: lustre-discuss at lists.lustre.org
>>> Subject: Re: [Lustre-discuss] OST - low MB/s
>>>
>>> With a RAID6 chunk size of 128k, the full stripe size will be 128k*8 (for 10
>>> disks in an 8+2 RAID6).
>>> In your dd test one should use bs=128k*8; then all 8 data disks will be busy.
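>>> (That is, 8 data disks x 128 KiB chunk = 1024 KiB full stripe, i.e. bs=1024k
>>> or bs=1M in dd terms.)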
>>> Regards
>>>
>>>
>>> Rafael David Tinoco wrote:
>>>> With this RAID6 configuration I'm getting:
>>>>
>>>> root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=128k count=10000
>>>> 10000+0 records in
>>>> 10000+0 records out
>>>> 1310720000 bytes (1.3 GB) copied, 5.20774 seconds, 252 MB/s
>>>>
>>>> root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=128k count=10000
>>>> 10000+0 records in
>>>> 10000+0 records out
>>>> 1310720000 bytes (1.3 GB) copied, 5.12 seconds, 256 MB/s
>>>>
>>>> So, 80 MB/s using these md20 devices as OSTs isn't quite right.
>>>>
>>>> -----Original Message-----
>>>> From: lustre-discuss-bounces at lists.lustre.org
>>>> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Rafael
>>>> David Tinoco
>>>> Sent: Thursday, September 10, 2009 4:26 PM
>>>> To: lustre-discuss at lists.lustre.org
>>>> Subject: [Lustre-discuss] OST - low MB/s
>>>>
>>>> Hello,
>>>>
>>>> I'm having problems now with my OSTs' throughput.
>>>>
>>>> I have 4 OSSes, each with 2 OSTs. These OSTs are RAID6 arrays of 10 disks
>>>> with a chunk size of 128k.
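>>>>
>>>> (Each one was created roughly like this -- a sketch; the member device
>>>> names are illustrative:)
>>>>
>>>> mdadm --create /dev/md20 --level=6 --raid-devices=10 --chunk=128 /dev/sd[b-k]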
>>>>
>>>> These disks are in a J4400 (JBOD), connected via multipath using multipathd.
>>>>
>>>> Each individual disk gives me 50 MB/s with dd.
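>>>>
>>>> (Measured with something like the following -- the exact invocation is an
>>>> assumption; iflag=direct keeps the page cache out of the read path:)
>>>>
>>>> dd if=/dev/sdb of=/dev/null bs=1M count=1000 iflag=direct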
>>>>
>>>> With Lustre, using IOR or dd, I can only get around 80 MB/s. For 8 active
>>>> data disks in the RAID I was expecting 8*50 = something between 300 and
>>>> 400 MB/s.
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            0.00    0.00    6.00    9.06    0.00   84.94
>>>>
>>>> Device:  rrqm/s  wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>>>> md10       0.00    0.00  0.00  398.00   0.00   1.55      8.00      0.00   0.00   0.00   0.00
>>>> md11       0.00    0.00  0.00  380.00   0.00   1.48      8.00      0.00   0.00   0.00   0.00
>>>> md20       0.00    0.00  0.00  158.00   0.00  79.00   1024.00      0.00   0.00   0.00   0.00
>>>> md21       0.00    0.00  0.00  159.00   0.00  79.50   1024.00      0.00   0.00   0.00   0.00
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            0.00    0.00    5.94    9.32    0.00   84.74
>>>>
>>>> Device:  rrqm/s  wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>>>> md10       0.00    0.00  0.00  407.50   0.00   1.59      8.00      0.00   0.00   0.00   0.00
>>>> md11       0.00    0.00  0.00  394.00   0.00   1.54      8.00      0.00   0.00   0.00   0.00
>>>> md20       0.00    0.00  0.00  159.00   0.00  79.50   1024.00      0.00   0.00   0.00   0.00
>>>> md21       0.00    0.00  0.00  158.00   0.00  79.00   1024.00      0.00   0.00   0.00   0.00
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            0.00    0.00    6.37    9.43    0.00   84.21
>>>>
>>>> Device:  rrqm/s  wrqm/s   r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>>>> md10       0.00    0.00  0.00  410.50   0.00   1.60      8.00      0.00   0.00   0.00   0.00
>>>> md11       0.00    0.00  0.00  376.00   0.00   1.47      8.00      0.00   0.00   0.00   0.00
>>>> md20       0.00    0.00  0.00  165.00   0.00  82.50   1024.00      0.00   0.00   0.00   0.00
>>>> md21       0.00    0.00  0.00  165.00   0.00  82.50   1024.00      0.00   0.00   0.00   0.00
>>>>
>>>> Any clues?
>>>>
>>>> Rafael David Tinoco - Sun Microsystems
>>>>
>>>> Systems Engineer - High Performance Computing
>>>>
>>>> Rafael.Tinoco at Sun.COM - 55.11.5187.2194
>>>>
