[Lustre-discuss] tcp network load balancing understanding lustre 1.8
Arden Wiebe
albert682 at yahoo.com
Sun May 10 06:12:49 PDT 2009
Mag, you're welcome. The first page returned by a search for Linux Bonding states:
How many bonding devices can I have?
There is no limit.
How many slaves can a bonding device have?
This is limited only by the number of network interfaces Linux supports and/or the number of network cards you can place in your system.
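
To see which slaves a particular bond actually has, the bonding driver exposes a status file for each bond under /proc/net/bonding/. The Python sketch below is only an illustration and does not come from the bonding documentation. The function name parse_bond_slaves and the SAMPLE text are made up for the example, and it assumes the usual "Slave Interface:" and "MII Status:" lines in that file.

```python
from pathlib import Path

# Hypothetical sample in the style of /proc/net/bonding/bond0 (not real output).
SAMPLE = """\
Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bond_slaves(text):
    """Return a list of (slave_name, mii_status) pairs from a bond status file."""
    slaves = []
    current = None  # name of the slave whose section we are currently inside
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The first MII Status line after a slave header belongs to that slave;
            # the bond's own MII Status appears before any slave and is skipped.
            slaves.append((current, value))
            current = None
    return slaves


if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")  # assumes the bond is named bond0
    text = path.read_text() if path.exists() else SAMPLE
    for name, status in parse_bond_slaves(text):
        print(name, status)
```

On a machine without a bond0 the script falls back to the sample text, so it can be tried anywhere.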
--- On Sun, 5/10/09, Mag Gam <magawake at gmail.com> wrote:
> From: Mag Gam <magawake at gmail.com>
> Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> To: "Arden Wiebe" <albert682 at yahoo.com>
> Cc: "Andreas Dilger" <adilger at sun.com>, "Michael Ruepp" <michael at schwarzfilm.ch>, lustre-discuss at lists.lustre.org
> Date: Sunday, May 10, 2009, 5:48 AM
> Thanks for the screen shot Arden.
>
> What is the maximum # of slaves you can have on a bonded interface?
>
>
>
> On Sun, May 10, 2009 at 12:15 AM, Arden Wiebe <albert682 at yahoo.com>
> wrote:
> >
> > Bond0 knows which interfaces to use because all the other
> > interfaces, eth0-5, are designated as slaves in their configuration
> > files. The manual is fairly clear on that.
> >
> > In the screenshot, the memory used in GNOME System Monitor is
> > 452.4 MiB of 7.8 GiB, and the sustained bandwidth to the OSS and OST
> > is 404.2 MiB/s, which corresponds roughly to what collectl shows for
> > KBWrite under Disks. Collectl shows a few different results for
> > Disks, Network and Lustre OST, and I believe it is measuring the
> > other OST on the network at around 170 MiB/s, if you view the other
> > screenshot for OST1 or lustrethree.
> >
> > In the screenshots:
> >   Lustreone   = MGS
> >   Lustretwo   = MDT
> >   Lustrethree = OSS + raid10 target
> >   Lustrefour  = OSS + raid10 target
> >
> > To help clarify, the entire network and the stress testing I did
> > with all the clients I could give it are documented at
> > www.ioio.ca/Lustre-tcp-bonding/images/html and
> > www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html
> >
> > Proper benchmarking would be nice though, as I just hit it with
> > everything I could and it lived, so I was happy. I found the manual
> > lacking on benchmarking and really wanted to make nice graphs of it
> > all, but for some reason failed to do so with iozone.
> >
> > I'll be taking a run at upgrading everything to 1.8 in the coming
> > week or so, and when I do I'll grab some new screenshots and post
> > the relevant items to the wiki. Otherwise, if someone else wants to
> > post the existing screenshots, you're welcome to use them, as they
> > do detail a ground-up build. Apparently 1.8 is great with small
> > files now, so it should work even better with
> > www.oil-gas.ca/phpsysinfo and www.linuxguru.ca/phpsysinfo
> >
> >
> > --- On Sat, 5/9/09, Andreas Dilger <adilger at sun.com> wrote:
> >
> >> From: Andreas Dilger <adilger at sun.com>
> >> Subject: Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> >> To: "Arden Wiebe" <albert682 at yahoo.com>
> >> Cc: lustre-discuss at lists.lustre.org, "Michael Ruepp" <michael at schwarzfilm.ch>
> >> Date: Saturday, May 9, 2009, 11:31 AM
> >> On May 09, 2009 09:18 -0700, Arden Wiebe wrote:
> >> > This might help answer some questions.
> >> > http://ioio.ca/Lustre-tcp-bonding/OST2.png which shows my mostly
> >> > untuned OSS and OSTs pulling 400+MiB/s over TCP bonding provided
> >> > by the kernel, complete with a cat of the modprobe.conf file. You
> >> > have the other links I've sent you, but the picture above is
> >> > relevant to your questions.
> >>
> >> Arden, thanks for sharing this info. Any chance you could post it
> >> to wiki.lustre.org? It would seem there is one bit of info missing
> >> somewhere - how does bond0 know which interfaces to use?
> >>
> >>
> >> Also, another oddity - the network monitor is showing 450MiB/s
> >> Received, yet the disk is showing only about 170MiB/s going to the
> >> disk. Either something is wacky with the monitoring (e.g. it is
> >> counting Received for both the eth* networks AND bond0), or Lustre
> >> is doing something very weird and retransmitting the bulk data like
> >> crazy (seems unlikely).
> >>
> >>
> >> > --- On Thu, 5/7/09, Michael Ruepp <michael at schwarzfilm.ch> wrote:
> >> >
> >> > > From: Michael Ruepp <michael at schwarzfilm.ch>
> >> > > Subject: [Lustre-discuss] tcp network load balancing understanding lustre 1.8
> >> > > To: lustre-discuss at lists.lustre.org
> >> > > Date: Thursday, May 7, 2009, 5:50 AM
> >> > > Hi there,
> >> > >
> >> > > I have configured a simple TCP Lustre 1.8 setup with one mdc
> >> > > (one NIC) and two OSS (four NICs per OSS).
> >> > > As in the 1.6 documentation, the multihomed section is a
> >> > > little bit unclear to me.
> >> > >
> >> > > I give every NID an IP in the same subnet, e.g.:
> >> > > 10.111.20.35-38 - oss0
> >> > > and 10.111.20.39-42 - oss1
> >> > >
> >> > > Do I have to make modprobe.conf.local look like this to force
> >> > > Lustre to use all four interfaces in parallel:
> >> > >
> >> > > options lnet networks=tcp0(eth0,eth1,eth2,eth3)
> >> > >
> >> > > Because on page 138 the 1.8 manual says:
> >> > > "Note – In the case of TCP-only clients, the first available
> >> > > non-loopback IP interface is used for tcp0 since the interfaces
> >> > > are not specified."
> >> > >
> >> > > Or do I have to specify it like this:
> >> > >
> >> > > options lnet networks=tcp
> >> > >
> >> > > Because on page 112 the Lustre 1.6 manual says:
> >> > > "Note – In the case of TCP-only clients, all available IP
> >> > > interfaces are used for tcp0 since the interfaces are not
> >> > > specified. If there is more than one, the IP of the first one
> >> > > found is used to construct the tcp0 ID."
> >> > >
> >> > > Which is the opposite of the 1.8 manual.
> >> > >
> >> > > My goal is to let Lustre utilize all four Gb links in parallel.
> >> > > My Lustre clients are equipped with two Gb links (eth0, eth1),
> >> > > which should be utilized by the Lustre clients as well.
> >> > >
> >> > > Or is bonding the better solution in terms of performance?
> >> > >
> >> > > Thanks very much for input,
> >> > >
> >> > > Michael Ruepp
> >> > > Schwarzfilm AG
> >> > >
> >> > >
> >> > >
> >> > > _______________________________________________
> >> > > Lustre-discuss mailing list
> >> > > Lustre-discuss at lists.lustre.org
> >> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >> Cheers, Andreas
> >> --
> >> Andreas Dilger
> >> Sr. Staff Engineer, Lustre Group
> >> Sun Microsystems of Canada, Inc.
> >>
> >>
> >
> >
> >
> >
>
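
On the monitoring oddity quoted above (450MiB/s Received on the network versus about 170MiB/s going to disk): one way to check whether a monitor is adding the eth* slaves and bond0 together is to compare the sums straight from /proc/net/dev. This is a rough Python sketch under stated assumptions, not how gnome system monitor or collectl actually count. The function names rx_bytes and total_rx are hypothetical, and the SAMPLE counters are invented for illustration, not measurements from this cluster.

```python
# Hypothetical /proc/net/dev style text: bond0 carries the sum of its slaves.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:    3000      30    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
  eth1:    5000      50    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
 bond0:    8000      80    0    0    0     0          0         0     4000      40    0    0    0     0       0          0
"""


def rx_bytes(text):
    """Map interface name -> received byte counter (first field after the colon)."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            counters[name.strip()] = int(fields[0])
    return counters


def total_rx(counters, exclude=()):
    """Sum received bytes over every interface not listed in exclude."""
    return sum(value for name, value in counters.items() if name not in exclude)


if __name__ == "__main__":
    rx = rx_bytes(SAMPLE)
    # Summing slaves and the bond counts the same traffic twice.
    print("eth* and bond0 summed:", total_rx(rx, exclude={"lo"}))
    # Leaving the slaves out gives the traffic the bond really received.
    print("bond0 without slaves: ", total_rx(rx, exclude={"lo", "eth0", "eth1"}))
```

If a monitor's reported figure is close to the first number rather than the second, it is probably double counting, which is the first of the two explanations suggested in the quoted message.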