[lustre-devel] Is network interface fail-over on same server on the road map?

Sat Sep 15 03:23:22 PDT 2012

Brian,

Thanks for raising this point - I think you've pointed out an important shortcoming that we should fix.

If I can elaborate on Nathan's description a bit: Lustre clients fail over by repeatedly trying to connect to one of the servers listed as an alternative host for the OST/MDT they were just communicating with. The problem you face occurs because NID selection for a given server is static, based purely on (1) network distance (i.e. the number of routers you'd have to traverse to reach the NID) and (2) order of declaration. If server NID selection were more dynamic (e.g. using LNET ping responsiveness instead of declaration order), clients would be able to reconnect to the same server via one of its alternative NIDs. On reconnection, the client would discover that it is talking to the same OST/MDT incarnation it was previously connected to, and would simply resend any pending requests.
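To make the static-versus-dynamic distinction concrete, here is a minimal Python sketch. It is not Lustre code: the `Nid` type, its `hops` field and the `responsive` callback (a stand-in for an LNET ping) are hypothetical names chosen for illustration. `static_select` models the rule described above (fewest routers first, then declaration order); `dynamic_select` keeps the same preference order but skips NIDs that do not currently answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Nid:
    addr: str  # e.g. "192.168.1.1@o2ib"
    hops: int  # LNET routers to traverse to reach it (0 = directly attached)


def static_select(nids: Sequence[Nid]) -> Nid:
    """Fewest hops wins; min() returns the first minimum, so ties go to
    declaration order. The answer never changes for a given NID list."""
    return min(nids, key=lambda n: n.hops)


def dynamic_select(nids: Sequence[Nid],
                   responsive: Callable[[str], bool]) -> Optional[Nid]:
    """Same preference order, but skip NIDs that do not currently respond.
    `responsive` is a hypothetical stand-in for an LNET ping."""
    for nid in sorted(nids, key=lambda n: n.hops):  # sorted() is stable
        if responsive(nid.addr):
            return nid
    return None  # no NID answers: only now move on to the failover partner


# MDS1 from the example below; all three NIDs are directly attached.
mds1 = [Nid("192.168.1.1@o2ib", 0), Nid("192.168.2.1@o2ib", 0), Nid("10.0.0.1@tcp", 0)]
down = {"192.168.1.1@o2ib"}
print(static_select(mds1).addr)
print(dynamic_select(mds1, lambda addr: addr not in down).addr)
```

With `192.168.1.1@o2ib` marked down, `static_select` still returns it, whereas `dynamic_select` falls through to `192.168.2.1@o2ib` on the same server.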

        Cheers,
                             Eric

-----Original Message-----
From: lustre-devel-bounces at lists.opensfs.org [mailto:lustre-devel-bounces at lists.opensfs.org] On Behalf Of Nathan Rutman
Sent: Thursday, August 09, 2012 4:55 AM
To: Brian O'Connor
Cc: <lustre-devel at lists.opensfs.org>
Subject: Re: [lustre-devel] Is network interface fail-over on same server on the road map?

You are correct. A node will choose a single LNET network (ib0, ib1, or eth0) over which to talk when the connection is established, and never change its mind. If the server becomes unresponsive, the client will try a network on the failover list.
The reason you can't fail over to a second network is that the server state in that case is not "just failed over and awaiting replay" but "perfectly happy in normal operating mode", so it doesn't know what to do with this new client who wants to do replay.
This could be fixed by more sophisticated handling on the server, or by a new LNET failover mode, but I don't think this is in anybody's plans at the moment.
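A toy model of the server-side decision described above may help. The state names, the `tolerant` flag and the return strings are all hypothetical and do not correspond to Lustre's actual recovery code; `tolerant=True` stands in for the "more sophisticated handling on the server" mentioned as a possible fix, and "reject" simplifies "doesn't know what to do with this new client".

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Set


class ServerState(Enum):
    RECOVERING = auto()  # just failed over, waiting for clients to replay
    NORMAL = auto()      # normal operation, no recovery window open


@dataclass
class Server:
    state: ServerState
    connected: Set[str] = field(default_factory=set)  # hypothetical client IDs already known

    def on_replay_connect(self, client: str, tolerant: bool = False) -> str:
        """Decide what to do with a client that connects wanting to replay."""
        if self.state is ServerState.RECOVERING:
            return "accept-replay"     # the expected case after a real failover
        if tolerant and client in self.connected:
            return "reconnect-resend"  # hypothetical smarter handling
        return "reject"                # server is not expecting a replay now


srv = Server(ServerState.NORMAL, {"c1"})
print(srv.on_replay_connect("c1"))
print(srv.on_replay_connect("c1", tolerant=True))
```

With `tolerant=False`, a server in normal mode turns away the replaying client, which is the situation that blocks failover to a second network on the same server.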

On Aug 8, 2012, at 7:41 PM, Brian O'Connor <briano at sgi.com> wrote:

> 
> Hi
> 
>   at present AFAIK a lustre connection will not fail over between NIDs 
> on the same server. For example, if you have 2 MDS servers, each with two 
> IB interfaces and an ETH interface, configured as an HA pair, you can 
> configure the client to *mount* on the basis of which NIDs they can see, 
> but once the connection is made, if the network switch, cable or HCA 
> fails, the client cannot fail over to the other NID on the *same* 
> server; it fails over to a NID on the HA partner (and the resources 
> probably won't be on the HA partner).
> 
> So my question is... is this feature on a road map, or is it just not compatible with other aspects of Lustre?
> 
> So,  to try and be clear, assuming the following;
> 
> Servers
> MDS1:  ib0=192.168.1.1/24, ib1=192.168.2.1/24, eth0=10.0.0.1/24
> MDS2:  ib0=192.168.1.2/24, ib1=192.168.2.2/24, eth0=10.0.0.2/24
> OSS1:  ib0=192.168.1.11/24, ib1=192.168.2.11/24, eth0=10.0.0.11/24
> ..
> OSS20: ib0=192.168.1.30/24, ib1=192.168.2.30/24, eth0=10.0.0.30/24
> 
> Clients
> 
> c1:    ib0=192.168.1.101/24, ib1=192.168.2.101/24, eth0=10.0.0.101/24
> ..
> c100:  ib0=192.168.1.200/24, ib1=192.168.2.200/24, eth0=10.0.0.200/24
> 
> I mount on the client with
> 
> mount -t lustre 192.168.1.1@o2ib,192.168.2.1@o2ib,10.0.0.1@tcp:192.168.1.2@o2ib,192.168.2.2@o2ib,10.0.0.2@tcp:/lustre /lustre
> 
> (and setup to load balance odd/even clients to odd/even ips on 
> servers)
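In a mount target like the one above, colons separate failover nodes and commas separate alternative NIDs of the same node, with the filesystem name after `:/`. A minimal Python sketch of that split follows; `parse_mount_target` is a hypothetical helper for illustration, not part of the Lustre tools, and it assumes NIDs written in the usual `address@network` form.

```python
from typing import List, Tuple


def parse_mount_target(target: str) -> Tuple[List[List[str]], str]:
    """Split 'nidA1,nidA2:nidB1,nidB2:/fsname' into per-node NID lists and fsname."""
    nodes_part, sep, fsname = target.partition(":/")
    if not sep or not fsname:
        raise ValueError("expected '<nids>[:<nids>...]:/<fsname>'")
    # ':' separates failover nodes; ',' separates NIDs of the same node.
    nodes = [group.split(",") for group in nodes_part.split(":")]
    return nodes, fsname


target = ("192.168.1.1@o2ib,192.168.2.1@o2ib,10.0.0.1@tcp:"
          "192.168.1.2@o2ib,192.168.2.2@o2ib,10.0.0.2@tcp:/lustre")
nodes, fsname = parse_mount_target(target)
for i, nids in enumerate(nodes, 1):
    print(f"node {i}: {nids}")
print("fsname:", fsname)
```

The point of the split is that all three NIDs of MDS1 end up grouped as one node, which is exactly the grouping the question wants the client to exploit after a link failure.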
> 
> Client "c1" initially mounts via 192.168.1.1@o2ib. If the client later fails to communicate on 192.168.1.1@o2ib, at present it will *not* try 192.168.2.1 or 10.0.0.1; it will try 192.168.1.2, 192.168.2.2 or 10.0.0.2 on the configured HA partner. This complicates the HA setup, in that you have to monitor the networks and STONITH all the MDSs/OSSs so that the resources are available on the HA partner when needed, and so far this has never worked out for me.
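The difference between the current and the requested behaviour can be sketched as two orderings of connection attempts. This is a deliberately simplified, hypothetical model (the function names, the list-of-lists layout, and the assumption that each server is represented by its first declared NID are all illustrative, and real Lustre reconnect logic is more involved): `current_order` moves to the failover partner and never touches the failed server's other NIDs, while `desired_order` tries those NIDs first.

```python
from typing import List


def current_order(nodes: List[List[str]], node: int) -> List[str]:
    """One statically chosen NID per server (assumed: the first declared),
    starting with the failover partner and cycling back to the failed NID."""
    rotated = nodes[node + 1:] + nodes[:node + 1]
    return [group[0] for group in rotated]


def desired_order(nodes: List[List[str]], node: int, failed_nid: str) -> List[str]:
    """Try the same server's other NIDs before any failover partner."""
    same_server = [nid for nid in nodes[node] if nid != failed_nid]
    partners = nodes[node + 1:] + nodes[:node]
    return same_server + [nid for group in partners for nid in group]


nodes = [
    ["192.168.1.1@o2ib", "192.168.2.1@o2ib", "10.0.0.1@tcp"],  # MDS1
    ["192.168.1.2@o2ib", "192.168.2.2@o2ib", "10.0.0.2@tcp"],  # MDS2
]
print(current_order(nodes, 0))
print(desired_order(nodes, 0, "192.168.1.1@o2ib"))
```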
> 
> Is there an architectural reason that you can't fail over to another NID on the same server?
> 
> 
> 
> --
> Brian O'Connor
> -------------------------------------------------------------
> SGI Consulting
> Email: briano at sgi.com, Mobile +61 417 746 452
> Phone: +61 3 9963 1900, Fax:  +61 3 9963 1902
> 691 Burke Road, Camberwell, Victoria, 3124 AUSTRALIA 
> http://www.sgi.com/support/services
> -------------------------------------------------------------
> _______________________________________________
> lustre-devel mailing list
> lustre-devel at lists.opensfs.org
> http://lists.opensfs.org/listinfo.cgi/lustre-devel-opensfs.org