[Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet

Chris Worley worleys at gmail.com
Fri Mar 7 08:34:43 PST 2008


More issues.  Now, on the clients.

The MDT/MGS/OST's are all up and mounted, showing:

# lctl list_nids
36.122.255.201@o2ib
36.121.255.201@tcp

Now, when I go to mount on the IB-based clients, I get:

# mount -t lustre 36.122.255.201@o2ib:/ddnlfs /lfs
mount.lustre: mount 36.122.255.201@o2ib:/ddnlfs at /lfs failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

The modprobe.conf contains:

options lnet networks=o2ib0(ib0)

And lctl looks good:

# lctl list_nids
36.122.255.1@o2ib
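To make explicit which LNET networks an `options lnet networks=...` line asks for, here is a minimal sketch that splits the value into network/interface pairs. It assumes an unquoted, comma-separated NET(IFACE) list and shows a bare name such as `o2ib` with its implied network number 0 (`o2ib0`). The helper name `lnet_networks` is hypothetical; it is not a Lustre tool.

```shell
#!/bin/sh
# Sketch only (hypothetical helper, not a Lustre tool): print the
# "NETWORK INTERFACE" pairs requested by an "options lnet networks=..." line.
# Assumes an unquoted, comma-separated NET(IFACE) list; a bare network
# name such as "o2ib" is shown with its implied number 0 ("o2ib0").
lnet_networks() {
    # Keep only the text after "networks=", up to the next whitespace.
    value=$(printf '%s\n' "$1" | sed -n 's/.*networks=\([^[:space:]]*\).*/\1/p')
    # Split on commas, one entry per line, e.g. "o2ib0(ib0)".
    printf '%s\n' "$value" | tr ',' '\n' | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        net=${entry%%\(*}                  # text before "(", e.g. "o2ib0"
        case $entry in
            *"("*) iface=${entry#*\(}; iface=${iface%\)} ;;
            *) iface="-" ;;                # no interface given in the entry
        esac
        case $net in
            *[0-9]) ;;                     # network number already present
            *) net="${net}0" ;;            # bare name: make number 0 explicit
        esac
        printf '%s %s\n' "$net" "$iface"
    done
}
```

For the client line `options lnet networks=o2ib0(ib0)` this prints only `o2ib0 ib0`, which agrees with the single `@o2ib` NID from `lctl list_nids`: the client has no tcp network configured at all.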

But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
(36.12[12].255.201 is the MGS/MDS server):

LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
LustreError: 10001:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
Lustre:    cmd=cf003 0:ddnlfs-MDT0000-mdc  1:ddnlfs-MDT0000_UUID 2:36.121.255.201@tcp
LustreError: 15c-8: MGC36.122.255.201@o2ib: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
Lustre: client 0000010430913c00 umount complete
LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-2)
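The first two LustreError lines say the client cannot find a peer for `36.121.255.201@tcp`, the NID named for the MDT in the `cmd=cf003` entry of the `ddnlfs-client` configuration log, while the client itself only has an `@o2ib` NID. The sketch below compares the network part of a peer NID against the local NIDs. It assumes plain-text NIDs as printed by `lctl list_nids`, ignores LNET routing entirely, and uses hypothetical helper names (`nid_net`, `nid_reachable`).

```shell
#!/bin/sh
# Sketch only: does this node share an LNET network with a peer NID?
# Assumes no LNET routers; with routers configured, a peer on another
# network may still be reachable, which this check does not model.

# Print the network part of a NID with network number 0 made explicit,
# e.g. "36.121.255.201@tcp" -> "tcp0", "10.0.0.1@o2ib1" -> "o2ib1".
nid_net() {
    net=${1#*@}
    case $net in
        *[0-9]) printf '%s\n' "$net" ;;
        *) printf '%s0\n' "$net" ;;
    esac
}

# $1 = peer NID; $2 = local NIDs (whitespace/newline separated), e.g. the
# output of "lctl list_nids". Returns 0 if some local NID is on the same
# network as the peer, 1 otherwise.
nid_reachable() {
    want=$(nid_net "$1")
    for nid in $2; do   # unquoted on purpose: split the list into NIDs
        if [ "$(nid_net "$nid")" = "$want" ]; then
            return 0
        fi
    done
    return 1
}
```

With the client's list (`36.122.255.1@o2ib`), `nid_reachable 36.122.255.201@o2ib` succeeds but `nid_reachable 36.121.255.201@tcp` fails, which mirrors the "No NID found for 36.121.255.201@tcp" message above.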

Note that this works fine in the non-multihomed setup, so I
don't think ko2iblnd is to blame (the setup on the clients hasn't
changed at all).

What am I doing wrong?

Thanks,

Chris
On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <worleys at gmail.com> wrote:
>
> I changed my modprobe.conf to look exactly like yours, and it worked.  I
>   hadn't been using all the quotes until the doc said to... but they may
>   have indeed been the problem.
>
>   Thanks!
>
>   Chris
>
>  On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>   >
>   >
>   >  Do "lctl list_nids" on your MDS and OSSs.   They should look
>   >  something like this.
>   >
>   >  [root@hpcmds ~]# lctl list_nids
>   >  10.13.24.40@o2ib
>   >  10.13.16.40@tcp
>   >
>   >  Then your clients should have a NID on one network or the other.
>   >
>   >  Check your dmesg output after loading lnet.   The complaints are
>   >  pretty useful.  Your modprobe.conf line looks correct although we
>   >  found we did not need all the quoting so you should check that as
>   >  well.   Ours looks like...
>   >
>   >  options lnet networks=o2ib(ib0),tcp(eth0)
>   >
>   >  My guess is that it either cannot find or does not like your ko2iblnd
>   >  module.
>   >
>   >  ct
>   >
>   >
>   >
>   >  On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
>   >
>   >  > Most everything is over IB, but I have a few systems I'd like to mount
>   >  > the Lustre fs over GigE.
>   >  >
>   >  > I think I've followed the Multihomed instructions correctly, in:
>   >  >
>   >  > http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>   >  >
>   >  > My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
>   >  > Ethernet and IB) includes:
>   >  >
>   >  > options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>   >  >
>   >  > I make and mount the MDT (which has both IB and Ethernet; subnet
>   >  > 36.122.x.x is IB, 36.121.x.x is Ethernet) with:
>   >  >
>   >  > # mkfs.lustre --mdt --mgs --mgsnode="36.122.255.201@o2ib0,36.121.255.201@tcp0" <... > /dev/md0
>   >  > # mount -t lustre /dev/md0  /lfs/mdtb
>   >  >
>   >  > But, at this point, the ksocklnd module is loaded rather than the
>   >  > ko2iblnd module!
>   >  >
>   >  > On the OSS, I make the fs with the same "mgsnode", but when I try to
>   >  > mount it, it correctly uses the IB interface but can't contact the
>   >  > MDS:
>   >  >
>   >  > LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for MGC36.122.255.201@o2ib_0
>   >  > LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer MGC36.122.255.201@o2ib_0!
>   >  > LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
>   >  > LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>   >  > LustreError: 27520:0:(obd_config.c:325:class_setup()) setup MGC36.122.255.201@o2ib failed (-2)
>   >  > LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple()) MGC36.122.255.201@o2ib setup error -2
>   >  > LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd ddnlfs-OSTffff
>   >  > LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount()) ddnlfs-OSTffff not registered
>   >  >
>   >  > It too has loaded the ksocklnd module, and not the ko2iblnd module.  I
>   >  > guess that both modules should be loaded in a multihomed case?
>   >  >
>   >  > What am I doing wrong?
>   >  >
>   >  > Thanks,
>   >  >
>   >  > Chris
>   >  > _______________________________________________
>   >  > Lustre-discuss mailing list
>   >  > Lustre-discuss at lists.lustre.org
>   >  > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   >
>   >
>
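In the quoted exchange, the fix was to replace the quoted server line `options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'` with an unquoted one in the style of `options lnet networks=o2ib(ib0),tcp(eth0)`, and Chris suspects the quotes may have been the problem. Below is a hypothetical pre-flight check, `check_lnet_quotes`, that flags quote characters in the `networks=` word of such a line. It is a heuristic drawn only from this thread, not a check Lustre or modprobe performs, and it splits words naively on whitespace.

```shell
#!/bin/sh
# Hypothetical lint (heuristic from this thread only): warn when the
# whitespace-delimited word containing "networks=" in an "options lnet"
# line carries quote characters. Returns 0 = ok, 1 = quotes, 2 = not found.
check_lnet_quotes() {
    # Isolate the word that contains "networks="; naive whitespace split.
    word=$(printf '%s\n' "$1" | tr ' \t' '\n\n' | grep 'networks=' || true)
    if [ -z "$word" ]; then
        echo "no networks= option found"
        return 2
    fi
    case $word in
        *\"*|*\'*) echo "quotes in: $word"; return 1 ;;
        *) echo "ok: $word"; return 0 ;;
    esac
}
```

Checking only the `networks=` word, rather than the whole line, avoids flagging quotes that other `lnet` options might legitimately carry.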


