[Lustre-discuss] lustre lnet infiniband config

aayush agrawal aayush.agrawal at calsoftinc.com
Tue Sep 30 05:33:13 PDT 2014


Hi,

I am trying to build Lustre 2.5.0 against 
MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on CentOS 6.4 with kernel 
version 2.6.32-358.
However, I am not able to configure LNet properly. I used the settings 
suggested in the Lustre 2.x manual, but I cannot bring the network up 
with lctl.

Details:

I have two server machines, one for the MGS+MDT and one for the OSS, plus 
one client machine. I want to set up InfiniBand on all of these machines.
I could run the steps below on all three machines:
1. Run the mlnxofedinstall script:
# ./mlnxofedinstall -vvv --add-kernel-support --without-32bit \
      --without-fw-update --hpc
2. Restart the openibd service:
# /etc/init.d/openibd restart
3. Configure the ib0 interface.
4. Configure Lustre with o2ib:
# ./configure --with-linux=Path_to_linux-2.6.32-358.18.1.el6 \
      --with-o2ib=/usr/src/ofa_kernel/default/
5. Build the Lustre RPMs:
# make rpms
This gave me a compilation error.
I looked online for this error and found a bug filed for it: 
https://jira.hpdd.intel.com/browse/LU-4266
The patch below, linked from that bug, solved the problem, and I could 
then build the Lustre RPMs:
http://review.whamcloud.com/#/c/8451/1

Now I first want to set up InfiniBand for the MGS and MDT on a single 
machine, which also has an Ethernet IP address. Then I want to format and 
mount the MGS and MDT.
So I installed the Lustre RPMs built above and added the line below to 
/etc/modprobe.d/lustre.conf:
options lnet networks=o2ib(ib0)

Then I rebooted the machine to unload all Lustre-related modules, 
including lnet, ran modprobe lnet to load it with the above parameters, 
and then ran lctl network up, which gives me the error below:
LNET configure error 100: Network is down
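
The sequence above can be sketched as the shell helpers below. This is only an illustration: the helper names write_lnet_conf and bring_up_lnet are hypothetical, and the final lctl list_nids call is an extra check that is not part of the steps described above. The second helper needs root and an installed Lustre, so nothing is run automatically.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of Lustre.

# Write a single "options lnet" line to a modprobe config file.
#   $1 = value for networks=, e.g. 'o2ib(ib0)'
#   $2 = target file, e.g. /etc/modprobe.d/lustre.conf
write_lnet_conf() {
    printf 'options lnet networks=%s\n' "$1" > "$2"
}

# Load lnet so it picks up the options, bring the network up,
# then list the local NIDs (extra check, assumed to be useful).
bring_up_lnet() {
    modprobe lnet &&
    lctl network up &&
    lctl list_nids
}

# Intended use, as root:
#   write_lnet_conf 'o2ib(ib0)' /etc/modprobe.d/lustre.conf
#   bring_up_lnet
```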

I looked online and found the discussion below about the same error:
http://lists.lustre.org/pipermail/lustre-discuss/2010-June/013510.html

As suggested in that mail, I tried the line below in 
/etc/modprobe.d/lustre.conf, where IB_IP stands for the InfiniBand IP 
address:
options lnet networks=o2ib(ib0) routes="tcp0 IB_IP@o2ib"
With this setting the command hangs for around 2 to 3 minutes and then 
gives the error: Write failed: Broken pipe. The same happens with 
"options lnet networks=o2ib(ib0)".
But if I set: options lnet networks=tcp0(eth0),o2ib(ib0) routes="tcp1 
IB_IP@o2ib", it gives LNET configure error 100: Network is down.
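
Since every variant above relies on ib0, one thing that may be worth confirming before loading lnet is the state the kernel reports for that interface. The sketch below reads it from sysfs. The helper names iface_state and iface_is_up are hypothetical, and whether the interface state is related to either error is an assumption, not something established here.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# Print the kernel's operational state for a network interface.
#   $1 = interface name, e.g. ib0
#   $2 = sysfs root (optional, defaults to /sys/class/net)
iface_state() {
    root=${2:-/sys/class/net}
    if [ -r "$root/$1/operstate" ]; then
        cat "$root/$1/operstate"
    else
        # No such interface, or its state is not readable.
        echo missing
    fi
}

# Succeeds only when the interface reports "up".
iface_is_up() {
    [ "$(iface_state "$1" "$2")" = up ]
}

# Intended use:
#   iface_is_up ib0 || echo "ib0 state: $(iface_state ib0)"
```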

It seems that with networks=o2ib(ib0) I get the error Write failed: 
Broken pipe.
Am I missing anything in the steps above? How can I resolve this 
error?

Thanks,
Aayush.