[Lustre-discuss] lustre lnet infiniband config
aayush agrawal
aayush.agrawal at calsoftinc.com
Tue Sep 30 05:33:13 PDT 2014
Hi,
I am trying to build Lustre 2.5.0 against
MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on CentOS 6.4 with kernel
version 2.6.32-358.
However, I am not able to set the LNet configuration properly. I used the
settings suggested in the Lustre 2.x manual, but I cannot bring the network
up using lctl.
Details:
I have two server machines, one for MGS+MDT and a second for the OSS, and
one client machine. I want to set up InfiniBand on all of these machines.
I could run the steps below successfully on all three machines:
1. Run the mlnxofedinstall script:
# ./mlnxofedinstall -vvv --add-kernel-support --without-32bit \
    --without-fw-update --hpc
2. Restart the openibd service:
# /etc/init.d/openibd restart
3. Configure the ib0 interface.
4. Configure Lustre with o2ib:
# ./configure --with-linux=Path_to_linux-2.6.32-358.18.1.el6 \
    --with-o2ib=/usr/src/ofa_kernel/default/
5. Build the Lustre RPMs:
# make rpms
The build gave me a compilation error.
I looked online for this error and found a bug already filed for it:
https://jira.hpdd.intel.com/browse/LU-4266
The patch below, taken from the above link, solved the problem, so I could
build the Lustre RPMs:
http://review.whamcloud.com/#/c/8451/1
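Step 3 above lists no command. As a minimal sketch only, assuming the usual CentOS 6 layout where ib0 is described by /etc/sysconfig/network-scripts/ifcfg-ib0, the file contents could be generated as follows. The function name render_ifcfg_ib0 is hypothetical, and the address and netmask are placeholders, not values from this setup.

```shell
#!/bin/sh
# Print a static ifcfg-ib0 definition to stdout.
# $1 = IPv4 address for ib0 (placeholder), $2 = netmask (placeholder)
render_ifcfg_ib0() {
    printf 'DEVICE=ib0\n'
    printf 'TYPE=InfiniBand\n'
    printf 'BOOTPROTO=static\n'
    printf 'IPADDR=%s\n' "$1"
    printf 'NETMASK=%s\n' "$2"
    printf 'ONBOOT=yes\n'
}

# Example with placeholder values; redirect the output to
# /etc/sysconfig/network-scripts/ifcfg-ib0 as root to apply it.
render_ifcfg_ib0 192.168.100.10 255.255.255.0
```

After writing the file, bringing the interface up with ifup ib0 and checking it with ip addr show ib0 would confirm that the address is assigned before LNet is loaded.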
First I want to do the InfiniBand setup for the MGS and MDT on a single
machine, which also has an Ethernet IP. Then I want to format and mount the
MGS and MDT.
So I installed the Lustre RPMs created above and then added the line below
to /etc/modprobe.d/lustre.conf:
options lnet networks=o2ib(ib0)
Then I rebooted the machine to unload all Lustre-related modules,
including lnet, ran modprobe lnet to load it with the above parameters,
and then ran lctl network up, which gives me the error below:
LNET configure error 100: Network is down
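The sequence just described can be sketched as a small script. This is only an illustration of the steps in this mail, not a fix for the error. The function names lnet_options_line and bring_up_lnet are hypothetical; modprobe lnet, lctl network up and lctl list_nids are the standard commands. bring_up_lnet needs root and a machine with the Lustre modules installed, so it is defined but not called here.

```shell
#!/bin/sh
# Render the modprobe options line for a single LNet network.
# $1 = LNet network name (e.g. o2ib), $2 = interface name (e.g. ib0)
lnet_options_line() {
    printf 'options lnet networks=%s(%s)\n' "$1" "$2"
}

# Load LNet, bring the network up, then list the local NIDs.
# Requires root; stops at the first command that fails.
bring_up_lnet() {
    modprobe lnet || return 1
    lctl network up || return 1
    lctl list_nids
}

# Print the line that goes into /etc/modprobe.d/lustre.conf.
lnet_options_line o2ib ib0
```

If the network comes up, lctl list_nids lists the local NIDs, which makes it easy to see whether the o2ib network was actually configured on ib0.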
I looked online and found the discussion below on the same error:
http://lists.lustre.org/pipermail/lustre-discuss/2010-June/013510.html
As suggested in the above mail, I tried the line below in
/etc/modprobe.d/lustre.conf. Here IB_IP stands for the InfiniBand IP
address:
options lnet networks=o2ib(ib0) routes="tcp0 IB_IP@o2ib"
With this setting the command hangs for around 2 to 3 minutes and then
gives the error "Write failed: Broken pipe". The same happens with
"options lnet networks=o2ib(ib0)".
But if I set: options lnet networks=tcp0(eth0),o2ib(ib0) routes="tcp1
IB_IP@o2ib" then it gives "LNET configure error 100: Network is down".
It seems that with networks=o2ib(ib0) I get the error "Write failed:
Broken pipe".
Am I missing anything in the above steps? How can I resolve the above
error?
Thanks,
Aayush.