[Lustre-discuss] lnet_try_match_md()) Matching packet from 12345-10.5.203.250@tcp, match 19154486 length 728 too big

Michael D. Seymour seymour at cita.utoronto.ca
Fri May 29 12:51:35 PDT 2009


Michael D. Seymour wrote:
> Hi all,
> 
> I hope you can help us with some connection problems we are having with our 
> Lustre file system. The filesystem roc consists of 6 OSSs with one OST per OSS. 
> Each OSS runs the Lustre 1.6.7 RHEL 5 kernel on CentOS 5.1 (one unit runs 
> CentOS 5.3). The MDS runs CentOS 5.1 and Lustre 1.6.7. 203 RHEL-based clients 
> mount the filesystem and all use Lustre 1.6.7. All are connected via a Gigabit 
> Ethernet switch stack.
> 
> One client running CentOS 5.2 re-exports the Lustre filesystem via NFS on a 
> different network.
> 

We also got the following earlier today, before more verbose debug logging was enabled:

On client trinity:

May 29 10:35:47 trinity kernel: LustreError: 
5111:0:(lib-move.c:110:lnet_try_match_md()) Matching packet from 
12345-10.5.203.250@tcp, match 20177453 length 728 too big: 704 left, 704 allowed
May 29 10:40:47 trinity kernel: LustreError: 11-0: an error occurred while 
communicating with 10.5.203.250@tcp. The mds_close operation failed with -116
May 29 10:40:47 trinity kernel: LustreError: 
26783:0:(file.c:113:ll_close_inode_openhandle()) inode 37609433 mdc close 
failed: rc = -116
May 29 10:40:47 trinity kernel: LustreError: 
26783:0:(file.c:113:ll_close_inode_openhandle()) Skipped 1 previous similar message
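
For anyone decoding the -116 in the messages above: kernel return codes of this
form are negated errno values, and on Linux errno 116 is ESTALE. A minimal
sketch of the lookup in Python follows; describe_rc is a hypothetical helper
name, and the result depends on the platform the script runs on, so it should
be run on a Linux host to match these logs.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Map a negative kernel return code (e.g. -116) to its errno name and text.

    The mapping is platform specific: run this on Linux to decode Linux logs.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


print(describe_rc(-116))
```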

On MDS rocpile:

May 29 10:35:47 rocpile kernel: LustreError: 
10227:0:(mds_open.c:1561:mds_close()) @@@ no handle for file close ino 37609433: 
cookie 0xa00c7cf9e763396b  req@ffff8101274e3400 x20177453/t0 
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens 
296/728 e 0 to 0 dl 1243608047 ref 1 fl Interpret:/0/0 rc 0/0
May 29 10:35:47 rocpile kernel: LustreError: 
10227:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-116) 
req@ffff8101274e3400 x20177453/t0 
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens 
296/728 e 0 to 0 dl 1243608047 ref 1 fl Interpret:/0/0 rc -116/0
May 29 10:35:47 rocpile kernel: LustreError: 
10227:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 1 previous similar message
May 29 10:40:47 rocpile kernel: LustreError: 
3611:0:(mds_open.c:1561:mds_close()) @@@ no handle for file close ino 37609433: 
cookie 0xa00c7cf9e763396b  req@ffff81011f0cda00 x20177453/t0 
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens 
296/728 e 0 to 0 dl 1243608347 ref 1 fl Interpret:/2/0 rc 0/0
May 29 10:40:47 rocpile kernel: LustreError: 
3611:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-116) 
req@ffff81011f0cda00 x20177453/t0 
o35->84adb9a1-8959-fcf5-cc72-81c6a1e171b8@NET_0x200000a05cc02_UUID:0/0 lens 
296/728 e 0 to 0 dl 1243608347 ref 1 fl Interpret:/2/0 rc -116/0
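
The client and MDS messages appear to describe the same request: the client's
"match 20177453" equals the "x20177453" in the MDS request dump, and the
client's "length 728" equals the second number in "lens 296/728". Treating
that second number as the reply length is an assumption about the log format,
not a confirmed diagnosis. The sketch below pulls those fields out of
fragments of the two lines and computes how far the packet exceeds the size
the client allowed. The regular expressions and function names are
illustrative only and are not part of Lustre.

```python
import re

# Fragments of the client and MDS log lines above; "[...]" marks omitted text.
CLIENT_FRAGMENT = "match 20177453 length 728 too big: 704 left, 704 allowed"
MDS_FRAGMENT = "x20177453/t0 [...] lens 296/728 e 0 to 0"

CLIENT_RE = re.compile(
    r"match (\d+) length (\d+) too big: (\d+) left, (\d+) allowed"
)
MDS_RE = re.compile(r"\bx(\d+)/t\d+.*?\blens (\d+)/(\d+)")


def parse_client(text):
    """Return (match_bits, length, left, allowed) from the client message."""
    m = CLIENT_RE.search(text)
    return tuple(int(g) for g in m.groups()) if m else None


def parse_mds(text):
    """Return (xid, first_len, second_len) from the MDS request dump."""
    m = MDS_RE.search(text)
    return tuple(int(g) for g in m.groups()) if m else None


match_bits, length, left, allowed = parse_client(CLIENT_FRAGMENT)
xid, first_len, second_len = parse_mds(MDS_FRAGMENT)

print("same request id:", match_bits == xid)
print("length equals second 'lens' value:", length == second_len)
print("bytes over the allowed size:", length - allowed)
```

With the values from these logs the packet is 728 - 704 = 24 bytes larger than
the client allowed.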

I've already extended /proc/sys/lustre/timeout to 300s.

Thanks again,
Mike

-- 
Michael D. Seymour                 Phone: 416-978-8497
Scientific Computing Support       Fax: 416-978-3921
Canadian Institute for Theoretical Astrophysics, University of Toronto


