[Lustre-discuss] Odd broken behaviour on one lustre client
Mark Field
mnfield at gmail.com
Thu Nov 22 15:16:29 PST 2012
Yes. This machine can't access the mounted file system, and it caused a
kernel panic when we tried to access some files. It also seems to give
different and incorrect values when du or df is run on it.
On 22 November 2012 19:34, Dilger, Andreas <andreas.dilger at intel.com> wrote:
> On 11/22/12 10:25 AM, "Mark Field" <mnfield at gmail.com> wrote:
>
> >Hi,
> >
> >I am currently using Lustre 1.8. After an OST failure, I deactivated the
> >OST on the MDS and made the change permanent. If I now run lctl dl on
> >the client nodes, all of them except one show the OST as inactive
> >(device 7 in the output below).
> >
> >
> > 0 UP mgc MGC10.214.4.201@o2ib 78b8432f-6331-cae7-8d75-dbaba9708056 5
> > 1 UP lov optstr01-clilov-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 4
> > 2 UP mdc optstr01-MDT0000-mdc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 3 UP osc optstr01-OST0000-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 4 UP osc optstr01-OST0001-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 5 UP osc optstr01-OST0002-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 6 UP osc optstr01-OST0003-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 7 IN osc optstr01-OST0004-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 8 UP osc optstr01-OST0008-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> > 9 UP osc optstr01-OST0005-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
> >
> >
> >
> >The other client is not working correctly; its lctl dl output looks like this:
> >
> >
> > 0 UP mgc MGC10.214.4.201@o2ib 94226c2b-6914-6a92-5c6b-2a27ebff676e 5
> > 1 UP lov optstr01-clilov-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 4
> > 2 UP mdc optstr01-MDT0000-mdc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> > 3 UP osc optstr01-OST0000-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> > 4 UP osc optstr01-OST0001-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> > 5 UP osc optstr01-OST0002-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> > 6 UP osc optstr01-OST0003-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> > 7 UP osc optstr01-OST0004-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 4
> > 8 UP osc optstr01-OST0008-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> > 9 UP osc optstr01-OST0005-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
> >
> >
> >
> >Notice that device 7 is 'UP' rather than 'IN', and the last number on the
> >line is 4, not 5. I tried unmounting and remounting the client, and
> >rebooting, but it always comes back the same. Is there persistent
> >data somewhere on the client that is corrupt in some way and needs to be
> >deleted?
>
> No, there is no persistent data on the clients at all. They get a new
> UUID each time they mount, so the servers can't even tell it is the same
> node from one mount to the next.
>
> Presumably this is causing a visible problem, or you wouldn't have
> mentioned it?
>
> Cheers, Andreas
>
>
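
For anyone wanting to spot this kind of mismatch without reading the
listings by eye, here is a minimal Python sketch that compares saved
lctl dl output from two clients. It assumes each device line has six
whitespace-separated fields (index, status, type, name, UUID, trailing
number), as in the two listings quoted above once the mail-wrapped lines
are rejoined. The function names (parse_lctl_dl, target_of, diff_clients)
and the field name "last" are made up for illustration; the thread does
not say what the trailing number means, so the code only compares it.

```python
from typing import NamedTuple


class Device(NamedTuple):
    index: int
    status: str   # e.g. "UP" or "IN"
    type: str     # e.g. "osc", "mdc", "lov"
    name: str
    uuid: str
    last: int     # trailing number on the line; meaning not stated in the thread


def parse_lctl_dl(text):
    """Parse saved `lctl dl` output into {index: Device}.

    Lines that do not have exactly six fields are skipped (assumption:
    wrapped lines were rejoined before the text was saved).
    """
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        index, status, dev_type, name, uuid, last = fields
        devices[int(index)] = Device(int(index), status, dev_type, name, uuid, int(last))
    return devices


def target_of(name):
    """Drop the per-mount suffix: 'fs-OST0004-osc-ffff...' -> 'fs-OST0004-osc'."""
    return name.rsplit("-", 1)[0]


def diff_clients(good, bad):
    """List (index, target, field, good_value, bad_value) where the clients disagree."""
    diffs = []
    for index in sorted(good.keys() & bad.keys()):
        g, b = good[index], bad[index]
        if target_of(g.name) != target_of(b.name):
            continue  # different device in this slot; not comparable
        for field in ("status", "last"):
            if getattr(g, field) != getattr(b, field):
                diffs.append((index, target_of(g.name), field,
                              getattr(g, field), getattr(b, field)))
    return diffs


# Two lines from each listing in this thread, with wrapped lines rejoined.
GOOD = """
6 UP osc optstr01-OST0003-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
7 IN osc optstr01-OST0004-osc-ffff8103350d0400 cd18b560-e476-f55d-6df1-edcbd68c361b 5
"""
BAD = """
6 UP osc optstr01-OST0003-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 5
7 UP osc optstr01-OST0004-osc-ffff81016d482400 e7a4a072-c0db-aac9-c13f-bd4189986407 4
"""

for diff in diff_clients(parse_lctl_dl(GOOD), parse_lctl_dl(BAD)):
    print(diff)
```

On the sample lines this reports only device 7, once for the status
(IN versus UP) and once for the trailing number (5 versus 4). The
per-mount suffix and client UUID are ignored on purpose, since they
change with every mount.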