[Lustre-discuss] Understanding of MMP

Bernd Schubert bs_lists at aakef.fastmail.fm
Mon Oct 19 12:54:46 PDT 2009


On Monday 19 October 2009, Michael Schwartzkopff wrote:
> On Monday, 19 October 2009 20:42:19 you wrote:
> > On Monday 19 October 2009, Andreas Dilger wrote:
> > > On 19-Oct-09, at 08:46, Michael Schwartzkopff wrote:
> > > > perhaps I have a problem understanding multiple mount protection
> > > > MMP. I have a
> > > > cluster. When a failover happens sometimes I get the log entry:
> > > >
> > > > Oct 19 15:16:08 sososd7 kernel: LDISKFS-fs warning (device dm-2):
> > > > ldiskfs_multi_mount_protect: Device is already active on another
> > > > node. Oct 19 15:16:08 sososd7 kernel: LDISKFS-fs warning (device
> > > > dm-2): ldiskfs_multi_mount_protect: MMP failure info: last update
> > > > time: 1255958168,
> > > > last update node: sososd3, last update device: dm-2
> > > >
> > > > Does the second line mean that my node (sososd7) tried to mount /dev/
> > > > dm-2 but
> > > > MMP prevented it from doing so because the last update from the old
> > > > node
> > > > (sososd3) was too recent?
> > >
> > > The update time stored in the MMP block is purely for informational
> > > purposes.  It actually uses a sequence counter that has nothing to do
> > > with the system clock on either of the nodes (since they may not be in
> > > sync).
> > >
> > > What that message actually means is that sososd7 tried to mount the
> > > filesystem on dm-2 (which likely has another "LVM" name that the kernel
> > > doesn't know anything about) but the MMP block on the disk was modified
> > > by sososd3 AFTER sososd7 first looked at it.
> >
> > Probably, bug#19566. Michael, which Lustre version do you exactly use?
> >
> >
> > Thanks,
> > Bernd
> 
> I got version 1.8.1.1 which was published last week. Is the fix included or
> only in 1.8.2?

According to the bugzilla (https://bugzilla.lustre.org/show_bug.cgi?id=19566) 
the fix is not yet in 1.8.1.1. Our DDN internal releases do include it, of 
course. From my point of view this is a really important fix. Since 1.6.7 
there is also no way left for the resource agent to detect an unsuccessful 
umount (up to 1.6.6, /proc/fs/lustre/.../mntdev would still tell you the 
device is mounted).
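
Purely as illustration, a rough sketch of the kind of check that worked up to 
1.6.6 (this is not the actual resource agent code; the helper name 
device_still_mounted is made up, and the exact per-target mntdev path under 
/proc/fs/lustre is passed in rather than spelled out):

#include <stdio.h>
#include <string.h>

/* Illustration only: return 1 if 'device' still appears in the given
 * mntdev proc file, 0 otherwise (or if the entry is already gone).
 * 'mntdev_path' stands in for the per-target entry under
 * /proc/fs/lustre, which I am deliberately not spelling out here. */
static int device_still_mounted(const char *mntdev_path, const char *device)
{
        char line[256];
        FILE *fp = fopen(mntdev_path, "r");
        int found = 0;

        if (fp == NULL)
                return 0;               /* entry gone, target not mounted */

        while (fgets(line, sizeof(line), fp) != NULL) {
                if (strstr(line, device) != NULL) {
                        found = 1;      /* device still listed as mounted */
                        break;
                }
        }
        fclose(fp);
        return found;
}

Since 1.6.7 that entry no longer gives you this answer, which is exactly the 
problem for the resource agent.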

To be sure this really is your issue: do you see this in your kernel logs?

                CERROR("Mount %p is still busy (%d refs), giving up.\n",
                       mnt, atomic_read(&mnt->mnt_count));
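
For anyone curious what the MMP check Andreas describes above boils down to, 
here is a minimal userspace sketch. This is not the real ldiskfs code; 
mmp_read_seq and MMP_CHECK_INTERVAL are made-up names. The mounting node 
samples the on-disk sequence number twice with a delay in between, and if it 
changed, another node is still updating the MMP block:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

#define MMP_CHECK_INTERVAL 5    /* seconds between samples, value assumed */

/* Stub standing in for "read the sequence number from the MMP block of
 * this device"; the real code reads the MMP block directly from disk. */
static uint64_t mmp_read_seq(const char *device)
{
        (void)device;
        return 0;
}

/* Returns 1 if another node appears to be updating the MMP block. */
static int mmp_device_is_active(const char *device)
{
        uint64_t seq1 = mmp_read_seq(device);

        sleep(MMP_CHECK_INTERVAL);

        if (mmp_read_seq(device) != seq1) {
                fprintf(stderr,
                        "%s: device is already active on another node\n",
                        device);
                return 1;
        }
        return 0;
}

The real implementation does more (it also writes its own sequence and 
re-checks that nobody else overwrote it), but the two-sample comparison is 
the part that produces the warning you saw.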



-- 
Bernd Schubert
DataDirect Networks


