[Lustre-discuss] Fwd: lctl --device XX deactivate doesn't make OST read only

Alastair Ferguson aferguson at cmcrc.com
Tue Jun 18 18:41:30 PDT 2013


Update - lfs df -h is not working correctly.

It said I had 44M free (100% used), so I did:

lfs find /data -O AC3-OST0010_UUID -size +20G

Then it found /data/smarts/ksc_mq/am/03456.am 

so I did:

cp -vp /data/smarts/ksc_mq/am/03456.am /data/smarts/ksc_mq/am/03456.am.bkp

Then when it had finished:

rm -f /data/smarts/ksc_mq/am/03456.am
mv /data/smarts/ksc_mq/am/03456.am.bkp /data/smarts/ksc_mq/am/03456.am

This file was 359GB; therefore, lfs df -h HAS TO BE wrong.

How can I make it right?
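
One way to cross-check the numbers is to filter the lfs df output for OSTs at or above a usage threshold before and after each move. What follows is a minimal sketch, assuming the column layout seen in the lfs df lines quoted in this thread (UUID, size, used, available, Use%, mount point). The function name full_osts is hypothetical, and the sample input is copied from this thread rather than read from a live filesystem.

```shell
#!/bin/sh
# full_osts THRESHOLD: read `lfs df`-style lines on stdin and print the
# UUID and Use% of every OST whose usage is >= THRESHOLD percent.
# Assumed columns: UUID size used available Use% mountpoint[OST:n]
full_osts() {
    awk -v min="$1" '
        /\[OST:[0-9]+\]/ {
            pct = $5
            sub(/%/, "", pct)              # "100%" -> "100"
            if (pct + 0 >= min + 0) print $1, $5
        }'
}

# Sample input copied from the lfs df output in this thread.
full_osts 95 <<'EOF'
AC3-OST000a_UUID           14.3T       13.6T       46.1M 100% /data[OST:10]
AC3-OST0010_UUID            7.2T        6.8T       46.1M 100% /data[OST:16]
EOF
```

On a client with the filesystem mounted, the same filter could be fed from "lfs df /data | full_osts 95".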

Alastair Ferguson
IT Manager
Capital Markets CRC Limited (CMCRC)
Telephone: +61 2 8088 4222
Mobile: +61 424 235 159
Fax: +61 2 8088 4201
www.cmcrc.com 



Capital Markets CRC Ltd - Confidential Communication
The information contained in this e-mail is confidential.  It is intended for the addressee only.  If you receive this e-mail by mistake please promptly inform us by reply e-mail and then delete the e-mail and destroy any printed copy. You must not disclose or use in any way the information in the e-mail. There is no warranty that this e-mail is error or virus free. It may be a private communication, and if so, does not represent the views of the CMCRC and its associates.  If it is a private communication, care should be taken in opening it to ensure that undue offence is not given.


Begin forwarded message:

> From: Alastair Ferguson <aferguson at cmcrc.com>
> Subject: Re: [Lustre-discuss] lctl --device XX deactivate doesn't make OST read only
> Date: 19 June 2013 9:14:45 AM AEST
> To: Andreas Dilger <andreas.dilger at intel.com>, "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> 
> Sorry - final update.
> 
> It appears that two OSTs are both still at 100% (don't know how I got that wrong), each with about 40MB of space left.
> 
> I tried:
> 
>  lfs find /data -O AC3-OST000a_UUID -size +20G | lfs_migrate -y
> 
> Now getting this:
> 
> /data/smarts/ksc_mq/am/03723.am: llapi_semantic_traverse: Failed to open '/data/home/zzhao/workspace/topical_collocation_model/results/ir_evaluation/sjmn2k_tng/DocLDA/k050-alpha0.10-gamma0.01/GibbsRun-2': No such file or directory (2)
> error: find failed for +20G.
> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
> rsync: write failed on "/data/smarts/ksc_mq/am/03723.am.tmp.N13504": No space left on device (28)
> 
> 
> Also doing:
> 
> lfs_migrate /data/workflow
> 
> (8TB in size)
> 
> & 
> 
> lfs_migrate /data/raw
> 
> (15TB)
> 
> and still:
> 
> AC3-OST000a_UUID           14.3T       13.6T       46.1M 100% /data[OST:10]
> AC3-OST0010_UUID            7.2T        6.8T       46.1M 100% /data[OST:16]
> 
> We can't run our processes because of the no space on device errors. Help!
> 
> On 18/06/2013, at 9:26 AM, "Dilger, Andreas" <andreas.dilger at intel.com> wrote:
> 
>> On 2013/17/06 1:12 AM, "Alastair Ferguson" <aferguson at cmcrc.com> wrote:
>>> OK, bit of a weird one, so 3 OSTs are 100%, but there is 30TB of free
>>> space around the other OSTs, so I do:
>>> 
>>> lfs df -h
>>> 
>>> Get this part as one of the OSTs I need to deactivate:
>>> 
>>> AC3-OST000c_UUID           14.3T       13.6T       87.4M 100% /data[OST:12]
>>> 
>>> then
>>> 
>>> lctl dl
>>> 
>>> 19 UP osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>>> 
>>> Then
>>> 
>>> lctl --device 19 deactivate
>>> 
>>> then
>>> 
>>> lctl dl:
>>> 
>>> 19 IN osc AC3-OST000c-osc AC3-mdtlov_UUID 5
>>> 
>>> Should be read only, right?
>> 
>> Right, this is the MDS OSC device, so no new files should be allocated on
>> that OST.
>> 
>>> Then
>>> 
>>> lfs getstripe -O AC3-OST000c_UUID -rv -d /data | grep /data >> ost000c_raw.txt
>>> 
>>> To find the files in the filesystem (/data) and strip out all the stuff
>>> you don't need.  Then:
>>> 
>>> while read line; do cp -p "$line" "$line.___bak"; rm -f "$line"; mv "$line.___bak" "$line";  done < ost000c_raw.txt
>>> 
>>> This should move the data off the OST, but it doesn't. I have used this
>>> procedure before to remove data from a whole server (which worked), and
>>> when I run lfs df -h I can see the OST emptying. In this case, though,
>>> usage goes up and down, suggesting it is copying BACK to the same OST
>>> despite the fact that it is IN, not UP, when lctl dl is run.
>> 
>> You should look at "lfs_migrate" and its man page, for a more robust
>> mechanism for doing the above migration.  Your script is unsafe if
>> interrupted after "rm -f" but before "mv" moves the old file into place.
>> You can also use "lfs_migrate" in a pipeline, so that it only moves new
>> files, while your script would re-move the same files repeatedly if
>> interrupted and restarted.
>> 
>>> How can I get files off this as I get errors saying no space on device??
>> 
>> Your process _should_ be working, but if you are moving small files the
>> effects may be slow.  As mentioned in the "lfs_migrate" man page, you
>> should select large files to migrate, since you will get better IO
>> performance, and will free space more quickly.
>> 
>> Cheers, Andreas
>> -- 
>> Andreas Dilger
>> 
>> Lustre Software Architect
>> Intel High Performance Data Division
>> 
>> 
> 
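
The window Andreas describes between "rm -f" and "mv" can be avoided by copying first and then renaming the copy over the original in a single step. What follows is a minimal sketch of that idea, not a replacement for lfs_migrate: the function names and the ".recopy.$$" temporary suffix are hypothetical, and it does not check whether a file is open elsewhere or changes during the copy.

```shell
#!/bin/sh
# recopy_in_place FILE: rewrite FILE through a temporary copy so that its
# data is reallocated, without ever removing the original first.
# The ".recopy.$$" suffix is an arbitrary, hypothetical temp name.
recopy_in_place() {
    src=$1
    tmp="$src.recopy.$$"
    # Copy first; if the copy fails (for example with "No space left on
    # device"), discard the partial copy and leave the original untouched.
    if ! cp -p -- "$src" "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi
    # Renaming within one directory replaces the original in one step, so
    # an interruption leaves either the old file or the complete new copy.
    mv -f -- "$tmp" "$src"
}

# recopy_list: read one path per line from stdin, keeping leading spaces
# and backslashes intact, and report any file that could not be recopied.
recopy_list() {
    while IFS= read -r file; do
        recopy_in_place "$file" || echo "skipped: $file" >&2
    done
}
```

With the file list from the original message, this would be run as "recopy_list < ost000c_raw.txt". As in the original loop, it relies on the full OST being deactivated on the MDS so that the new copy is allocated elsewhere.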
