Re: data corruption: revalidating a (removable) hdd/flash on re-insert

From: Michael Tokarev
Date: Wed Nov 05 2008 - 03:05:21 EST


Pavel Machek wrote:
> On Wed 2008-11-05 00:22:51, Michael Tokarev wrote:
>> Pavel Machek wrote:
>> []
>>> So can we simply claim 'media changed' on last close/unmount? Sure,
>>> sometimes media was not changed, but that only hurts performance, not
>>> correctness... ?
>>
>> Well, that's what my tiny proggy, which I used here to work around the
>> problem, does. It constantly opens/closes the /dev/sdFOO, every 0.5s
>> currently (I don't think I will be able to replace a media faster than
>> half a second :), in order to catch REMOVALs of media -- because when
>> the drive does not see the media anymore, it correctly reports that
>> the media has changed...
>
> Ok, so what you need to do is to put it into the kernel and activate it
> via blacklist...?

I'm fine with my solution.. ;) Especially once Kay suggested to
look at /proc/mounts for notifications.
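
For illustration only (this is not the actual program): a minimal
sketch of watching /proc/mounts for mount-table changes. It relies on
the documented behaviour that /proc/mounts is pollable and that a
mount or unmount is signalled as POLLPRI (POLLERR on older kernels).
The names parse_mounted_devices, watch_mounts and on_change are
hypothetical.

```python
import select


def parse_mounted_devices(text):
    """Return the set of /dev/... sources listed in /proc/mounts-style text."""
    devices = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            devices.add(fields[0])
    return devices


def watch_mounts(on_change, path="/proc/mounts", timeout_ms=None):
    """Call on_change(unmounted, mounted) whenever the mount table changes.

    Returns when poll() times out; with timeout_ms=None it blocks forever.
    """
    with open(path) as f:
        poller = select.poll()
        # A mount-table change shows up as a priority/error event, not as
        # ordinary readability.
        poller.register(f.fileno(), select.POLLPRI | select.POLLERR)
        current = parse_mounted_devices(f.read())
        while True:
            if not poller.poll(timeout_ms):
                return  # timed out without any change
            f.seek(0)  # re-read the whole table from the start
            new = parse_mounted_devices(f.read())
            on_change(current - new, new - current)
            current = new
```

A caller could pass an on_change callback that starts probing a device
once it appears in the "unmounted" set.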

The original problem was that I didn't understand what was happening,
and blamed the kernel for "breaking" a working device (it looks like
it never worked in the first place, it was just that we never hit
the bug before). Once the problem became clear (thanks Kay!),
I wrote the proggy mentioned above - it's obviously a gross hack,
but it stops the corruption for me.
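
For illustration only (not the actual proggy): a minimal sketch of
the open/close polling described above. The 0.5s interval comes from
the text; the names probe and poll_device, the rounds parameter and
the use of O_NONBLOCK are assumptions made for this sketch.

```python
import os
import time


def probe(path):
    """Open and immediately close the device node.

    Returns True if the open succeeded. The assumption here is that the
    open itself is what gives the kernel a chance to notice that the
    media is gone and to flag a media change.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return False
    os.close(fd)
    return True


def poll_device(path, interval=0.5, rounds=None):
    """Probe `path` every `interval` seconds.

    With rounds=None the loop never ends; otherwise it stops after
    `rounds` probes and returns how many of them succeeded.
    """
    successes = 0
    done = 0
    while rounds is None or done < rounds:
        if probe(path):
            successes += 1
        time.sleep(interval)
        done += 1
    return successes
```

Something like poll_device("/dev/sdFOO") would then run for as long as
the drive is in use, with sdFOO standing in for the real device name.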

Generally, the solution can be one of three:

a) leave it as it is now, since it has never been brought up
before and hence does not affect many people. And because
even if it did, it becomes less and less of a problem with
bad drives going away slowly...

b) use a mechanism like a blacklist in the kernel to force
invalidation on CLOSE automatically for such drives (rather than
only when it is really necessary, on REMOVAL, as my program does).
Less efficient than my solution, but much easier to deal
with in the kernel.

c) I will use my variant for my problem.. while finding a
replacement for the bad hardware.

So no, I'm not asking to put that proggy into the kernel.. ;)
For a kernelspace solution, the blacklist in (b) would be a much
simpler way. If at all.

So to summarize: if it is EASY (read: trivial) to do such a blacklist
in kernel space, I'd do it right away, because potentially it
is still possible to see similar corruption elsewhere. If not,
just forget the case as "solved for the reporter" ;)

Thanks!

/mjt