Re: ext2 filesystem corruption?!?!??

Mark Hemment (markhe@nextd.demon.co.uk)
Wed, 2 Apr 1997 09:49:57 +0000 (GMT)


Hi Ted/All,

> My patch I believe fixes the problem you found, hopefully
> without introducing any others. (Unfortunately it appeared not to solve
> the Linux-AFS cache corruption.)

I'm not convinced your patch removes the corruption problem.
Simply locking the inode at the start of clear_inode() does not prevent
another process from gaining a reference to the inode (via __iget() or
get_empty_inode()). This is the race.
Unfortunately, clear_inode() can be called with a locked inode. To make
matters worse, clear_inode() cannot fail! It is called from fs dependent
code after the inode-dependent data has been deconstructed.
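
To make the window concrete, here is a minimal userland sketch. It is
not kernel code: struct toy_inode, toy_grab(), toy_clear_lock_only() and
the hook argument are all made-up names, and the "other process" is
simulated by a callback that runs while the lock is held. The assumption
it illustrates is the one above: the path that hands out references does
not look at the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an in-core inode; NOT the kernel's struct inode. */
struct toy_inode {
    int  count;    /* reference count, in the spirit of i_count */
    bool locked;   /* lock flag, in the spirit of i_lock */
    bool cleared;  /* set once the toy clear step has wiped the inode */
};

/* Hypothetical lookup path: hands out a reference, never checks the lock. */
static void toy_grab(struct toy_inode *ino)
{
    ino->count++;
}

/* Hook type used to simulate another process running inside the window. */
typedef void (*toy_hook)(struct toy_inode *ino);

/* Hypothetical "lock only" clear: lock, wipe, unlock. It cannot fail. */
static void toy_clear_lock_only(struct toy_inode *ino, toy_hook other)
{
    assert(ino != NULL);
    ino->locked = true;
    if (other != NULL)
        other(ino);        /* the simulated other process runs here */
    ino->cleared = true;   /* wiped regardless of who holds a reference */
    ino->locked = false;
}

/* True if the inode was wiped while somebody still holds a reference. */
static bool toy_stale_reference(const struct toy_inode *ino)
{
    return ino->cleared && ino->count > 0;
}
```

Calling toy_clear_lock_only(&ino, toy_grab) on a zeroed toy_inode leaves
toy_stale_reference(&ino) true: the lock was held the whole time, yet a
reference to a wiped inode got out.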

The logic behind most of the fs/inode.c code is that when a "maintenance"
operation is performed on an inode, the inode needs locking, _and_ needs a
reference increment for the duration of that operation! This is the reason why I saw a
race in sync_inodes(). As far as I can tell, the only other operation
which does not inc the count is the call to write_inode() in
get_empty_inode().
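
A rough sketch of that rule, again as a userland toy rather than the
real fs/inode.c: all demo_* names are hypothetical, and demo_reusable()
assumes a reuse check that looks only at the count. The two wrappers
differ only in whether they take a reference around the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy inode with just the two fields the rule talks about. */
struct demo_inode {
    int  count;   /* reference count */
    bool locked;  /* lock flag */
};

/* Hypothetical reuse check that looks only at the count (an assumption). */
static bool demo_reusable(const struct demo_inode *ino)
{
    return ino->count == 0;
}

/* A "maintenance" operation, e.g. something like writing the inode out. */
typedef void (*demo_op)(struct demo_inode *ino, void *arg);

/* Lock only: the inode still looks free to the reuse check meanwhile. */
static void demo_maintain_lock_only(struct demo_inode *ino, demo_op op,
                                    void *arg)
{
    assert(ino != NULL && op != NULL);
    ino->locked = true;
    op(ino, arg);
    ino->locked = false;
}

/* Lock plus reference: the count pins the inode for the whole operation. */
static void demo_maintain_lock_and_ref(struct demo_inode *ino, demo_op op,
                                       void *arg)
{
    assert(ino != NULL && op != NULL);
    ino->count++;
    ino->locked = true;
    op(ino, arg);
    ino->locked = false;
    ino->count--;
}

/* Operation that records whether the inode looked reusable mid-operation. */
static void demo_probe(struct demo_inode *ino, void *arg)
{
    *(bool *)arg = demo_reusable(ino);
}
```

Running demo_probe through the lock-only wrapper on an otherwise unused
inode reports it as reusable in mid-operation; through the lock-and-ref
wrapper it does not, and the count is back to zero afterwards.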

I'll try stress testing an anti-race patch, but the problem only shows up
after several hours on my test-target. Can anyone reproduce inode
corruption faster?

BTW: Does the positioning of the repeat label, in iput(), look wrong?
It's _before_ the i_count dec operation......

Regards,

markhe

------------------------------------------------------------------
Mark Hemment, Unix/C Software Engineer (Contractor)
markhe@nextd.demon.co.uk http://www.demon.co.uk/
"Success has many fathers, failure is a B**TARD!" - anon
------------------------------------------------------------------