Re: showstopper race condition in sync() ??? [2.1.119|120]

Bill Hawes (whawes@star.net)
Wed, 16 Sep 1998 10:36:07 -0400


Cyrille Chepelov wrote:

> Sorry, it does use rename (see the attached strace). But the problem is
> not here: it is just after the rename() call, during the sync() call.
> The attached trace is from a correctly working machine: the sync() call
> returned and the program went on. On the faulting machine, the last line
> of the strace file is:
> """
> sync(
> """
> Now, I got exactly the same problem with quotacheck(8): failure to
> return from sync(), sometimes, not always, not on every machine. Obviously
> passwd(1) is not at fault (the only userland component I'd suspect would
> be the (g)libc, but that would really be the last component).

From a quick look at the dquot.c code, there appear to be some problems
with dquot_sync, and possibly other operations. The dquot_sync code does
two potentially blocking operations, a wait_on_quota and the
write_quota, but it doesn't hold a use count on the list node, and
assumes that the list successor will be the same after the operation.
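
The hazard described above can be illustrated with a small userspace
sketch. This is not the dquot.c code: struct entry, inuse_head,
while_blocked, drop_successor, write_entry and sync_entries are all
hypothetical names, and the "blocking" write is simulated by a hook that
lets the list change underneath the caller. The sketch shows one common
way to stay safe, which is to pin the current node with a use count
across the blocking call and then restart the scan rather than trust a
successor pointer saved earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a quota entry; not the real struct dquot. */
struct entry {
    int id;
    int count;            /* use count: nonzero means the entry is pinned */
    int dirty;            /* nonzero means it still has to be written */
    struct entry *next;   /* successor on the in-use list */
};

struct entry *inuse_head;                 /* head of the in-use list */
void (*while_blocked)(struct entry *cur); /* simulates other tasks running while we sleep */

/* Example interference: unlink the successor of the entry being written. */
void drop_successor(struct entry *cur)
{
    struct entry *victim = cur->next;

    if (victim != NULL && victim->count == 0) {
        cur->next = victim->next;
        victim->next = NULL;
    }
}

/* Stand-in for a write that may sleep; the list can change during the call. */
int write_entry(struct entry *e)
{
    assert(e->count > 0);          /* caller must hold a reference */
    if (while_blocked != NULL)
        while_blocked(e);
    e->dirty = 0;
    return 1;
}

/* Write every dirty entry; returns how many were written. */
int sync_entries(void)
{
    struct entry *e;
    int written = 0;

restart:
    for (e = inuse_head; e != NULL; e = e->next) {
        if (!e->dirty)
            continue;
        e->count++;                /* pin across the blocking call */
        written += write_entry(e);
        e->count--;
        goto restart;              /* a saved successor could be stale by now */
    }
    return written;
}
```

Restarting from the head after every write is the simplest choice, not
the cheapest one; it finishes here because each pass clears one dirty
flag. A loop that saved e->next before calling write_entry would, in the
scenario simulated by drop_successor, continue from a node that is no
longer on the list.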

Also, clear_dquot appears to assume that a use count > 1 means that the
item is on the inuse list, so just protecting the above operations by
bumping the use count may have side effects ...
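
The side effect mentioned above can be shown in miniature. The sketch
below is an assumption-laden model, not the kernel code: struct entry,
inuse_by_count, inuse_by_flag, pin and unpin are hypothetical, and the
"count > 1 means on the inuse list" rule is taken from the description
above rather than checked against the source. It only shows that a
temporary pin changes the answer when list membership is inferred from
the count, and does not when membership is recorded explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical entry; field names are illustrative, not from dquot.c. */
struct entry {
    int count;      /* use count */
    bool on_inuse;  /* explicit record of list membership */
};

/* Membership inferred from the count, the assumption described above. */
bool inuse_by_count(const struct entry *e)
{
    return e->count > 1;
}

/* Membership read from explicit state, unaffected by temporary pins. */
bool inuse_by_flag(const struct entry *e)
{
    return e->on_inuse;
}

/* Temporary reference taken around a blocking operation. */
void pin(struct entry *e)
{
    e->count++;
}

/* Drop a reference taken with pin(). */
void unpin(struct entry *e)
{
    assert(e->count > 0);
    e->count--;
}
```

With an entry that starts at count 1 and is not on the list,
inuse_by_count() reports false before pin() and true while the pin is
held, whereas inuse_by_flag() stays false throughout.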

I'm not sure who looks after dquota these days, but if nobody comes
forward to work on it, I'll take a try at fixing it. (But others will
have to test, as I'm not using quota ...)

Regards,
Bill
