Re: Linux 2.6.26-rc4

From: Jeff Moyer
Date: Tue Jun 03 2008 - 13:47:40 EST


Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Wed, Jun 04, 2008 at 01:13:08AM +0800, Ian Kent wrote:
>
>> "What happens is that during an expire the situation can arise
>> that a directory is removed and another lookup is done before
>> the expire issues a completion status to the kernel module.
>> In this case, since the lookup gets a new dentry, it doesn't
>> know that there is an expire in progress and when it posts its
>> mount request, matches the existing expire request and waits
>> for its completion. ENOENT is then returned to user space
>> from lookup (as the dentry passed in is now unhashed) without
>> having performed the mount request.
>>
>> The solution used here is to keep track of dentrys in this
>> unhashed state and reuse them, if possible, in order to
>> preserve the flags. Additionally, this infrastructure will
>> provide the framework for the reintroduction of caching
>> of mount fails removed earlier in development."
>>
>> I wasn't able to do an acceptable re-implementation of the negative
>> caching we had in 2.4 with this framework, so just ignore the last
>> sentence in the above description.
>
>> Unfortunately no, but I thought that once the dentry became unhashed
>> (aka ->rmdir() or ->unlink()) it was invisible to the dcache. But, of
>> course there may be descriptors open on the dentry, which I think is the
>> problem that's being pointed out.
>
> ... or we could have had a pending mount(2) sitting there with a reference
> to mountpoint-to-be...
>
>> Yes, that would be ideal but the reason we arrived here is that, because
>> we must release the directory mutex before calling back to the daemon
>> (the heart of the problem, actually having to drop the mutex) to perform
>> the mount, we can get a deadlock. The cause of the problem was that for
>> "create" like operations the mutex is held for ->lookup() and
>> ->revalidate() but for "path walks" the mutex is only held for
>> ->lookup(), so if the mutex is held when we're in ->revalidate(), we
>> could never be sure that we were the code path that acquired it.
>>
>> Sorry, this last bit is unclear.
>> I'll need to work a bit harder on the explanation if you're interested
>> in checking further.
>
> I am.

commit 1864f7bd58351732593def024e73eca1f75bc352
Author: Ian Kent <raven@xxxxxxxxxx>
Date: Wed Aug 22 14:01:54 2007 -0700

autofs4: deadlock during create

Due to inconsistent locking in the VFS between calls to lookup and
revalidate deadlock can occur in the automounter.

The inconsistency is that the directory inode mutex is held for both lookup
and revalidate calls when called via lookup_hash whereas it is held only
for lookup during a path walk. Consequently, if the mutex is held during a
call to revalidate autofs4 can't release the mutex to callback the daemon
as it can't know whether it owns the mutex.

This situation happens when a process tries to create a directory within an
automount and a second process also tries to create the same directory
between the lookup and the mkdir. Since the first process has dropped the
mutex for the daemon callback, the second process takes it during
revalidate leading to deadlock between the autofs daemon and the second
process when the daemon tries to create the mount point directory.

After spending quite a bit of time trying to resolve this on more than one
occasion, using rather complex and ugly approaches, it turns out that just
delaying the hashing of the dentry until the create operation works fine.
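
To make the ownership problem in that commit message concrete, here is a rough user-space toy model. It is not kernel code, and every name in it (Directory, revalidate, via_lookup_hash, via_path_walk) is hypothetical, chosen only for illustration. It assumes nothing beyond what the commit text says: one call path holds the directory mutex across revalidate, the other does not, and the callee can see only "locked" or "not locked".

```python
import threading


class Directory:
    """Toy stand-in for a directory inode and its mutex (hypothetical)."""

    def __init__(self):
        self.i_mutex = threading.Lock()


def revalidate(directory):
    # The callee can only observe whether the mutex is locked,
    # not whether its own call path is the one that locked it.
    return directory.i_mutex.locked()


def via_lookup_hash(directory):
    # Create-like path: the mutex is held across the revalidate call.
    with directory.i_mutex:
        return revalidate(directory)


def via_path_walk(directory, other_holder_active):
    # Path walk: this path does not hold the mutex during revalidate,
    # but some other process may hold it at the same moment.
    if other_holder_active:
        directory.i_mutex.acquire()  # simulates the *other* process
    try:
        return revalidate(directory)
    finally:
        if other_holder_active:
            directory.i_mutex.release()
```

In this model revalidate() reports "locked" both when its own caller took the mutex and when an unrelated process did, so it has no safe basis for dropping the mutex before calling back to the daemon.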

> Oh, well... Looks like RTFS time for me for now... Additional parts of
> braindump would be appreciated - the last time I've seriously looked at
> autofs4 internals had been ~2005 or so ;-/

Well, let me know what level of dump you'd like. I can give the 50,000
foot view, or I can give you the history of things that happened to get
us to where we are today, or anything in between. The more specific
your request, the quicker I can respond. A full brain-dump would take
some time!

Cheers,

Jeff