Re: [RFC] situation with fput() locking (was Re: [PULL REQUEST] :ima-appraisal patches)

From: Al Viro
Date: Fri Apr 20 2012 - 12:42:39 EST


On Fri, Apr 20, 2012 at 05:08:48PM +0100, Al Viro wrote:

> Doing removal from per-sb list immediately (i.e. before possible
> deferral; we skip ones with zero ->f_count when we walk the list
> anyway), then in case we decide to defer just move them to per-CPU
> list and schedule work on that CPU, with handler that will pull the
> corresponding list out and do the rest of __fput() for everything
> in that list. No extra locking, just preempt_disable() around the
> "move to per-CPU list" bit. Or a per-CPU spinlock with worker not
> being tied to specific CPU and told which CPU's list to work with.
> How does CPU hotplug interact with work scheduled on CPU about to
> be taken down, BTW?

Actually, I like the per-CPU spinlock variant better; the thing is,
with that scheme the normal fput() (i.e. the non-nodefer variant)
becomes non-blocking. How about this:

__fput() loses file_sb_list_del() call

fput(file)
{
	if (atomic_long_dec_and_test(...)) {
		unsigned long flags;
		struct foo *p;

		file_sb_list_del(file);
		/* get_cpu_var() yields an lvalue and disables preemption */
		p = &get_cpu_var(deferral_lists);
		spin_lock_irqsave(&p->lock, flags);
		list_move(&file->f_u.fu_list, &p->list);
		spin_unlock_irqrestore(&p->lock, flags);
		schedule_work(&p->work);
		put_cpu_var(deferral_lists);
	}
}

fput_nodefer(file)
{
	if (atomic_long_dec_and_test(...)) {
		file_sb_list_del(file);
		__fput(file);
	}
}

do_deferred_fput_work(work)
{
	struct foo *p = container_of(work, struct foo, work);
	LIST_HEAD(list);

	/* steal the whole per-CPU list, then work on it with the lock dropped */
	spin_lock_irq(&p->lock);
	list_splice_init(&p->list, &list);
	spin_unlock_irq(&p->lock);
	while (!list_empty(&list)) {
		struct file *file = list_first_entry(&list, struct file, f_u.fu_list);

		list_del_init(&file->f_u.fu_list);
		__fput(file);
	}
}

Voila - now only fput_nodefer() is blocking! fput() can be used from
any context that way, which should kill e.g. a kludge in fs/aio.c.
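
For anyone who wants to poke at the "queue under the lock, drain with the
lock dropped" pattern outside the kernel, here is a rough userspace sketch.
It is not the proposed patch: struct fake_file, struct deferral_list,
put_deferred(), drain_deferred() and final_release() are hypothetical
stand-ins, a C11 atomic_flag plays the role of the per-CPU spinlock, there
is a single list rather than one per CPU, and the caller runs the drain
by hand where the kernel version would schedule_work().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: a refcount plus a list link. */
struct fake_file {
	atomic_long refcount;
	struct fake_file *next;
	int released;		/* set by the final release step */
};

/* Stand-in for one deferral list ("struct foo" in the sketch above). */
struct deferral_list {
	atomic_flag lock;	/* toy spinlock, assumption of this example */
	struct fake_file *head;
};

static struct deferral_list deferred = {
	.lock = ATOMIC_FLAG_INIT,
	.head = NULL,
};

static void list_lock(struct deferral_list *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin */
}

static void list_unlock(struct deferral_list *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Plays the role of __fput(): the part that would be allowed to block. */
static void final_release(struct fake_file *f)
{
	assert(atomic_load(&f->refcount) == 0);
	f->released = 1;
}

/* Deferred put: on the last reference, only queue the object. */
static void put_deferred(struct fake_file *f)
{
	/* atomic_fetch_sub() returns the old value; 1 means we hit zero */
	if (atomic_fetch_sub(&f->refcount, 1) != 1)
		return;
	list_lock(&deferred);
	f->next = deferred.head;
	deferred.head = f;
	list_unlock(&deferred);
}

/* Worker: take the whole list under the lock, release with it dropped. */
static int drain_deferred(void)
{
	struct fake_file *list;
	int n = 0;

	list_lock(&deferred);
	list = deferred.head;
	deferred.head = NULL;
	list_unlock(&deferred);

	while (list) {
		struct fake_file *f = list;

		list = f->next;
		f->next = NULL;
		final_release(f);
		n++;
	}
	return n;	/* number of objects released in this pass */
}
```

The point of the shape is that put_deferred() never calls final_release()
itself, so the only thing it can wait on is the short list lock; all the
potentially slow work happens in drain_deferred() after the lock is gone.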

Comments?