Re: Mysterious CFQ crash and RCU

From: Paul Bolle
Date: Sun Jun 05 2011 - 04:40:04 EST


On Sun, 2011-06-05 at 08:56 +0200, Jens Axboe wrote:
> Does this fix it? It will introduce a hierarchy that is queue -> ioc
> lock, but as far as I can remember (and tell from a quick look), we
> don't have any dependencies on that order of locking at this moment. So
> should be OK.
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 3c7b537..fa7ef54 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -2772,8 +2772,11 @@ static void __cfq_exit_single_io_context(struct cfq_data *cfqd,
> smp_wmb();
> cic->key = cfqd_dead_key(cfqd);
>
> - if (ioc->ioc_data == cic)
> + if (ioc->ioc_data == cic) {
> + spin_lock(&ioc->lock);
> rcu_assign_pointer(ioc->ioc_data, NULL);
> + spin_unlock(&ioc->lock);
> + }
>
> if (cic->cfqq[BLK_RW_ASYNC]) {
> cfq_exit_cfqq(cfqd, cic->cfqq[BLK_RW_ASYNC]);
>

0) I'd guess not, as the last thing I tried before simply ripping
io_context.ioc_data out, was:

spin_lock_irqsave(&ioc->lock, flags);
rcu_read_lock();
ioc_data = rcu_dereference(ioc->ioc_data);
rcu_read_unlock();
if (ioc_data == cic)
rcu_assign_pointer(ioc->ioc_data, NULL);
spin_unlock_irqrestore(&ioc->lock, flags);

(By this time I had already wrapped all access to io_context.ioc_data in
rcu_read_lock(), rcu_dereference(), and rcu_read_unlock() voodoo. I also
wrapped all access of io_context members - other than refcount and
nr_tasks - in a spin_lock_irqsave()/spin_unlock_irqrestore() on
io_context.lock. This gave no warnings, nor lockups, but the code just
kept crashing in the exact same location it always did!)

1) Of course, by now I have forgotten what I had in mind when I stopped
working on this last night. My first bet currently is to change the core
of cic_free_func() into something like:

spin_lock_irqsave(&ioc->lock, flags);
radix_tree_delete(&ioc->radix_root, dead_key >>
CIC_DEAD_INDEX_SHIFT);

rcu_read_lock();
ioc_data = rcu_dereference(ioc->ioc_data);
rcu_read_unlock();
if (ioc_data == cic)
rcu_assign_pointer(ioc->ioc_data, NULL);

hlist_del_rcu(&cic->cic_list);
spin_unlock_irqrestore(&ioc->lock, flags);

2) But, I must admit I'm not yet at full speed today.


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/