Re: 2.6.20.3 AMD64 oops in CFQ code

From: Neil Brown
Date: Thu Mar 22 2007 - 20:45:37 EST


On Thursday March 22, dan.j.williams@xxxxxxxxx wrote:
>
> Not a cfq failure, but I have been able to reproduce a different oops
> at array stop time while i/o's were pending. I have not dug into it
> enough to suggest a patch, but I wonder if it is somehow related to
> the cfq failure since it involves congestion and drives going away:

Thanks. I know about that one and have a patch about to be posted
which should fix it. But I don't completely understand it.

When a raid5 array shuts down, it clears mddev->private, but doesn't
clean q->backing_dev_info.congested_fn. So if someone tries to call
that congested_fn, it will try to dereference mddev->private and Oops.

Only by the time that raid5 is shutting down, no-one should have a
reference to the device any more, and no-one should be in a position
to call congested_fn !!

Maybe pdflush is just trying to sync the block device, even though
there is no dirty date .... dunno....

But I don't think it is related to the cfq problem as this one is only
a problem when the array is being stopped.

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/