Re: Strange block/scsi/workqueue issue

From: James Bottomley
Date: Tue Apr 12 2011 - 11:15:25 EST


On Tue, 2011-04-12 at 14:15 +0900, Tejun Heo wrote:
> > A fix might be to shunt more stuff off to workqueues, but that's
> > producing a more complex system which would be prone to entanglements
> > that would be even harder to spot.
>
> I don't agree there. To me, the cause for entanglement seems to be
> request_fn calling all the way through blocking destruction because it
> detected that the final put was called with sleepable context. It's
> just weird and difficult to anticipate to directly call into sleepable
> destruction path from request_fn whether it had sleepable context or
> not. With the yet-to-be-debugged bug caused by the conversion aside,
> I think simply using workqueue is the better solution.

So your idea is that all final puts should go through a workqueue? Like
I said, that would work, but it's not just SCSI ... any call path that
destroys a queue has to be audited.

The problem has nothing to do with sleeping context ... it's that any
work called by the block workqueue can't destroy that queue. In a
refcounted model, that's a bit nasty, because whoever happens to drop
the last reference triggers the destruction.
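
To make that concrete, here is a minimal userspace sketch of the failure,
not kernel code. Every model_* name is hypothetical. model_cancel_sync()
only stands in for a synchronous cancel, which has to wait for running
work to finish, and the model records the self-wait instead of hanging.

```c
/* Userspace model only: all names are hypothetical, none is a kernel API. */
#include <assert.h>
#include <stdbool.h>

struct model_queue {
	int refcount;        /* references held on the queue */
	bool work_running;   /* true while the queue's own work executes */
	bool destroyed;      /* set once teardown has completed */
	bool would_deadlock; /* set when teardown would wait on itself */
};

/*
 * Model of a synchronous cancel: it must wait for running work to finish.
 * If the caller is that very work, the wait can never end.
 */
bool model_cancel_sync(struct model_queue *q)
{
	if (q->work_running) {
		q->would_deadlock = true; /* a real system would hang here */
		return false;
	}
	return true;
}

/* Drop one reference; the final put tears the queue down. */
void model_queue_put(struct model_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount > 0)
		return;
	if (model_cancel_sync(q))
		q->destroyed = true;
}

/* The queue's work item; it may end up dropping a reference on the queue. */
void model_queue_work(struct model_queue *q, bool drops_ref)
{
	q->work_running = true;
	if (drops_ref)
		model_queue_put(q);
	q->work_running = false;
}

/* Returns true if the final put would deadlock in this scenario. */
bool model_final_put_deadlocks(bool put_from_work)
{
	struct model_queue q = { .refcount = 1 };

	if (put_from_work)
		model_queue_work(&q, true); /* last ref dropped inside the work */
	else
		model_queue_put(&q);        /* last ref dropped from outside */
	return q.would_deadlock;
}
```

In this model the final put is harmless from outside the work and
self-waits when the work itself drops the last reference, which is the
case every destroying call path would have to be audited for.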

> > Perhaps a better solution is just not to use sync cancellations in
> > block? As long as the work in the queue holds a queue ref, they can be
> > done asynchronously.
>
> Hmmm... maybe but at least I prefer doing explicit shutdown/draining
> on destruction even if the base data structure is refcounted. Things
> become much more predictable that way.

An asynchronous cancel is pretty much instantaneous. If the work hasn't
started executing, we cancel it. If the work is already running, we just
let it complete instead of waiting for it.
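
A rough userspace sketch of that scheme, assuming the work holds its own
queue ref while it is pending or running. All model_* names are
hypothetical and none of this is the block layer's actual code.

```c
/* Userspace model only: all names are hypothetical, none is a kernel API. */
#include <assert.h>
#include <stdbool.h>

struct model_aqueue {
	int refcount;      /* owner's ref plus one ref held by the work */
	bool work_pending; /* queued but not yet started */
	bool work_running; /* currently executing */
	bool freed;        /* set when the last ref goes away */
};

void model_put(struct model_aqueue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0)
		q->freed = true;
}

/* Asynchronous cancel: never waits for anything. */
bool model_cancel_async(struct model_aqueue *q)
{
	if (q->work_pending) {
		q->work_pending = false;
		model_put(q); /* drop the ref the pending work held */
		return true;
	}
	return false; /* idle, or running and left to complete */
}

/* Running work finishes and drops its own ref. */
void model_work_done(struct model_aqueue *q)
{
	assert(q->work_running);
	q->work_running = false;
	model_put(q);
}

/* Teardown by the owner: cancel without waiting, then drop the owner's ref. */
void model_teardown_async(struct model_aqueue *q)
{
	model_cancel_async(q);
	model_put(q);
}

/*
 * Returns 0 if teardown itself frees the queue, 1 if the queue is freed
 * only when the running work completes, -1 if it is never freed.
 */
int model_async_scenario(bool pending, bool running)
{
	struct model_aqueue q = {
		.refcount = 1 + (pending || running),
		.work_pending = pending,
		.work_running = running,
	};

	model_teardown_async(&q);
	if (q.freed)
		return 0;
	if (q.work_running)
		model_work_done(&q);
	return q.freed ? 1 : -1;
}
```

Teardown never blocks in this model. With running work the queue simply
outlives teardown until the work drops its ref.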

Synchronous waits are dangerous because they cause entanglement.

James

