Re: blk_stop_queue/blk_start_queue confusion, problem, or bug???

From: Lou Langholtz (ldl@aros.net)
Date: Mon Jul 28 2003 - 02:51:05 EST


Jens Axboe wrote:

>On Sun, Jul 27 2003, Lou Langholtz wrote:
>
>
>>I've been trying to use the blk_start_queue and blk_stop_queue functions
>>in the network block device driver branch I'm working on. The stop works
>>as expected, but the start doesn't. Processes that have tried to read or
>>write to the device (after the queue was stopped) stay blocked in
>>io_schedule instead of getting woken up (after blk_start_queue was
>>called). Do I need to follow the call to blk_start_queue() with a call
>>to wake_up() on the correct wait queues? Why not have that functionality
>>be part of blk_start_queue()? Or was this an oversight/bug?
>>
>>
>
>blk_start_queue() should be enough. What kind of behaviour are you
>seeing? Is the request_fn() never called again?
>
Sorry. I've been so burried in this problem, I forgot others probably
can't read my mind ;-) The behavior I was seeing was that processes
blocked on I/O and in io_schedule, don't get woken up. After tracking
the problem down, I realized that once the queue was stopped (using
blk_stop_queue) any I/O requests against an empty request queue would
plug the device. After the short timeout, generic_unplug would get
called and would first try removing the plug then if it succeeded check
QUEUE_FLAG_STOPPED. In my case QUEUE_FLAG_STOPPED hadn't gotten cleared
by the time generic_unplug had gotten invoked. So the queue was left in
a state where it wasn't plugged any more but the request_fn wasn't
running either and things hung that way (locked in io_schedule).
Hopefully the patch I just sent out will make sense if my explanation
doesn't again this time. ;-)

>>The reason I'm using blk_stop_queue and blk_start_queue is to stop the
>>request handling function (installed from blk_init_queue), from being
>>re-invoked and to return when the network block device server goes down.
>>That way, the driver doesn't need to block indefinately within the
>>request handling function - which seems like it'd likely block other
>>block drivers if it did this - and doesn't need to be handled by
>>
>>
>
>It will, you should never block in your request function/
>
>
With the network block device driver, the only way to ensure the request
function *never* blocks is to have a seperate dedicated kernel thread
handling the actual network I/O. At best otherwise, I can use
MSG_DONTWAIT coupled with the blk_start_queue and blk_stop_queue
functions however the code must still drop the spin lock to make the
socket calls (since they still may sleep). At least when I try to call
sock_sendmsg/sendpage with the spin lock still held (and I'm using
CONFIG_DEBUG_SPINLOCK_SLEEP) I get "sleeping function called from
illegal context" messages. Is there another way? What's the way you
would suggest?

>>. . . BTW: LKML has had a related thread on this some years ago in discussing
>>how the block layer system handles request functions that must drop the
>>spinlock and may block indefinately. That never seemed to get resolved
>>though and makes me believe that's why Steven Whitehouse opted to use a
>>multi-threaded approach to the NBD driver at one point.
>>
>>
>
>That has never really been allowed, in that it is a Bad Thing to do
>something like that.
>
>
Want to make sure I don't misunderstand... you mean that dropping the
queue spin lock is a Bad Thing correct? Is it bad enough to warrant
using a seperate kernel thread for handling network sends to avoid this
then? This would have to be a seperate thread per network block device
then to ensure the devices don't impede each other.

Thanks!!!!!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Jul 31 2003 - 22:00:34 EST