Re: Block device throttling [Re: Distributed storage.]

From: Daniel Phillips
Date: Mon Aug 13 2007 - 08:21:23 EST


On Sunday 12 August 2007 22:36, I wrote:
> Note! There are two more issues I forgot to mention earlier.

Oops, and there is also:

3) The bio throttle, which is supposed to prevent deadlock, can itself
deadlock. Let me see if I can remember how it goes.

* generic_make_request puts a bio in flight
* the bio gets past the throttle and initiates network IO
* net calls sk_alloc->alloc_pages->shrink_caches
* shrink_caches submits a bio recursively to our block device
* this bio blocks on the throttle
* net may never get the memory it needs, and we are wedged

I need to review a backtrace to get this precisely right, however you
can see the danger. In ddsnap we kludge around this problem by not
throttling any bio submitted in PF_MEMALLOC mode, which effectively
increases our reserve requirement by the amount of IO that mm will
submit to a given block device before deciding the device is congested
and should be left alone. This works, but is sloppy and disgusting.

The right thing to do is to make sure than the mm knows about our
throttle accounting in backing_dev_info so it will not push IO to our
device when it knows that the IO will just block on congestion.
Instead, shrink_caches will find some other less congested block device
or give up, causing alloc_pages to draw from the memalloc reserve to
satisfy the sk_alloc request.

The mm already uses backing_dev_info this way, we just need to set the
right bits in the backing_dev_info state flags. I think Peter posted a
patch set that included this feature at some point.

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/