On Tue 29-07-25 11:19:05, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>
Currently elevators will record internal 'async_depth' to throttle
asynchronous requests, and they both calculate shallow_dpeth based on
sb->shift, with the respect that sb->shift is the available tags in one
word.
However, sb->shift is not the availbale tags in the last word, see
__map_depth:
if (index == sb->map_nr - 1)
return sb->depth - (index << sb->shift);
For consequence, if the last word is used, more tags can be get than
expected, for example, assume nr_requests=256 and there are four words,
in the worst case if user set nr_requests=32, then the first word is
the last word, and still use bits per word, which is 64, to calculate
async_depth is wrong.
One the other hand, due to cgroup qos, bfq can allow only one request
to be allocated, and set shallow_dpeth=1 will still allow the number
of words request to be allocated.
Fix this problems by using shallow_depth to the whole sbitmap instead
of per word, also change kyber, mq-deadline and bfq to follow this.
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
I agree with these problems but AFAIU this implementation of shallow depth
has been done for a reason. Omar can chime in here as the original author
or perhaps Jens but the idea of current shallow depth implementation is
that each sbitmap user regardless of used shallow depth has a chance to
allocate from each sbitmap word which evenly distributes pressure among
available sbitmap words. With the implementation you've chosen there will
be higher pressure (and thus contention) on words with low indices.
So I think we would be good to fix issues with shallow depth for small
number of sbitmap words (because that's where these buggy cornercases may
matter in practice) but I believe the logic which constrains number of used
bits from each *word* when shallow_depth is specified should be kept. It
might make sense to change the API so that shallow_depth is indeed
specified compared to the total size of the bitmap, not to the size of the
word (because that's confusing practically everybody I've met and is a
constant source of bugs) if it can be made to perform well.
Honza