Re: [PATCH] block: don't make BLK_DEF_MAX_SECTORS too big

From: Ming Lei
Date: Wed Mar 30 2016 - 20:49:49 EST


Hi Shaohua,

On Thu, Mar 31, 2016 at 1:07 AM, Shaohua Li <shli@xxxxxx> wrote:
> On Wed, Mar 30, 2016 at 08:13:07PM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 30, 2016 at 10:27 AM, Shaohua Li <shli@xxxxxx> wrote:
>> > On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote:
>> >> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li <shli@xxxxxx> wrote:
>> >> > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
>> >> > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
>> >> > BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
>> >> > fail.
>> >> >
>> >> > In the future, we can extend the size either bvec_slabs array is
>> >> > expanded or the upcoming multipage bvec is added if pages are
>> >> > contiguous. This one is suitable for stable.
>> >> >
>> >> > Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
>> >> > Reported-by: Sebastian Roesner <sroesner-kernelorg@xxxxxxxxxxxxxxxxx>
>> >> > Cc: stable@xxxxxxxxxxxxxxx (4.2+)
>> >> > Cc: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>> >> > Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
>> >> > Signed-off-by: Shaohua Li <shli@xxxxxx>
>> >> > ---
>> >> > include/linux/blkdev.h | 6 +++++-
>> >> > 1 file changed, 5 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> >> > index 7e5d7e0..da64325 100644
>> >> > --- a/include/linux/blkdev.h
>> >> > +++ b/include/linux/blkdev.h
>> >> > @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
>> >> > enum blk_default_limits {
>> >> > BLK_MAX_SEGMENTS = 128,
>> >> > BLK_SAFE_MAX_SECTORS = 255,
>> >> > - BLK_DEF_MAX_SECTORS = 2560,
>> >> > + /*
>> >> > + * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
>> >> > + * Otherwise bio_alloc_bioset will break.
>> >> > + */
>> >> > + BLK_DEF_MAX_SECTORS = BIO_MAX_SECTORS,
>> >>
>> >> Thinking about it further, it isn't good to change the default max sectors
>> >> because the patch affects REQ_PC bios too, which don't have the 1 Mbyte
>> >> limit at all.
>> >
>> > what breaks setting REQ_PC to 1M limit? I can understand bigger limit might help
>> > big raid array performance, but REQ_PC isn't the case.
>>
>> I mean REQ_PC can include at most 1024 vectors instead of 256, so it doesn't
>> look fair to introduce the strict limit for all kinds of requests.
>>
>> More importantly, the max sector limit is for limiting the max sectors in
>> a request, and is used for bio merging; it is not the same as the bio's
>> 256-bvec limit.
>
> My point is this doesn't matter because there is no performance issue. 2560
> isn't fair too which uses 320 vectors. And note,
> blk_queue_max_hw_sectors doesn't force max_hw_sectors has the
> BLK_DEF_MAX_SECTORS limit.

OK

>> >
>> >> So I suggest just changing bcache's queue max sector limit to 1M; that means
>> >> we shouldn't encourage bcache's usage of bypassing bio_add_page().
>> >
>> > Don't think this is a good idea. This is a limitation of block core,
>>
>> This bio's 256 bvecs limitation is from block implementation, think about why
>> one bvec just includes one page, instead of one segment. In the future, it can
>> be improved absolutely, that is why I said it isn't good to use BIO_MAX_SECTORS.
>> Also you can find that there is only one user of BIO_MAX_SECTORS.
>
> Don't disagree. But when you switch to multipage bvec, you must fix this
> anyway, so let's fix the current problem. Both 1M and 2560 sectors are wrong
> in that case. The size limit could be 1M if pages are not contiguous, or 256
> * max_segment_size.
>
>> > block core should make sure the limitation doesn't break, not the
>> > driver. On the other hand, are you going to fix all drivers? drivers can
>> > set arbitrary max sector limit.
>>
>> The issue only exists if drivers(fs, dm, md, bcache) do not use bio_add_page().
>> All this kind of usage shouldn't be encouraged.
>
> bio_add_page can add pages to big bio too, there is no limitation.

Yeah, it is possible, but unusual.

>> So how about fixing the issue by introducing the limit into get_max_io_size()?
>> Such as, add something like below at the end of this function?
>>
>> sectors = min_t(unsigned, sectors, BIO_MAX_PAGES <<
>> (PAGE_CACHE_SHIFT - 9));
>
> I can do this, just don't see the point why. max_sectors is a software
> limitation.

Let me make it clear. blk_rq_get_max_sectors() is used for merging
bios/reqs, which means limits.max_sectors is for limiting the max sectors
in one request or transfer. Now this patch decreases it just because of a
single bio's 256-bvec limitation. Is that correct? That is the reason why I
suggest changing get_max_io_size() for the bio's 256-bvec limit.
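
To make the arithmetic behind that suggestion concrete, here is a minimal
user-space sketch in C. It assumes 4k pages, 512-byte sectors and a 256-bvec
bio; every sketch_* / SKETCH_* name is hypothetical, and this is not the
kernel's get_max_io_size(), only the clamp I have in mind.

```c
#include <assert.h>

/* Assumed values for illustration only; not taken from kernel headers. */
#define SKETCH_PAGE_SHIFT    12    /* 4k pages */
#define SKETCH_SECTOR_SHIFT  9     /* 512-byte sectors */
#define SKETCH_BIO_MAX_PAGES 256u  /* bvecs a single bio can hold */

/* Largest number of sectors one bio can cover with one page per bvec. */
static unsigned int sketch_bio_max_sectors(void)
{
	return SKETCH_BIO_MAX_PAGES << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT);
}

/* Number of single-page bvecs needed to cover the given sectors. */
static unsigned int sketch_pages_for_sectors(unsigned int sectors)
{
	unsigned int per_page = 1u << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT);

	return (sectors + per_page - 1) / per_page;
}

/*
 * Clamp a per-bio I/O size to what one bio can hold, leaving the
 * per-request limit (max_sectors) untouched for merging.
 */
static unsigned int sketch_clamp_io_size(unsigned int sectors)
{
	unsigned int cap = sketch_bio_max_sectors();

	return sectors < cap ? sectors : cap;
}

/* Checks the numbers quoted in this thread. */
static void sketch_self_check(void)
{
	assert(sketch_bio_max_sectors() == 2048);       /* 1M with 4k pages */
	assert(sketch_pages_for_sectors(2560) == 320);  /* more than 256 bvecs */
	assert(sketch_clamp_io_size(2560) == 2048);     /* split at bio limit */
}
```

With these assumed values, a 2560-sector bio would need 320 bvecs, so the
clamp cuts it to 2048 sectors at split time while requests can still be
merged up to max_sectors.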

On the contrary, the default max sectors should have been increased,
since hardware is becoming faster and we should send more to the drive
in one request, IMO.

Thanks,
Ming