Re: Re: [PATCH] block: Add blk_rq_pos(rq) to sort rq when flushing plug-list.

From: Jianpeng Ma
Date: Tue Oct 16 2012 - 21:04:46 EST


On 2012-10-16 15:48 Shaohua Li <shli@xxxxxxxxxx> Wrote:
>2012/10/16 Jianpeng Ma <majianpeng@xxxxxxxxx>:
>> On 2012-10-15 21:18 Shaohua Li <shli@xxxxxxxxxx> Wrote:
>>>2012/10/15 Shaohua Li <shli@xxxxxxxxxxxx>:
>>>> 2012/10/15 Jianpeng Ma <majianpeng@xxxxxxxxx>:
>>>>> My workload is a raid5 array with 16 disks, written to by our
>>>>> filesystem in direct-io mode.
>>>>> Using blktrace, I found the following messages:
>>>>>
>>>>> 8,16 0 3570 1.083923979 2519 I W 144323176 + 24 [md127_raid5]
>>>>> 8,16 0 0 1.083926214 0 m N cfq2519 insert_request
>>>>> 8,16 0 3571 1.083926586 2519 I W 144323072 + 104 [md127_raid5]
>>>>> 8,16 0 0 1.083926952 0 m N cfq2519 insert_request
>>>>> 8,16 0 3572 1.083927180 2519 U N [md127_raid5] 2
>>>>> 8,16 0 0 1.083927870 0 m N cfq2519 Not idling.st->count:1
>>>>> 8,16 0 0 1.083928320 0 m N cfq2519 dispatch_insert
>>>>> 8,16 0 0 1.083928951 0 m N cfq2519 dispatched a request
>>>>> 8,16 0 0 1.083929443 0 m N cfq2519 activate rq,drv=1
>>>>> 8,16 0 3573 1.083929530 2519 D W 144323176 + 24 [md127_raid5]
>>>>> 8,16 0 0 1.083933883 0 m N cfq2519 Not idling.st->count:1
>>>>> 8,16 0 0 1.083934189 0 m N cfq2519 dispatch_insert
>>>>> 8,16 0 0 1.083934654 0 m N cfq2519 dispatched a request
>>>>> 8,16 0 0 1.083935014 0 m N cfq2519 activate rq,drv=2
>>>>> 8,16 0 3574 1.083935101 2519 D W 144323072 + 104 [md127_raid5]
>>>>> 8,16 0 3575 1.084196179 0 C W 144323176 + 24 [0]
>>>>> 8,16 0 0 1.084197979 0 m N cfq2519 complete rqnoidle 0
>>>>> 8,16 0 3576 1.084769073 0 C W 144323072 + 104 [0]
>>>>> ......
>>>>> 8,16 1 3596 1.091394357 2519 I W 144322544 + 16 [md127_raid5]
>>>>> 8,16 1 0 1.091396181 0 m N cfq2519 insert_request
>>>>> 8,16 1 3597 1.091396571 2519 I W 144322520 + 24 [md127_raid5]
>>>>> 8,16 1 0 1.091396934 0 m N cfq2519 insert_request
>>>>> 8,16 1 3598 1.091397165 2519 I W 144322488 + 32 [md127_raid5]
>>>>> 8,16 1 0 1.091397477 0 m N cfq2519 insert_request
>>>>> 8,16 1 3599 1.091397708 2519 I W 144322432 + 56 [md127_raid5]
>>>>> 8,16 1 0 1.091398023 0 m N cfq2519 insert_request
>>>>> 8,16 1 3600 1.091398284 2519 U N [md127_raid5] 4
>>>>> 8,16 1 0 1.091398986 0 m N cfq2519 Not idling. st->count:1
>>>>> 8,16 1 0 1.091399511 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091400217 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091400688 0 m N cfq2519 activate rq,drv=1
>>>>> 8,16 1 3601 1.091400766 2519 D W 144322544 + 16 [md127_raid5]
>>>>> 8,16 1 0 1.091406151 0 m N cfq2519 Not idling.st->count:1
>>>>> 8,16 1 0 1.091406460 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091406931 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091407291 0 m N cfq2519 activate rq,drv=2
>>>>> 8,16 1 3602 1.091407378 2519 D W 144322520 + 24 [md127_raid5]
>>>>> 8,16 1 0 1.091414006 0 m N cfq2519 Not idling.st->count:1
>>>>> 8,16 1 0 1.091414297 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091414702 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091415047 0 m N cfq2519 activate rq, drv=3
>>>>> 8,16 1 3603 1.091415125 2519 D W 144322488 + 32 [md127_raid5]
>>>>> 8,16 1 0 1.091416469 0 m N cfq2519 Not idling.st->count:1
>>>>> 8,16 1 0 1.091416754 0 m N cfq2519 dispatch_insert
>>>>> 8,16 1 0 1.091417186 0 m N cfq2519 dispatched a request
>>>>> 8,16 1 0 1.091417535 0 m N cfq2519 activate rq,drv=4
>>>>> 8,16 1 3604 1.091417628 2519 D W 144322432 + 56 [md127_raid5]
>>>>> 8,16 1 3605 1.091857225 4393 C W 144322544 + 16 [0]
>>>>> 8,16 1 0 1.091858753 0 m N cfq2519 complete rqnoidle 0
>>>>> 8,16 1 3606 1.092068456 4393 C W 144322520 + 24 [0]
>>>>> 8,16 1 0 1.092069851 0 m N cfq2519 complete rqnoidle 0
>>>>> 8,16 1 3607 1.092350440 4393 C W 144322488 + 32 [0]
>>>>> 8,16 1 0 1.092351688 0 m N cfq2519 complete rqnoidle 0
>>>>> 8,16 1 3608 1.093629323 0 C W 144322432 + 56 [0]
>>>>> 8,16 1 0 1.093631151 0 m N cfq2519 complete rqnoidle 0
>>>>> 8,16 1 0 1.093631574 0 m N cfq2519 will busy wait
>>>>> 8,16 1 0 1.093631829 0 m N cfq schedule dispatch
>>>>>
>>>>> Because elv_attempt_insert_merge() only tries a back merge, the four
>>>>> requests above cannot be merged any further.
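>>>>> For reference, here is a simplified sketch of elv_attempt_insert_merge()
>>>>> (from block/elevator.c of this era; details may differ between kernel
>>>>> versions). The request hash is keyed by a request's end sector, so the
>>>>> lookup can only ever find a back merge candidate:
>>>>>
>>>>> 	bool elv_attempt_insert_merge(struct request_queue *q,
>>>>> 				      struct request *rq)
>>>>> 	{
>>>>> 		struct request *__rq;
>>>>>
>>>>> 		if (blk_queue_nomerges(q))
>>>>> 			return false;
>>>>>
>>>>> 		/* One-hit cache: retry the last merged request. */
>>>>> 		if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
>>>>> 			return true;
>>>>>
>>>>> 		if (blk_queue_noxmerges(q))
>>>>> 			return false;
>>>>>
>>>>> 		/*
>>>>> 		 * The hash is indexed by end sector, so this can only find
>>>>> 		 * a request ending where rq begins, i.e. a back merge.
>>>>> 		 */
>>>>> 		__rq = elv_rqhash_find(q, blk_rq_pos(rq));
>>>>> 		if (__rq && blk_attempt_req_merge(q, __rq, rq))
>>>>> 			return true;
>>>>>
>>>>> 		return false;
>>>>> 	}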
>>>>> I traced for ten minutes and counted how often this situation occurs:
>>>>> about 25% of the time.
>>>>>
>>>>> With the patch applied, I tested and no longer saw the situation above.
>>>>>
>>>>> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>>>>> ---
>>>>> block/blk-core.c | 3 ++-
>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>> index a33870b..3c95c4d 100644
>>>>> --- a/block/blk-core.c
>>>>> +++ b/block/blk-core.c
>>>>> @@ -2868,7 +2868,8 @@ static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
>>>>>  	struct request *rqa = container_of(a, struct request, queuelist);
>>>>>  	struct request *rqb = container_of(b, struct request, queuelist);
>>>>>
>>>>> -	return !(rqa->q <= rqb->q);
>>>>> +	return !(rqa->q < rqb->q ||
>>>>> +		(rqa->q == rqb->q && blk_rq_pos(rqa) < blk_rq_pos(rqb)));
>>>>>  }
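>>>>>
>>>>> For context, plug_rq_cmp() is the comparator handed to list_sort()
>>>>> when the plug list is flushed; simplified from blk_flush_plug_list()
>>>>> in block/blk-core.c (details vary by version):
>>>>>
>>>>> 	list_splice_init(&plug->list, &list);
>>>>> 	/* sort so each queue's requests are adjacent (and, with this
>>>>> 	 * patch, in ascending sector order within a queue) */
>>>>> 	list_sort(NULL, &list, plug_rq_cmp);
>>>>>
>>>>> 	while (!list_empty(&list)) {
>>>>> 		rq = list_entry_rq(list.next);
>>>>> 		list_del_init(&rq->queuelist);
>>>>> 		/* ... per-queue lock handling elided ... */
>>>>> 		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
>>>>> 	}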
>>>>
>>>> Does this one help too?
>>>> http://marc.info/?l=linux-kernel&m=132399972114668&w=2
>>>>
>>>> I thought the real problem is that we don't do recursive request
>>>> merge. I have no objection to the patch itself, but I just hope we
>>>> can make recursive merge work, which is more generic.
>>>
>>>Oh, wait, the 4 requests aren't fully merged in your case, and they
>>>come from one thread and one plug context, so this isn't the issue I
>>>mentioned. I'm wondering how this could happen; they should have been
>>>merged in attempt_plug_merge already.
>> Hi Shaohua,
>> I think I omitted some of the blktrace output, which caused the
>> confusion. Here is a fuller trace:
>> 8,16 0 6638 2.453619407 2579 Q W 7493144 + 8 [md0_raid5]
>> 8,16 0 6639 2.453620460 2579 G W 7493144 + 8 [md0_raid5]
>> 8,16 0 6640 2.453639311 2579 Q W 7493120 + 8 [md0_raid5]
>> 8,16 0 6641 2.453639842 2579 G W 7493120 + 8 [md0_raid5]
>> 8,16 0 6642 2.453647796 2579 Q W 7493128 + 8 [md0_raid5]
>> 8,16 0 6643 2.453647940 2579 M W 7493128 + 8 [md0_raid5]
>> 8,16 0 6644 2.453658249 2579 Q W 7493136 + 8 [md0_raid5]
>> 8,16 0 6645 2.453658393 2579 M W 7493136 + 8 [md0_raid5]
>> 8,16 0 6646 2.453665309 2579 Q W 7493152 + 8 [md0_raid5]
>> 8,16 0 6647 2.453665504 2579 M W 7493152 + 8 [md0_raid5]
>> 8,16 0 6648 2.453672411 2579 Q W 7493160 + 8 [md0_raid5]
>> 8,16 0 6649 2.453672606 2579 M W 7493160 + 8 [md0_raid5]
>> 8,16 0 6650 2.453679255 2579 Q W 7493168 + 8 [md0_raid5]
>> 8,16 0 6651 2.453679441 2579 M W 7493168 + 8 [md0_raid5]
>> 8,16 0 6652 2.453685948 2579 Q W 7493176 + 8 [md0_raid5]
>> 8,16 0 6653 2.453686149 2579 M W 7493176 + 8 [md0_raid5]
>> 8,16 0 6654 2.453693074 2579 Q W 7493184 + 8 [md0_raid5]
>> 8,16 0 6655 2.453693254 2579 M W 7493184 + 8 [md0_raid5]
>> 8,16 0 6656 2.453704290 2579 Q W 7493192 + 8 [md0_raid5]
>> 8,16 0 6657 2.453704482 2579 M W 7493192 + 8 [md0_raid5]
>> 8,16 0 6658 2.453715016 2579 Q W 7493200 + 8 [md0_raid5]
>> 8,16 0 6659 2.453715247 2579 M W 7493200 + 8 [md0_raid5]
>> 8,16 0 6660 2.453721730 2579 Q W 7493208 + 8 [md0_raid5]
>> 8,16 0 6661 2.453721974 2579 M W 7493208 + 8 [md0_raid5]
>> 8,16 0 6662 2.453728202 2579 Q W 7493216 + 8 [md0_raid5]
>> 8,16 0 6663 2.453728436 2579 M W 7493216 + 8 [md0_raid5]
>> 8,16 0 6664 2.453734782 2579 Q W 7493224 + 8 [md0_raid5]
>> 8,16 0 6665 2.453735019 2579 M W 7493224 + 8 [md0_raid5]
>> 8,16 0 6666 2.453741401 2579 Q W 7493232 + 8 [md0_raid5]
>> 8,16 0 6667 2.453741632 2579 M W 7493232 + 8 [md0_raid5]
>> 8,16 0 6668 2.453748148 2579 Q W 7493240 + 8 [md0_raid5]
>> 8,16 0 6669 2.453748386 2579 M W 7493240 + 8 [md0_raid5]
>> 8,16 0 6670 2.453851843 2579 I W 7493144 + 104 [md0_raid5]
>> 8,16 0 0 2.453853661 0 m N cfq2579 insert_request
>> 8,16 0 6671 2.453854064 2579 I W 7493120 + 24 [md0_raid5]
>> 8,16 0 0 2.453854439 0 m N cfq2579 insert_request
>> 8,16 0 6672 2.453854793 2579 U N [md0_raid5] 2
>> 8,16 0 0 2.453855513 0 m N cfq2579 Not idling. st->count:1
>> 8,16 0 0 2.453855927 0 m N cfq2579 dispatch_insert
>> 8,16 0 0 2.453861771 0 m N cfq2579 dispatched a request
>> 8,16 0 0 2.453862248 0 m N cfq2579 activate rq, drv=1
>> 8,16 0 6673 2.453862332 2579 D W 7493120 + 24 [md0_raid5]
>> 8,16 0 0 2.453865957 0 m N cfq2579 Not idling. st->count:1
>> 8,16 0 0 2.453866269 0 m N cfq2579 dispatch_insert
>> 8,16 0 0 2.453866707 0 m N cfq2579 dispatched a request
>> 8,16 0 0 2.453867061 0 m N cfq2579 activate rq, drv=2
>> 8,16 0 6674 2.453867145 2579 D W 7493144 + 104 [md0_raid5]
>> 8,16 0 6675 2.454147608 0 C W 7493120 + 24 [0]
>> 8,16 0 0 2.454149357 0 m N cfq2579 complete rqnoidle 0
>> 8,16 0 6676 2.454791505 0 C W 7493144 + 104 [0]
>> 8,16 0 0 2.454794803 0 m N cfq2579 complete rqnoidle 0
>> 8,16 0 0 2.454795160 0 m N cfq schedule dispatch
>>
>> From the messages above, we can see why rq [W 7493144 + 104] and rq [W 7493120 + 24] did not merge.
>> The bio order was:
>> 8,16 0 6638 2.453619407 2579 Q W 7493144 + 8 [md0_raid5]
>> 8,16 0 6639 2.453620460 2579 G W 7493144 + 8 [md0_raid5]
>> 8,16 0 6640 2.453639311 2579 Q W 7493120 + 8 [md0_raid5]
>> 8,16 0 6641 2.453639842 2579 G W 7493120 + 8 [md0_raid5]
>> Because bio(7493144) arrives first and bio(7493120) arrives later, the
>> subsequent bios are divided between two separate requests.
>> As you mentioned, recursive merge doesn't work in this situation.
>> Why can this occur? I think it is because of the raid operation: it
>> reorders some bios.
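>> To spell it out with the sectors above, the 16 bios end up in two requests:
>>   rq B: starts as [7493120 + 8], back-merges 7493128 and 7493136 -> [7493120 + 24], ending at sector 7493144;
>>   rq A: starts as [7493144 + 8], back-merges 7493152 through 7493240 -> [7493144 + 104].
>> B ends exactly where A begins, so a front merge would yield a single
>> request [7493120 + 128]; but since only back merges are attempted at
>> insert time, the two requests stay separate.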
>>
>> Is that ok? If so, I'll resend the patch with a more useful commit message.
>
>So the real problem is that attempt_plug_merge can't do a recursive
>merge for an already-merged request, because there is no elevator info
>attached to the request; we can only do that merge in
>elv_attempt_insert_merge().
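>Roughly, attempt_plug_merge() (simplified from block/blk-core.c of this
>era; details vary by version) only merges the incoming bio into one of
>the already-plugged requests; it never merges two requests with each
>other:
>
>	list_for_each_entry_reverse(rq, &plug->list, queuelist) {
>		int el_ret;
>
>		if (rq->q != q || !blk_rq_merge_ok(rq, bio))
>			continue;
>
>		el_ret = blk_try_merge(rq, bio);
>		if (el_ret == ELEVATOR_BACK_MERGE) {
>			if (bio_attempt_back_merge(q, rq, bio))
>				break;
>		} else if (el_ret == ELEVATOR_FRONT_MERGE) {
>			if (bio_attempt_front_merge(q, rq, bio))
>				break;
>		}
>	}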
>
>I'm still hoping elv_attempt_insert_merge() can do front merge and
>recursive merge, which is more generic and would solve more
>problems than your patch. But given the simplicity of your patch,
>maybe we should merge yours in the meantime.
Thanks for your time! I'll resend it.