Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count

From: Aaron Lu
Date: Thu Aug 11 2016 - 22:16:25 EST


On 08/11/2016 11:13 PM, Huang, Ying wrote:
> Aaron Lu <aaron.lu@xxxxxxxxx> writes:
>
>> Since commit 52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by
>> CALL_FUNCTION_VECTOR"), the tlb remote shootdown is done through call
>> function vector. That commit didn't take care of irq_tlb_count so later
>> commit fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from
>> other functions call interrupts") tried to fix it.
>>
>> The fix assumes every increase of irq_tlb_count has a corresponding
>> increase of irq_call_count. So the irq_call_count is always bigger than
>> irq_tlb_count and we could substract irq_tlb_count from irq_call_count.
>>
>> Unfortunately this is not true for the smp_call_function_single case.
>> The IPI is only sent if the target CPU's call_single_queue is empty when
>> adding a csd into it in generic_exec_single. That means if two threads
>> are both adding flush tlb csds to the same CPU's call_single_queue, only
>> one IPI is sent. In other words, the irq_call_count is incremented by 1
>> but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
>> bigger than irq_call_count and the substract will produce a very large
>> irq_call_count value due to overflow.
>>
>> Considering that:
>> 1 it's not worth to send more IPIs for the sake of accurate counting of
>> irq_call_count in generic_exec_single;
>> 2 it's not easy to tell if the call function interrupt is for TLB
>> shootdown in __smp_call_function_single_interrupt.
>> Not to exclude TLB shootdown from call function count seems to be the
>> simplest fix and this patch just did that.
>>
>> This is found by LKP's cyclic performance regression tracking recently
>> with the vm-scalability test suite. I have bisected to commit
>> 0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). This commit didn't do
>> anything wrong but revealed the irq_call_count problem. IIUC, the commit
>> makes rwc->remap_one in rmap_walk_file concurrent with multiple threads.
>> When remap_one is try_to_unmap_one, then multiple threads could queue
>> flush tlb to the same CPU but only one IPI will be sent.
>>
>> Since the commit enter Linux v3.19, the counting problem only shows up
>> from v3.19. Considering this is a behaviour change, I'm not sure if I
>> should add the stable tag here.
>>
>> Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
>
> Thanks for fix. You forget to add :)
>
> Reported-by: "Huang, Ying" <ying.huang@xxxxxxxxx>

Oh right, sorry about that.

Regards,
Aaron