Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

From: Linus Torvalds
Date: Wed Aug 14 2013 - 14:03:39 EST


On Wed, Aug 14, 2013 at 10:40 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
>>
>> After a _very long session of rebooting and bisecting_ the Linux kernel
>> (fortunately I had a SSD and ccache!) I was able to pinpoint the cause
>> to the following patch:
>>
>> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
>> 787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable [1]
>> 53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
>
> Thanks for bisecting this up!
>
> I will look into this but I find it really strange.

We had a TLB invalidation bug in the case when we ran out of page
slots (and limiting the mmu_gather batching basically forcesd an early
case of that).

It was fixed in commit e6c495a96ce02574e765d5140039a64c8d4e8c9e ("mm:
fix the TLB range flushed when __tlb_remove_page() runs out of
slots"), and that doesn't seem to have been marked for stable
(probably because the commit message makes everytbody reading it think
it's limited to ARC).

Ben, can you try back-porting that commit from mainline and see if
that fixes things?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/