Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

From: Linus Torvalds
Date: Wed Aug 14 2013 - 14:35:30 EST


On Wed, Aug 14, 2013 at 11:28 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
>
> OK that would suggest the issue has been introduced by 597e1c35:
> (mm/mmu_gather: enable tlb flush range in generic mmu_gather) in 3.6
> which is not 3.7 when Ben started seeing the issue but this definitely
> smells like a bug that would be amplified by the bisected patch.

Yes, the bug was originally introduced in 597e1c35, but in practice it
never happened, because the force_flush case would not ever really
trigger unless __get_free_pages(GFP_NOWAIT) returned NULL.

Which is *very* rare.

So the commit that Ben bisected things down to wasn't the one that
really introduced the bug, but it was the one that made
tlb_next_batch() much more likely to return failure, which in turn
made it much easier to *expose* the bug.

NOTE! I still absolutely want Ben to actually test that fix (ie
backport commit e6c495a96ce0 to his tree), because without testing
this is all just theoretical, and there might be other things hiding
here. But it makes sense to me, and I think this already-known bug
explains the symptoms.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/