Re: [PATCH 0/5] mm: poison critical mm/ structs

From: Hugh Dickins
Date: Thu Oct 02 2014 - 05:25:01 EST


On Wed, 1 Oct 2014, Sasha Levin wrote:
> On 10/01/2014 05:07 PM, Andrew Morton wrote:
> > On Mon, 29 Sep 2014 21:47:14 -0400 Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
> >
> >> Currently we're seeing a few issues which are unexplainable by looking at the
> >> data we see and are most likely caused by a memory corruption caused
> >> elsewhere.
> >>
> >> This is wasting time for folks who are trying to figure out an issue provided
> >> a stack trace that can't really point out the real issue.
> >>
> >> This patch introduces poisoning on struct page, vm_area_struct, and mm_struct,
> >> and places checks in busy paths to catch corruption early.
> >>
> >> This series was tested, and it detects corruption in vm_area_struct. Right now
> >> I'm working on figuring out the source of the corruption, (which is a long
> >> standing bug) using KASan, but the current code is useful as it is.
> >
> > Is this still useful if/when kasan is in place?
>
> Yes, the corruption we're seeing happens inside the struct rather than around it.
> kasan doesn't look there.
>
> When kasan is merged, we could complement this patchset by making kasan trap on
> when the poison is getting written, rather than triggering a BUG in some place
> else after we saw the corruption.
>
> > It looks fairly cheap - I wonder if it should simply fall under
> > CONFIG_DEBUG_VM rather than the new CONFIG_DEBUG_VM_POISON.
>
> Config options are cheap as well :)
>
> I'd rather expand it further and add poison/kasan trapping into other places such
> as the vma interval tree rather than having to keep it "cheap".

I like to run with CONFIG_DEBUG_VM, and would not want this stuff
turned on in my builds (especially not the struct page enlargement);
so I'm certainly with you in preferring a separate option.

But it all seems very ad hoc to me. Are people going to be adding
more and more mm structures into it, ad infinitum? And adding
CONFIG_DEBUG_SCHED_POISON one day when someone notices corruption
of a scheduler structure? etc etc.

What does this add on top of slab poisoning? Some checks in some
mm places while the object is active, I guess: why not base those
on slab poisoning? And add them in as appropriate to the problem
at hand, when a problem is seen.

I think these patches are fine for investigating whatever is the
problem currently afflicting you and mm under trinity; but we all
have our temporary debugging patches, I don't think all deserve
preservation in everyone else's kernel, that amounts to far more
clutter than any are worth.

I'm glad to hear they've confirmed some vm_area_struct corruption:
any ideas on where that's coming from?

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/