Re: [PATCH] kasan: allow sampling page_alloc allocations for HW_TAGS

From: Andrey Konovalov
Date: Thu Oct 27 2022 - 16:57:27 EST


On Thu, Oct 27, 2022 at 10:44 PM Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 27 Oct 2022 22:10:09 +0200 andrey.konovalov@xxxxxxxxx wrote:
>
> > From: Andrey Konovalov <andreyknvl@xxxxxxxxxx>
> >
> > Add a new boot parameter called kasan.page_alloc.sample, which makes
> > Hardware Tag-Based KASAN tag only every Nth page_alloc allocation.
> >
> > As Hardware Tag-Based KASAN is intended to be used in production, its
> > performance impact is crucial. As page_alloc allocations tend to be big,
> > tagging and checking all such allocations introduces a significant
> > slowdown in some testing scenarios. The new flag makes it possible to
> > alleviate that slowdown.
> >
> > Enabling page_alloc sampling has a downside: KASAN will miss bad accesses
> > to a page_alloc allocation that has not been tagged.
> >
>
> The Documentation:
>
> > --- a/Documentation/dev-tools/kasan.rst
> > +++ b/Documentation/dev-tools/kasan.rst
> > @@ -140,6 +140,10 @@ disabling KASAN altogether or controlling its features:
> > - ``kasan.vmalloc=off`` or ``=on`` disables or enables tagging of vmalloc
> > allocations (default: ``on``).
> >
> > +- ``kasan.page_alloc.sample=<sampling frequency>`` makes KASAN tag only
> > + every Nth page_alloc allocation, where N is the value of the parameter
> > + (default: ``1``).
> > +
>
> explains what this does but not why it does it.
>
> Let's tell people that this is here to mitigate the performance overhead.
>
> And how is this performance impact observed? The kernel just gets
> overall slower?
>
> If someone gets a KASAN report using this mitigation, should their next
> step be to set kasan.page_alloc.sample back to 1 and rerun, in order to
> get a more accurate report before reporting it upstream? I'm thinking
> "no"?
>
> Finally, it would be helpful if the changelog were to give us some
> sense of the magnitude of the impact with kasan.page_alloc.sample=1.
> Does the kernel get 3x slower? 50x?

Hi Andrew,

I will add explanations for all these points in v2.

Thank you!
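
For readers following the thread, the "tag only every Nth page_alloc
allocation" behaviour described in the quoted changelog can be sketched
roughly as below. This is a user-space illustration only, not the code
from the patch: struct page_sampler, sampler_init(), sampler_should_tag()
and count_tagged() are hypothetical names, and a real kernel
implementation would probably keep the counter per CPU rather than in a
single struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sampler state; "interval" plays the role of N. */
struct page_sampler {
	unsigned long interval;	/* tag every interval-th allocation */
	unsigned long seen;	/* allocations seen since the last tagged one */
};

static void sampler_init(struct page_sampler *s, unsigned long interval)
{
	assert(interval >= 1);	/* N = 1 means "tag everything" */
	s->interval = interval;
	s->seen = 0;
}

/* Return true when the current allocation should be tagged. */
static bool sampler_should_tag(struct page_sampler *s)
{
	s->seen++;
	if (s->seen >= s->interval) {
		s->seen = 0;
		return true;
	}
	return false;	/* skipped: bad accesses to this page go undetected */
}

/* Helper for experimenting: how many of n_allocs allocations get tagged. */
static unsigned long count_tagged(unsigned long interval, unsigned long n_allocs)
{
	struct page_sampler s;
	unsigned long tagged = 0;

	sampler_init(&s, interval);
	for (unsigned long i = 0; i < n_allocs; i++)
		if (sampler_should_tag(&s))
			tagged++;
	return tagged;
}
```

With an interval of 1 every allocation is tagged, which matches the
documented default. With larger values the skipped allocations are
exactly the ones for which, as the changelog notes, KASAN will miss bad
accesses.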