Re: [PATCH 1/5] include/linux: Add instrumented.h infrastructure

From: Marco Elver
Date: Tue Jan 21 2020 - 04:44:22 EST


On Mon, 20 Jan 2020 at 17:39, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Mon, Jan 20, 2020 at 5:25 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> > > > > > > > This adds instrumented.h, which provides generic wrappers for memory
> > > > > > > > access instrumentation that the compiler cannot emit for various
> > > > > > > > sanitizers. Currently this unifies KASAN and KCSAN instrumentation. In
> > > > > > > > future this will also include KMSAN instrumentation.
> > > > > > > >
> > > > > > > > Note that, copy_{to,from}_user require special instrumentation,
> > > > > > > > providing hooks before and after the access, since we may need to know
> > > > > > > > the actual bytes accessed (currently this is relevant for KCSAN, and is
> > > > > > > > also relevant in future for KMSAN).
> > > > > > > >
> > > > > > > > Suggested-by: Arnd Bergmann <arnd@xxxxxxxx>
> > > > > > > > Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > include/linux/instrumented.h | 153 +++++++++++++++++++++++++++++++++++
> > > > > > > > 1 file changed, 153 insertions(+)
> > > > > > > > create mode 100644 include/linux/instrumented.h
> > > > > > > >
> > > > > > > > diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..9f83c8520223
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/include/linux/instrumented.h
> > > > > > > > @@ -0,0 +1,153 @@
> > > > > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * This header provides generic wrappers for memory access instrumentation that
> > > > > > > > + * the compiler cannot emit for: KASAN, KCSAN.
> > > > > > > > + */
> > > > > > > > +#ifndef _LINUX_INSTRUMENTED_H
> > > > > > > > +#define _LINUX_INSTRUMENTED_H
> > > > > > > > +
> > > > > > > > +#include <linux/compiler.h>
> > > > > > > > +#include <linux/kasan-checks.h>
> > > > > > > > +#include <linux/kcsan-checks.h>
> > > > > > > > +#include <linux/types.h>
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * instrument_read - instrument regular read access
> > > > > > > > + *
> > > > > > > > + * Instrument a regular read access. The instrumentation should be inserted
> > > > > > > > + * before the actual read happens.
> > > > > > > > + *
> > > > > > > > + * @ptr address of access
> > > > > > > > + * @size size of access
> > > > > > > > + */
> > > > > > >
> > > > > > > Based on offline discussion, that's what we add for KMSAN:
> > > > > > >
> > > > > > > > +static __always_inline void instrument_read(const volatile void *v, size_t size)
> > > > > > > > +{
> > > > > > > > + kasan_check_read(v, size);
> > > > > > > > + kcsan_check_read(v, size);
> > > > > > >
> > > > > > > KMSAN: nothing
> > > > > >
> > > > > > KMSAN also has instrumentation in
> > > > > > copy_to_user_page/copy_from_user_page. Do we need to do anything for
> > > > > > KASAN/KCSAN for these functions?
> > > >
> > > > copy_to_user_page/copy_from_user_page can be instrumented with
> > > > instrument_copy_{to,from}_user_. I prefer keeping this series with no
> > > > functional change intended for KASAN at least.
> > > >
> > > > > There is also copy_user_highpage.
> > > > >
> > > > > And ioread/write8/16/32_rep: do we need any instrumentation there. It
> > > > > seems we want both KSAN and KCSAN too. One may argue that KCSAN
> > > > > instrumentation there is to super critical at this point, but KASAN
> > > > > instrumentation is important, if anything to prevent silent memory
> > > > > corruptions. How do we instrument there? I don't see how it maps to
> > > > > any of the existing instrumentation functions.
> > > >
> > > > These should be able to use the regular instrument_{read,write}. I
> > > > prefer keeping this series with no functional change intended for
> > > > KASAN at least.
> > >
> > > instrument_{read,write} will not contain any KMSAN instrumentation,
> > > which means we will effectively remove KMSAN instrumentation, which is
> > > weird because we instrumented these functions because of KMSAN in the
> > > first place...

I missed this. Yes, you're right.

> > > > > There is also kmsan_check_skb/kmsan_handle_dma/kmsan_handle_urb that
> > > > > does not seem to map to any of the instrumentation functions.
> > > >
> > > > For now, I would rather that there are some one-off special
> > > > instrumentation, like for KMSAN. Coming up with a unified interface
> > > > here that, without the use-cases even settled, seems hard to justify.
> > > > Once instrumentation for these have settled, unifying the interface
> > > > would have better justification.
> > >
> > > I would assume they may also require an annotation that checks the
> > > memory region under all 3 tools and we don't have such annotation
> > > (same as the previous case and effectively copy_to_user). I would
> > > expect such annotation will be used in more places once we start
> > > looking for more opportunities.
> >
> > Agreed, I'm certainly not against adding these. We may need to
> > introduce 'instrument_dma_' etc. However, would it be reasonable to do
> > this in a separate follow-up patch-series, to avoid stalling bitops
> > instrumentation? Assuming that the 8 hooks in instrumented.h right
> > now are reasonable, and such future changes add new hooks, I think
> > that would be the more pragmatic approach.
>
> I think it would be a wrong direction. Just like this change does not
> introduce all of instrument_test_and_set_bit,
> instrument___clear_bit_unlock, instrument_copyin,
> instrument_copyout_mcsafe, instrument_atomic_andnot, .... All of these
> can be grouped into a very small set of cases with respect to what
> type of memory access they do from the point of view of sanitizers.
> And we introduce instrumentation for these _types_ of accesses, rather
> than application functions (we don't care much if the access is for
> atomic operations, copy to/from user, usb, dma, skb or something
> else). It seems that our set of instrumentation annotations can't
> handle some very basic cases...

With the ioread/write, dma, skb, urb, user-copy cases in mind, it just
appears to me that attempting to find a minimal unifying set of
instrumentation hooks might lead us in circles, given we only have the
following options:

1. Do not introduce 'instrumented.h', and drop this series. With KMSAN
in mind, this is what I mentioned I preferred in the first place, and
just add a few dozen lines in each place we need to instrument. Yes,
yes, it's not as convenient, but at least we'll know it'll be correct
without having to reason about all cases and all sanitizers all at
once (with KMSAN not being in any kernel tree even).

2. This patch series, keeping 'instrumented.h', but only keep what we
use right now. This is knowing we'll likely have to add a number of
special cases (user-copy, ioread/write, etc) for now. Again,
KASAN/KCSAN probably want the same thing, but I don't know how much
conflict there will be with KMSAN after all is said and done. We will
incrementally add what is required, with unifying things later. This
will also satisfy Arnd's constraint of no empty functions:
http://lkml.kernel.org/r/CAK8P3a32sVU4umk2FLnWnMGMQxThvMHAKxVM+G4X-hMgpBsXMA@xxxxxxxxxxxxxx

3. Try to find a minimal set of instrumentation hooks that cater to
all tools (KASAN, KCSAN, KMSAN). Without even having all
instrumentation (without the 'instrumented.h' infrastructure) in
place, I feel this will not be too successful. I think we can do this
once we have instrumentation for all tools, in all places. Then
unifying all of them should be a non-functional-change refactor.
Essentially, this option depends on (1).

However, now we have some constraints which are difficult to satisfy
all at once:
1. Essentially we were told to avoid (1), based on Arnd's suggestion
to simplify the instrumentation. Therefore we thought (2) would be a
good idea.
2. Now that we know what (2) looks like, it seems you prefer (3),
because we should also cater to KMSAN with this patch series.
3. No unused hooks.

Given we have a KMSAN<-(1)<-(3) dependency, but we were told to avoid
(1), empty functions, and KMSAN hasn't yet landed, we can't reasonably
do (3). Since you dislike (2), we're stuck.

Any options I missed?

Thanks,
-- Marco