Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

From: Alexander Potapenko
Date: Thu May 05 2022 - 14:05:34 EST


On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Alexander,

First of all, thanks a lot for the comments, those are greatly appreciated!
I tried to revert this patch and the previous one ("kmsan:
instrumentation.h: add instrumentation_begin_with_regs()") and
reimplement unpoisoning pt_regs without breaking into
instrumentation_begin(), see below.

> >
> > Regarding the regs, you are right. It should be enough to unpoison the
> > regs at idtentry prologue instead.
> > I tried that initially, but IIRC it required patching each of the
> > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin().
>
> Exactly 4 instances :)
>

Not really, I had to add a call to `kmsan_unpoison_memory(regs,
sizeof(*regs));` to the following places in
arch/x86/include/asm/idtentry.h:
- DEFINE_IDTENTRY()
- DEFINE_IDTENTRY_ERRORCODE()
- DEFINE_IDTENTRY_RAW()
- DEFINE_IDTENTRY_RAW_ERRORCODE()
- DEFINE_IDTENTRY_IRQ()
- DEFINE_IDTENTRY_SYSVEC()
- DEFINE_IDTENTRY_SYSVEC_SIMPLE()
- DEFINE_IDTENTRY_DF()

, but even that wasn't enough. For some reason I also had to unpoison
pt_regs directly in
DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and
DEFINE_IDTENTRY_IRQ(common_interrupt).
In the latter case, this could have been caused by
asm_common_interrupt being entered from irq_entries_start(), but I am
not sure what is so special about sysvec_apic_timer_interrupt().

Ideally, it would be great to find that single point where pt_regs are
set up before being passed to all IDT entries.
I used to do that by inserting calls to kmsan_unpoison_memory right
into arch/x86/entry/entry_64.S
(https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026),
but that required storing/restoring all GP registers. Maybe there's a
better way?


>
> then
>
> instrumentation_begin();
> foo(fargs...);
> bar(bargs...);
> instrumentation_end();
>
> is a source of potential false positives because the state is not
> guaranteed to be correct, neither for foo() nor for bar(), even if you
> wipe the state in instrumentation_begin(), right?

Yes, this is right.

> This approximation approach smells fishy and it's inevitably going to be
> a constant source of 'add yet another kmsan annotation/fixup' patches,
> which I'm not interested in at all.
>
> As this needs compiler support anyway, then why not doing the obvious:
>
> #define noinstr \
> .... __kmsan_conditional
>
> #define instrumentation_begin() \
> ..... __kmsan_cond_begin
>
> #define instrumentation_end() \
> __kmsan_cond_end .......
>
> and let the compiler stick whatever is required into that code section
> between instrumentation_begin() and instrumentation_end()?

We define noinstr as
__attribute__((disable_sanitizer_instrumentation))
(https://llvm.org/docs/LangRef.html#:~:text=disable_sanitizer_instrumentation),
which means no instrumentation will be applied to the annotated
function.
Changing that behavior by adding subregions that can be instrumented
sounds questionable.
C also doesn't have good syntactic means to define these subregions -
perhaps some __xxx_begin()/__xxx_end() intrinsics would work, but they
would require both compile-time and run-time validation.

Fortunately, I don't think we need to insert extra instrumentation
into instrumentation_begin()/instrumentation_end() regions.

What I have in mind is adding a bool flag to kmsan_context_state, that
the instrumentation sets to true before the function call.
When entering an instrumented function, KMSAN would check that flag
and set it to false, so that the context state can be only used once.
If a function is called from another instrumented function, the
context state is properly set up, and there is nothing to worry about.
If it is called from non-instrumented code (either noinstr or the
skipped files that have KMSAN_SANITIZE:=n), KMSAN would detect that
and wipe the context state before use.

By the way, I've noticed that at least for now (with pt_regs
unpoisoning performed in IDT entries) the problem with false positives
in noinstr code is entirely gone, so maybe we don't even have to
bother.

> Yes, it's more work on the tooling side, but the tooling side is mostly
> a one time effort while chasing the false positives is a long term
> nightmare.

Well said.

> Thanks,
>
> tglx



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich
bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.