Re: [PATCH v2] kcov: update pos before writing pc in trace function

From: Dmitry Vyukov
Date: Tue May 24 2022 - 02:39:14 EST


On Tue, 24 May 2022 at 05:08, Liu, Congyu <liu3101@xxxxxxxxxx> wrote:
>
> It was actually first found in the kernel trace module I wrote for my research
> project. For each call instruction I instrumented one trace function before it
> and one trace function after it, then expected traces generated from
> them would match since I only instrumented calls that return. But it turns
> out that they didn't match from time to time, in a non-deterministic manner.
> Eventually I figured out it was caused by trace entries being overwritten
> from interrupt context. I then referred to kcov for a solution, but it
> suffered from the same issue... so here's this patch :).

Ah, interesting. Thanks for sharing.
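
For anyone following the thread, the interleaving can be sketched in
userspace roughly as below. This is only an illustration, not the kernel
code: the names trace_old(), trace_new(), reset() and AREA_WORDS are made
up for the example, and the "interrupt" is simulated by a nested call
placed in the window between the two stores. In the kernel the ordering
is enforced against the compiler with WRITE_ONCE() and barrier(); here the
nested call simply sits where the interrupt would land.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AREA_WORDS 8	/* hypothetical buffer size, in 64-bit words */

/* Stand-in for the kcov area: area[0] holds the number of recorded PCs. */
uint64_t area[AREA_WORDS];

void reset(void)
{
	memset(area, 0, sizeof(area));
}

/*
 * Old order: write pc, then publish pos.
 * A non-zero irq_ip simulates an interrupt that traces irq_ip
 * between the two stores.
 */
void trace_old(uint64_t ip, uint64_t irq_ip)
{
	uint64_t pos = area[0] + 1;

	if (pos < AREA_WORDS) {
		area[pos] = ip;
		if (irq_ip)
			trace_old(irq_ip, 0);	/* simulated interrupt */
		area[0] = pos;
	}
}

/* New order: publish pos first, then write pc into the reserved slot. */
void trace_new(uint64_t ip, uint64_t irq_ip)
{
	uint64_t pos = area[0] + 1;

	if (pos < AREA_WORDS) {
		area[0] = pos;
		if (irq_ip)
			trace_new(irq_ip, 0);	/* simulated interrupt */
		area[pos] = ip;
	}
}
```

Calling trace_old(0xA, 0xB) on an empty buffer leaves one entry, and it
is 0xB: the nested call reused slot 1 and the outer pc is lost. With
trace_new(0xA, 0xB) the outer call has already reserved slot 1, so the
nested call lands in slot 2 and both entries survive.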

> ________________________________________
> From: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Sent: Monday, May 23, 2022 4:38
> To: Liu, Congyu
> Cc: andreyknvl@xxxxxxxxx; kasan-dev@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2] kcov: update pos before writing pc in trace function
>
> On Mon, 23 May 2022 at 07:35, Congyu Liu <liu3101@xxxxxxxxxx> wrote:
> >
> > In __sanitizer_cov_trace_pc(), we previously wrote pc before updating pos.
> > However, some early interrupt code could bypass the check_kcov_mode()
> > check and invoke __sanitizer_cov_trace_pc(). If such an interrupt is raised
> > between writing pc and updating pos, the pc could be overwritten by the
> > recursive __sanitizer_cov_trace_pc().
> >
> > As suggested by Dmitry, we could update pos before writing pc to avoid
> > such interleaving.
> >
> > Apply the same change to write_comp_data().
> >
> > Signed-off-by: Congyu Liu <liu3101@xxxxxxxxxx>
>
> This version looks good to me.
> I wonder how you encountered this? Do you mind sharing a bit about
> what you are doing with kcov?
>
> Reviewed-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>
> Thanks
>
> > ---
> > PATCH v2:
> > * Update pos before writing pc as suggested by Dmitry.
> >
> > PATCH v1:
> > https://lore.kernel.org/lkml/20220517210532.1506591-1-liu3101@xxxxxxxxxx/
> > ---
> > kernel/kcov.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > index b3732b210593..e19c84b02452 100644
> > --- a/kernel/kcov.c
> > +++ b/kernel/kcov.c
> > @@ -204,8 +204,16 @@ void notrace __sanitizer_cov_trace_pc(void)
> > /* The first 64-bit word is the number of subsequent PCs. */
> > pos = READ_ONCE(area[0]) + 1;
> > if (likely(pos < t->kcov_size)) {
> > - area[pos] = ip;
> > + /* Previously we write pc before updating pos. However, some
> > + * early interrupt code could bypass check_kcov_mode() check
> > + * and invoke __sanitizer_cov_trace_pc(). If such interrupt is
> > + * raised between writing pc and updating pos, the pc could be
> > + * overwritten by the recursive __sanitizer_cov_trace_pc().
> > + * Update pos before writing pc to avoid such interleaving.
> > + */
> > WRITE_ONCE(area[0], pos);
> > + barrier();
> > + area[pos] = ip;
> > }
> > }
> > EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
> > @@ -236,11 +244,13 @@ static void notrace write_comp_data(u64 type, u64 arg1, u64 arg2, u64 ip)
> > start_index = 1 + count * KCOV_WORDS_PER_CMP;
> > end_pos = (start_index + KCOV_WORDS_PER_CMP) * sizeof(u64);
> > if (likely(end_pos <= max_pos)) {
> > + /* See comment in __sanitizer_cov_trace_pc(). */
> > + WRITE_ONCE(area[0], count + 1);
> > + barrier();
> > area[start_index] = type;
> > area[start_index + 1] = arg1;
> > area[start_index + 2] = arg2;
> > area[start_index + 3] = ip;
> > - WRITE_ONCE(area[0], count + 1);
> > }
> > }
> >
> > --
> > 2.34.1
> >