Re: WARN_ON_ONCE

From: Dmitry Vyukov
Date: Sun Dec 06 2020 - 07:15:25 EST


On Sat, Dec 5, 2020 at 1:05 PM Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
>
> Alexey Kardashevskiy <aik@xxxxxxxxx> writes:
> > On 04/12/2020 12:25, Michael Ellerman wrote:
> >> Dmitry Vyukov <dvyukov@xxxxxxxxxx> writes:
> >>> On Thu, Dec 3, 2020 at 10:19 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >>>> On Thu, Dec 3, 2020 at 10:10 AM Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> Syzkaller triggered WARN_ON_ONCE at
> >>>>>
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/tracepoint.c?h=v5.10-rc6#n266
> >>>>>
> >>>>>
> >>>>> ===
> >>>>> static int tracepoint_add_func(struct tracepoint *tp,
> >>>>> struct tracepoint_func *func, int prio)
> >>>>> {
> >>>>> struct tracepoint_func *old, *tp_funcs;
> >>>>> int ret;
> >>>>>
> >>>>> if (tp->regfunc && !static_key_enabled(&tp->key)) {
> >>>>> ret = tp->regfunc();
> >>>>> if (ret < 0)
> >>>>> return ret;
> >>>>> }
> >>>>>
> >>>>> tp_funcs = rcu_dereference_protected(tp->funcs,
> >>>>> lockdep_is_held(&tracepoints_mutex));
> >>>>> old = func_add(&tp_funcs, func, prio);
> >>>>> if (IS_ERR(old)) {
> >>>>> WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> >>>>> return PTR_ERR(old);
> >>>>> }
> >>>>>
> >>>>> ===
> >>>>>
> >>>>> What is the common approach here? Syzkaller reacts on this as if it was
> >>>>> a bug but WARN_ON_ONCE here seems intentional. Do we still push for
> >>>>> removing such warnings?
> >>
> >> AFAICS it is a bug if that fires.
> >>
> >> See the commit that added it:
> >> d66a270be331 ("tracepoint: Do not warn on ENOMEM")
> >>
> >> Which says:
> >> Tracepoint should only warn when a kernel API user does not respect the
> >> required preconditions (e.g. same tracepoint enabled twice,
> >
> > This says that the userspace can trigger the warning if it does not use
> > the API right.
>
> No I don't think it says that.
>
> It's saying that it should be a WARN if a *kernel* user of the
> tracepoint API violates the API. The implication is that this condition
> should never happen if the kernel is using the tracepoint API correctly,
> and so if we hit this condition it indicates a bug in the kernel that
> should be fixed.
>
> >> or called
> >> to remove a tracepoint that does not exist).
> >>
> >> Silence warning in out-of-memory conditions, given that the error is
> >> returned to the caller.
> >>
> >>
> >> So if you're seeing it then you've someone caused it to return something
> >> other than ENOMEM, and that is a bug.
> >
> > This is an userspace bug which registers the same thing twice, the
> > kernel returns a correct error. The question is should it warn by
> > WARN_ON or pr_err(). The comment in bug.h suggests pr_err() is the right
> > way, is not it?
>
> Userspace must not be able to trigger a WARN.
>
> What is the path into that code from userspace?

There are lots of info on this WARNING in the syzbot report:
https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153
https://lore.kernel.org/lkml/000000000000a6348d05a9234041@xxxxxxxxxx/

There are lots of sample stacks and reproducers, also happens on 4.14 and 4.19.

> Either something on that path should be checking that it's not violating
> the API and triggering the WARN, or if that's not possible/easy then the
> WARN should be removed.
>
> cheers