Re: [PATCH v4 3/3] media: atomisp: Only use trace_printk if allowed

From: Nicolas Boichat
Date: Thu Aug 20 2020 - 21:39:41 EST


On Fri, Aug 21, 2020 at 8:36 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Fri, 21 Aug 2020 08:13:00 +0800
> Nicolas Boichat <drinkcat@xxxxxxxxxxxx> wrote:
>
> > On Thu, Aug 20, 2020 at 10:23 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > On Thu, 20 Aug 2020 17:14:12 +0800
> > > Nicolas Boichat <drinkcat@xxxxxxxxxxxx> wrote:
> > >
> > > > Technically, we could only initialize the trace_printk buffers
> > > > when the print env is switched, to avoid the build error and
> > > > unconditional boot-time warning, but I assume this printing
> > > > framework will eventually get removed when the driver moves out
> > > > of staging?
> > >
> > > Perhaps this should be converting into a trace event. Look at what bpf
> > > did for their bpf_trace_printk().
> > >
> > > The more I think about it, the less I like this series.
> >
> > To make it clear, the primary goal of this series is to get rid of
> > trace_printk sprinkled in the kernel by making sure some randconfig
> > builds fail. Since my v2, there already has been one more added (the
> > one that this patch removes), so I'd like to land 2/3 ASAP to prevent
> > even more from being added.
> >
> > Looking at your reply on 1/3, I think we are aligned on that goal? Is
> > there some other approach you'd recommend?
> >
> > Now, I'm not pretending my fixes are the best possible ones, but I
> > would much rather have the burden of converting to trace events on the
> > respective driver maintainers. (btw is there a short
> > documentation/tutorial that I could link to in these patches, to help
> > developers understand what is the recommended way now?)
> >
>
> I like the goal, but I guess I never articulated the problem I have
> with the methodology.
>
> trace_printk() is meant to be a debugging tool. Something that people
> can and do sprinkle all over the kernel to help them find a bug in
> areas that are called quite often (where printk() is way too slow).
>
> The last thing I want them to deal with is adding a trace_printk() with
> their distro's config (or a config from someone that triggered the bug)
> only to have the build to fail, because they also need to add a config
> value.
>
> I add to the Cc a few developers I know that use trace_printk() in this
> fashion. I'd like to hear their view on having to add a config option
> to make trace_printk work before they test a config that is sent to
> them.

Gotcha, thanks. I have also used trace_printk in the past, as
uncommitted changes (and understand the usefulness ,-)). And in Chrome
OS team here, developers have also raised this concern: how do we make
the developer flow convenient so that we can add trace_printk to our
code for debugging, without having to flip back that config option,
and _yet_ make sure that no trace_printk ever makes it into our
production kernels. We have creative ways of making that work (portage
USE flags and stuff). But I'm not sure about other flows, and your
concern is totally valid...

Some other approaches/ideas:
1. Filter all lkml messages that contain trace_printk. Already found
1 instance, and I can easily reply to those with a semi-canned answer,
if I remember to check that filter regularly (not sustainable in the
long run...).
2. Integration into some kernel test robot? (I will not roll my own
for this ,-)) It may be a bit difficult as some debug config options
do enable trace_printk, and that's ok.
3. In Chromium OS, I can add a unit test (i.e. something outside of
the normal kernel build system), but that'll only catch regressions
downstream (or when we happen to backport patches).

Down the line, #3 catches what I care about the most (Chromium OS
issues: we had production kernels for a few days/weeks showing that
splat on boot), but it'd be nice to have something upstream that
benefits everyone.

Thanks,

>
> -- Steve