Re: [PATCH v2] Add kernel config option for fuzz testing.

From: Dmitry Vyukov
Date: Sun Mar 15 2020 - 02:00:33 EST


On Thu, Mar 12, 2020 at 11:29 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Fri, 13 Mar 2020 06:59:22 +0900
> Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On 2020/03/13 4:23, Dmitry Vyukov wrote:
> > >> Or teach the fuzz tool not to do specific bad things.
> > >
> > > We do some of this.
> > > But generally it's impossible for anything that involves memory
> > > indirections, or depends on the exact type of fd (e.g. all ioctl's),
> > > etc. Boils down to halting problem and ability to predict exact
> > > behavior of arbitrary programs.
> >
> > I would like to enable changes like below only if CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING=y .
> >
> > Since TASK_RUNNING threads are not always running on CPUs (in syzbot, the kernel is
> > tested on a VM with only 2 CPUs, which means that many threads are simply waiting for
> > CPU time to be assigned), dumping locks held by all threads gives us more clue when
> > e.g. khungtask fired. But since lockdep_print_held_locks() is racy, I assume that
> > this change won't be accepted unless CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING=y .
> >
> > Also, for another example, limit number of memory pages /dev/ion driver can consume only if
> > CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING=y ( https://github.com/google/syzkaller/issues/1267 ),
> > for limiting number of memory pages is a user-visible change while we need to avoid false
> > alarms caused by consuming all memory pages.
> >
> > In other words, while majority of things CONFIG_KERNEL_BUILT_FOR_FUZZ_TESTING=y would
> > do "disable this", there would be a few "enable this" and "change this".
>
> I still fear that people will just disable large sections. I've seen this
> before. Developers take the easy way out, and when someone adds a new
> feature that may be dangerous, they will just say "oh turn off fuzzing" and
> be done with it.
>
> As Linus likes to say, when you need to make changes to the kernel to test
> it, you are no longer testing production kernels.


Well, it's tradeoffs all the way down.
For example, KASAN also radically affects the kernel (changes just
every line of code). There still should be final testing of the actual
production build of course. But I would say generally one wants to
test with KASAN enabled all the time.
Or, it also depends on the baseline. If our baseline is "everybody is
crazy about testing, ready to prioritize it on top of everything else
and spend infinite amounts of time on it", then there may be better
solutions. But if our baseline is "no testing at all", or "we have to
disable whole subsystems b/c we don't have better realistic options
and only few people willing to spend at least some time on it", then
it may be a good practical solution ;)


> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 32406ef0d6a2..1bc7878768fc 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -695,6 +695,7 @@ static void print_lock(struct held_lock *hlock)
> > static void lockdep_print_held_locks(struct task_struct *p)
> > {
> > int i, depth = READ_ONCE(p->lockdep_depth);
> > + bool unreliable;
> >
> > if (!depth)
> > printk("no locks held by %s/%d.\n", p->comm, task_pid_nr(p));
> > @@ -705,10 +706,12 @@ static void lockdep_print_held_locks(struct task_struct *p)
> > * It's not reliable to print a task's held locks if it's not sleeping
> > * and it's not the current task.
> > */
> > - if (p->state == TASK_RUNNING && p != current)
> > - return;
> > + unreliable = p->state == TASK_RUNNING && p != current;
> > for (i = 0; i < depth; i++) {
> > - printk(" #%d: ", i);
> > + if (unreliable)
> > + printk(" #%d?: ", i);
> > + else
> > + printk(" #%d: ", i);
>
> Have you tried submitting this? Has Peter nacked it?
>
> -- Steve
>
> > print_lock(p->held_locks + i);
> > }
> > }
>