Re: INFO: rcu detected stall in dummy_timer

From: Alan Stern
Date: Wed Sep 18 2019 - 10:16:40 EST


On Wed, 18 Sep 2019, Andrey Konovalov wrote:

> > > Why does dummy_hcd require CONFIG_HZ=1000? The comment doesn't really
> > > explain the reason.
> >
> > Oh, that's simple enough. USB events tend to happen at millisecond
> > intervals. The data on the USB bus is organized into frames (and
> > microframes for high speed and SuperSpeed); a frame lasts one
> > millisecond (and a microframe lasts 1/8 ms). Many host controllers
> > report important events when a frame boundary occurs (that's how
> > dummy-hcd works).
> >
> > So for proper timing of the emulation, dummy-hcd requires timer
> > interrupts with millisecond resolution. I suppose the driver could be
> > changed to use a high-res timer instead of a normal kernel timer, but
> > for now that doesn't seem particularly important.
>
> So what are the practical differences between using CONFIG_HZ=100 and
> 1000 for dummy-hcd? Is it going to be slower or faster?

The timing of the emulation will be more accurate with 1000. Of
course, for your purposes that doesn't matter. Also, with 1000 the
driver will probably end up using a higher fraction of the total CPU
time.
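
To put rough numbers on the accuracy point, here is a small userspace
sketch (not code from dummy-hcd). It assumes a normal kernel timer can
only fire on a tick boundary and that a millisecond delay is rounded up
to whole ticks, which approximates what msecs_to_jiffies() does. The
helper names tick_period_us, frames_per_tick, ms_to_ticks and
print_summary are made up for this illustration.

```c
#include <stdio.h>

/* Length of one kernel tick (jiffy) in microseconds for a given CONFIG_HZ. */
static unsigned long tick_period_us(unsigned long hz)
{
	return 1000000UL / hz;
}

/* Number of 1 ms USB frames that go by during a single tick. */
static unsigned long frames_per_tick(unsigned long hz)
{
	return tick_period_us(hz) / 1000UL;
}

/*
 * Round a delay in milliseconds up to whole ticks.  This is only an
 * approximation of the kernel's msecs_to_jiffies(), written for
 * illustration.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999UL) / 1000UL;
}

/* Print how a nominal 1 ms frame timer behaves at the given CONFIG_HZ. */
static void print_summary(unsigned long hz)
{
	unsigned long ticks = ms_to_ticks(1, hz);

	printf("HZ=%lu: tick = %lu us, frames per tick = %lu, "
	       "1 ms timer fires after about %lu us\n",
	       hz, tick_period_us(hz), frames_per_tick(hz),
	       ticks * tick_period_us(hz));
}
```

Calling print_summary(100) and print_summary(1000) from a main()
shows the difference: at HZ=100 a timer meant to fire every frame can
only fire once per 10 ms tick, so about ten frames' worth of emulated
time passes between callbacks, while at HZ=1000 one tick matches one
frame.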

> Or can it get
> overloaded with data and cause stalls?

I really don't know the answer to that. It seems probable that 100 is
okay and is less likely to lead to overload and stalls than 1000.

> Or something else? We're somewhat hesitant to change CONFIG_HZ as
> we don't know how it will affect other parts of the kernel (at some
> point the USB fuzzer will become a part of the main syzbot instance
> that doesn't only fuzz USB).

Leaving it at 100 should be okay for now. Especially since we have
decided to fix this particular problem in an independent way.

In general, I don't know how dummy-hcd will behave when a driver gets
into a tight retry loop. In theory, it might end up using so much CPU
time that you get an rcu stall like the one we saw, but I don't
understand exactly what happened in this case. You'd think that with
no more than six (or however many threads syzbot used) callbacks per
jiffy, there would be plenty of time for normal threads to run.
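
A back-of-the-envelope version of that argument, as a userspace
sketch: the per-callback cost is not known from this report, so
callback_cost_us below is purely a hypothetical input, and the helper
names cpu_share_percent and saturating_cost_us are invented for the
example. It only restates the arithmetic of "N callbacks per jiffy"
and says nothing about what actually happened in the stall.

```c
#include <stdio.h>

/* Length of one jiffy in microseconds for a given CONFIG_HZ. */
static unsigned long jiffy_us(unsigned long hz)
{
	return 1000000UL / hz;
}

/*
 * Percentage of one CPU consumed if `callbacks` timer callbacks run in
 * every jiffy and each takes `callback_cost_us` microseconds
 * (hypothetical value, integer arithmetic, rounded down).
 */
static unsigned long cpu_share_percent(unsigned long hz,
				       unsigned long callbacks,
				       unsigned long callback_cost_us)
{
	return 100UL * callbacks * callback_cost_us / jiffy_us(hz);
}

/*
 * Per-callback cost in microseconds at which the callbacks alone would
 * fill the whole jiffy (rounded down).
 */
static unsigned long saturating_cost_us(unsigned long hz,
					unsigned long callbacks)
{
	return jiffy_us(hz) / callbacks;
}

/* Print the estimate for one configuration. */
static void print_estimate(unsigned long hz, unsigned long callbacks,
			   unsigned long callback_cost_us)
{
	printf("HZ=%lu, %lu callbacks/jiffy at %lu us each: about %lu%% CPU; "
	       "saturation at about %lu us per callback\n",
	       hz, callbacks, callback_cost_us,
	       cpu_share_percent(hz, callbacks, callback_cost_us),
	       saturating_cost_us(hz, callbacks));
}
```

With six callbacks per jiffy at HZ=100, each callback would have to run
for well over a millisecond before the callbacks by themselves used up
the whole jiffy, which is why the stall is surprising.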

Alan Stern