Re: [PATCH] cpufreq: Stop BUGing the system

From: Nishanth Menon
Date: Thu Dec 18 2014 - 09:49:26 EST


On 07:38-20141218, Viresh Kumar wrote:
> On 17 December 2014 at 21:21, Nishanth Menon <nm@xxxxxx> wrote:
> > The CPUFreq subsystem is not a system catastrophic failure point.
> > Failures in these cases do NOT need a complete system shutdown with BUG.
> > Just refusing to let cpufreq function should be good enough.
> >
> > Signed-off-by: Nishanth Menon <nm@xxxxxx>
> > ---
> > drivers/cpufreq/cpufreq.c | 17 +++++++++++++----
> > 1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index a09a29c..a5aa2fa 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -281,7 +281,10 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
> > static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
> > struct cpufreq_freqs *freqs, unsigned int state)
> > {
> > - BUG_ON(irqs_disabled());
> > + if (irqs_disabled()) {
> > + WARN(1, "IRQs disabled!\n");
> > + return;
> > + }
>
> What about:
>
> > + if (WARN(irqs_disabled(), "IRQs disabled!\n"))
> > + return;
>
> Same for the last change as well..

k.
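
For reference, the suggested form works because the kernel's WARN()
evaluates to the truth value of its condition. Below is a minimal
userspace sketch of that pattern, not kernel code: warn_stub(), the WARN
wrapper around it, fake_irqs_disabled, notifications_sent and
notify_transition() are all hypothetical stand-ins made up for
illustration.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's WARN(): prints the
 * message when cond is true and returns cond, so it can sit inside an if.
 */
static bool warn_stub(bool cond, const char *fmt, ...)
{
	if (cond) {
		va_list ap;

		va_start(ap, fmt);
		fputs("WARNING: ", stderr);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
	}
	return cond;
}
#define WARN(cond, ...) warn_stub((cond), __VA_ARGS__)

static bool fake_irqs_disabled;	/* assumption: stands in for irqs_disabled() */
static int notifications_sent;	/* counts notifications that went through */

/* Warn and bail out instead of taking the whole system down. */
static void notify_transition(void)
{
	if (WARN(fake_irqs_disabled, "IRQs disabled!\n"))
		return;

	notifications_sent++;
}
```

With fake_irqs_disabled set, notify_transition() only prints the warning
and returns; otherwise it carries on and bumps the counter.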
>
> >
> > if (cpufreq_disabled())
> > return;
> > @@ -1253,9 +1256,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> > /*
> > * Reaching here after boot in a few seconds may not
> > * mean that system will remain stable at "unknown"
> > - * frequency for longer duration. Hence, a BUG_ON().
> > + * frequency for longer duration. Hence, a WARN().
> > */
> > - BUG_ON(ret);
> > + if (ret) {
> > + WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
> > + goto err_out_unregister;
> > + }
>
> And I still don't agree for this one. We shouldn't keep on working on a
> potentially unstable frequency.

I can add "could be unstable". The point is that pseudo errors can be
reported in the system, for example because of clock framework bugs.
Don't just stop the boot. Another example: what if cpufreq was a driver
module? It would not have rescued the system, because cpufreq would not
have detected the condition. If we are going to force this on the system,
we should probably not do it in cpufreq code, but somewhere generic.

While I do empathise with (and have in fact advocated in the past) not
letting the system continue at an invalid configuration once our attempt
to rescue it has failed, we cannot provide a consistent behavior here (it
is not a core system behavior) and there is potential for a false
positive (for example a clk framework or other underlying bug). Given
that, it should be good enough to "enhance" the WARN to be "severe
sounding enough" to flag it for the developer, and to continue while
keeping the system alive as much as possible.
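
To make the intended behavior concrete, here is a rough userspace sketch
of "warn loudly, unwind, keep the system alive". It is only an
illustration under assumptions: struct fake_policy,
fake_switch_to_known_freq(), fake_add_dev(), the simulate_failure knob
and the 800000 kHz value are all hypothetical and do not correspond to
the real cpufreq code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified policy; not the kernel's struct cpufreq_policy. */
struct fake_policy {
	unsigned int cur;	/* current frequency in kHz */
	int registered;		/* 1 while the fake driver is set up */
};

/* Hypothetical rescue step: switching to a known frequency may fail. */
static int fake_switch_to_known_freq(struct fake_policy *p, int simulate_failure)
{
	if (simulate_failure)
		return -EIO;

	p->cur = 800000;	/* assumed "known good" frequency */
	return 0;
}

/* On failure: severe-sounding warning, then unwind instead of BUG_ON(). */
static int fake_add_dev(struct fake_policy *p, int simulate_failure)
{
	int ret;

	p->registered = 1;

	ret = fake_switch_to_known_freq(p, simulate_failure);
	if (ret) {
		fprintf(stderr,
			"WARNING: system could be unstable at unknown freq %u kHz (err %d)\n",
			p->cur, ret);
		goto err_out_unregister;
	}

	return 0;

err_out_unregister:
	p->registered = 0;	/* cpufreq refuses to function; system stays up */
	return ret;
}
```

When the rescue fails, the caller gets the error back and the fake driver
is torn down, but nothing halts the rest of the system.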


--
Regards,
Nishanth Menon