Re: [PATCH v3 3/4] kdb: Fix "btc <cpu>" crash if the CPU didn't round up

From: Daniel Thompson
Date: Thu Oct 10 2019 - 11:07:42 EST


On Mon, Oct 07, 2019 at 04:34:55PM -0700, Doug Anderson wrote:
> Hi,
>
> On Mon, Oct 7, 2019 at 6:55 AM Daniel Thompson
> <daniel.thompson@xxxxxxxxxx> wrote:
> >
> > On Wed, Sep 25, 2019 at 01:02:19PM -0700, Douglas Anderson wrote:
> > >
> > > I noticed that when I did "btc <cpu>" and the CPU I passed in hadn't
> > > rounded up that I'd crash. I was going to copy the same fix from
> > > commit 162bc7f5afd7 ("kdb: Don't back trace on a cpu that didn't round
> > > up") into the "not all the CPUs" case, but decided it'd be better to
> > > clean things up a little bit.
> > >
> > > This consolidates the two code paths. It is _slightly_ wasteful in in
> > > that the checks for "cpu" being too small or being offline isn't
> > > really needed when we're iterating over all online CPUs, but that
> > > really shouldn't hurt. Better to have the same code path.
> > >
> > > While at it, eliminate at least one slightly ugly (and totally
> > > needless) recursive use of kdb_parse().
> > >
> > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > ---
> > >
> > > Changes in v3:
> > > - Patch ("kdb: Fix "btc <cpu>" crash if the CPU...") new for v3.
> > >
> > > Changes in v2: None
> > >
> > > kernel/debug/kdb/kdb_bt.c | 61 ++++++++++++++++++++++-----------------
> > > 1 file changed, 34 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
> > > index 120fc686c919..d9af139f9a31 100644
> > > --- a/kernel/debug/kdb/kdb_bt.c
> > > +++ b/kernel/debug/kdb/kdb_bt.c
> > > @@ -101,6 +101,27 @@ kdb_bt1(struct task_struct *p, unsigned long mask, bool btaprompt)
> > > return 0;
> > > }
> > >
> > > +static void
> > > +kdb_bt_cpu(unsigned long cpu)
> > > +{
> > > + struct task_struct *kdb_tsk;
> > > +
> > > + if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
> > > + kdb_printf("WARNING: no process for cpu %ld\n", cpu);
> > > + return;
> > > + }
> > > +
> > > + /* If a CPU failed to round up we could be here */
> > > + kdb_tsk = KDB_TSK(cpu);
> > > + if (!kdb_tsk) {
> > > + kdb_printf("WARNING: no task for cpu %ld\n", cpu);
> > > + return;
> > > + }
> > > +
> > > + kdb_set_current_task(kdb_tsk);
> > > + kdb_bt1(kdb_tsk, ~0UL, false);
> > > +}
> > > +
> > > int
> > > kdb_bt(int argc, const char **argv)
> > > {
> > > @@ -161,7 +182,6 @@ kdb_bt(int argc, const char **argv)
> > > } else if (strcmp(argv[0], "btc") == 0) {
> > > unsigned long cpu = ~0;
> > > struct task_struct *save_current_task = kdb_current_task;
> > > - char buf[80];
> > > if (argc > 1)
> > > return KDB_ARGCOUNT;
> > > if (argc == 1) {
> > > @@ -169,35 +189,22 @@ kdb_bt(int argc, const char **argv)
> > > if (diag)
> > > return diag;
> > > }
> > > - /* Recursive use of kdb_parse, do not use argv after
> > > - * this point */
> > > - argv = NULL;
> > > if (cpu != ~0) {
> > > - if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
> > > - kdb_printf("no process for cpu %ld\n", cpu);
> > > - return 0;
> > > - }
> > > - sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
> > > - kdb_parse(buf);
> > > - return 0;
> > > - }
> > > - kdb_printf("btc: cpu status: ");
> > > - kdb_parse("cpu\n");
> > > - for_each_online_cpu(cpu) {
> > > - void *kdb_tsk = KDB_TSK(cpu);
> > > -
> > > - /* If a CPU failed to round up we could be here */
> > > - if (!kdb_tsk) {
> > > - kdb_printf("WARNING: no task for cpu %ld\n",
> > > - cpu);
> > > - continue;
> > > + kdb_bt_cpu(cpu);
> > > + } else {
> > > + /*
> > > + * Recursive use of kdb_parse, do not use argv after
> > > + * this point.
> > > + */
> > > + argv = NULL;
> > > + kdb_printf("btc: cpu status: ");
> > > + kdb_parse("cpu\n");
> > > + for_each_online_cpu(cpu) {
> > > + kdb_bt_cpu(cpu);
> > > + touch_nmi_watchdog();
> > > }
> > > -
> > > - sprintf(buf, "btt 0x%px\n", kdb_tsk);
> > > - kdb_parse(buf);
> > > - touch_nmi_watchdog();
> > > + kdb_set_current_task(save_current_task);
> > > }
> > > - kdb_set_current_task(save_current_task);
> >
> > Why does this move out into only one of the conditional branches?
> > Don't both of the above paths modify the current task?
>
> The old code has a "return 0 in the case that "cpu != ~0", so this
> basically matches the prior behavior in restoring the current task for
> a "btc" but not leaving the current task changed in the case of "btc
> <cpu>". Thus my patch doesn't actually change the existing behavior,
> but I guess that it does make the control flow simpler so it's easier
> to understand what the behavior is. ;-)

Point taken. Horrific though it may be ;-) .


> Reading through other control flows of the various backtrace commands,
> it looks like it is intentional to leave the current task changed when
> you explicitly do an action on that task (or a CPU).
>
> Actually, though, it wasn't clear to me that it ever made sense for
> any of these commands to implicitly leave the current task changed.
> If you agree, I can send a follow-up patch to change this behavior.

Personally I don't like implicit changes of state but I might need a bit
more thinking to agree (or disagree ;-) ).


Daniel.