Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections

From: Petr Mladek
Date: Thu Nov 22 2018 - 11:02:55 EST


On Thu 2018-11-22 11:04:22, Sergey Senozhatsky wrote:
> On (11/21/18 11:49), Waiman Long wrote:
> [..]
> > > case ODEBUG_STATE_ACTIVE:
> > > - debug_print_object(obj, "init");
> > > state = obj->state;
> > > raw_spin_unlock_irqrestore(&db->lock, flags);
> > > + debug_print_object(obj, "init");
> > > debug_object_fixup(descr->fixup_init, addr, state);
> > > return;
> > >
> > > case ODEBUG_STATE_DESTROYED:
> > > - debug_print_object(obj, "init");
> > > + debug_printobj = true;
> > > break;
> > > default:
> > > break;
> > > }
> > >
> > > raw_spin_unlock_irqrestore(&db->lock, flags);
> > > + if (debug_chkstack)
> > > + debug_object_is_on_stack(addr, onstack);
> > > + if (debug_printobj)
> > > + debug_print_object(obj, "init");
> > >
> [..]
> >
> > As a side note, one of the test systems that I used generated a
> > debugobjects splat in the bootup process and the system hanged
> > afterward. Applying this patch alone fix the hanging problem and the
> > system booted up successfully. So it is not really a good idea to call
> > printk() while holding a raw spinlock.

Please, was the system hang reproducible? I wonder if it was a
deadlock described by Sergey below.

The commit message is right. printk() might take too long and
cause softlockup or livelock. But it does not explain why
the system could competely hang.

Also note that prinkt() should not longer block a single process
indefinitely thanks to the commit dbdda842fe96f8932 ("printk:
Add console owner and waiter logic to load balance console writes").

> Some serial consoles call mod_timer(). So what we could have with the
> debug objects enabled was
>
> mod_timer()
> lock_timer_base()
> debug_activate()
> printk()
> call_console_drivers()
> foo_console()
> mod_timer()
> lock_timer_base() << deadlock

Anyway, I wonder what was the primary motivation for this patch.
Was it the system hang? Or was it lockdep report about nesting
two terminal locks: db->lock, pool_lock with logbuf_lock?

Printk is problematic. It might delay any atomic context
where it is used. I would like to better understand the
problem here before we start spread printk-related hacks.

Best Regards,
Petr