Re: [PATCH] rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC

From: Paul E. McKenney
Date: Tue May 26 2015 - 11:18:19 EST


On Thu, May 21, 2015 at 11:53:21PM +0200, Denys Vlasenko wrote:
> On 05/21/2015 05:26 PM, Paul E. McKenney wrote:
> > On Thu, May 21, 2015 at 05:09:27PM +0200, Denys Vlasenko wrote:
> >> On 05/21/2015 03:45 PM, Steven Rostedt wrote:
> >>> On Thu, 21 May 2015 12:04:07 +0200
> >>> Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> >>>
> >>>> DEBUG_LOCK_ALLOC=y is not a production setting, but it is
> >>>> not very unusual either. Many developers routinely
> >>>> use kernels built with it enabled.
> >>>>
> >>>> Apart from being selected by hand, it is also auto-selected by
> >>>> PROVE_LOCKING "Lock debugging: prove locking correctness" and
> >>>> LOCK_STAT "Lock usage statistics" config options.
> >>>> LOCK STAT is necessary for "perf lock" to work.
> >>>>
> >>>> I wouldn't spend too much time optimizing it, but this particular
> >>>> function has a very large cost in code size: when it is deinlined,
> >>>> code size decreases by 830,000 bytes:
> >>>>
> >>>> text data bss dec hex filename
> >>>> 85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
> >>>> 84837612 22294424 20627456 127759492 79d7484 vmlinux
> >>>>
> >>>> (with this config: http://busybox.net/~vda/kernel_config)
> >>>>
> >>>> Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> >>>> CC: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> >>>> CC: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> >>>> CC: Steven Rostedt <rostedt@xxxxxxxxxxx>
> >>>> CC: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> >>>> CC: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
> >>>> CC: linux-kernel@xxxxxxxxxxxxxxx
> >>>> CC: Tejun Heo <tj@xxxxxxxxxx>
> >>>> CC: Oleg Nesterov <oleg@xxxxxxxxxx>
> >>>> ---
> >>>> include/linux/rcupdate.h | 40 ++-----------------------------------
> >>>> kernel/rcu/update.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> 2 files changed, 54 insertions(+), 38 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >>>> index 7809749..6024a65 100644
> >>>> --- a/include/linux/rcupdate.h
> >>>> +++ b/include/linux/rcupdate.h
> >>>> @@ -439,46 +439,10 @@ int rcu_read_lock_bh_held(void);
> >>>> * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
> >>>> * RCU-sched read-side critical section. In absence of
> >>>> * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
> >>>> - * critical section unless it can prove otherwise. Note that disabling
> >>>> - * of preemption (including disabling irqs) counts as an RCU-sched
> >>>> - * read-side critical section. This is useful for debug checks in functions
> >>>> - * that required that they be called within an RCU-sched read-side
> >>>> - * critical section.
> >>>> - *
> >>>> - * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
> >>>> - * and while lockdep is disabled.
> >>>> - *
> >>>> - * Note that if the CPU is in the idle loop from an RCU point of
> >>>> - * view (ie: that we are in the section between rcu_idle_enter() and
> >>>> - * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
> >>>> - * did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
> >>>> - * that are in such a section, considering these as in extended quiescent
> >>>> - * state, so such a CPU is effectively never in an RCU read-side critical
> >>>> - * section regardless of what RCU primitives it invokes. This state of
> >>>> - * affairs is required --- we need to keep an RCU-free window in idle
> >>>> - * where the CPU may possibly enter into low power mode. This way we can
> >>>> - * notice an extended quiescent state to other CPUs that started a grace
> >>>> - * period. Otherwise we would delay any grace period as long as we run in
> >>>> - * the idle task.
> >>>> - *
> >>>> - * Similarly, we avoid claiming an SRCU read lock held if the current
> >>>> - * CPU is offline.
> >>>> + * critical section unless it can prove otherwise.
> >>>> */
> >>>> #ifdef CONFIG_PREEMPT_COUNT
> >>>> -static inline int rcu_read_lock_sched_held(void)
> >>>> -{
> >>>> - int lockdep_opinion = 0;
> >>>> -
> >>>> - if (!debug_lockdep_rcu_enabled())
> >>>> - return 1;
> >>>> - if (!rcu_is_watching())
> >>>> - return 0;
> >>>> - if (!rcu_lockdep_current_cpu_online())
> >>>> - return 0;
> >>>> - if (debug_locks)
> >>>> - lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> >>>> - return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
> >>>> -}
> >>>> +int rcu_read_lock_sched_held(void);
> >>>> #else /* #ifdef CONFIG_PREEMPT_COUNT */
> >>>> static inline int rcu_read_lock_sched_held(void)
> >>>> {
> >>>> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> >>>> index e0d31a3..e02218f 100644
> >>>> --- a/kernel/rcu/update.c
> >>>> +++ b/kernel/rcu/update.c
> >>>> @@ -62,6 +62,58 @@ MODULE_ALIAS("rcupdate");
> >>>>
> >>>> module_param(rcu_expedited, int, 0);
> >>>>
> >>>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> >>>> +/**
> >>>> + * rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
> >>>> + *
> >>>> + * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
> >>>> + * RCU-sched read-side critical section. In absence of
> >>>> + * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
> >>>> + * critical section unless it can prove otherwise. Note that disabling
> >>>> + * of preemption (including disabling irqs) counts as an RCU-sched
> >>>> + * read-side critical section. This is useful for debug checks in functions
> >>>> + * that required that they be called within an RCU-sched read-side
> >>>> + * critical section.
> >>>> + *
> >>>> + * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
> >>>> + * and while lockdep is disabled.
> >>>> + *
> >>>> + * Note that if the CPU is in the idle loop from an RCU point of
> >>>> + * view (ie: that we are in the section between rcu_idle_enter() and
> >>>> + * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
> >>>> + * did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
> >>>> + * that are in such a section, considering these as in extended quiescent
> >>>> + * state, so such a CPU is effectively never in an RCU read-side critical
> >>>> + * section regardless of what RCU primitives it invokes. This state of
> >>>> + * affairs is required --- we need to keep an RCU-free window in idle
> >>>> + * where the CPU may possibly enter into low power mode. This way we can
> >>>> + * notice an extended quiescent state to other CPUs that started a grace
> >>>> + * period. Otherwise we would delay any grace period as long as we run in
> >>>> + * the idle task.
> >>>> + *
> >>>> + * Similarly, we avoid claiming an SRCU read lock held if the current
> >>>> + * CPU is offline.
> >>>> + */
> >>>> +#ifdef CONFIG_PREEMPT_COUNT
> >>>> +int rcu_read_lock_sched_held(void)
> >>>> +{
> >>>> + int lockdep_opinion = 0;
> >>>> +
> >>>> + if (!debug_lockdep_rcu_enabled())
> >>>> + return 1;
> >>>> + if (!rcu_is_watching())
> >>>> + return 0;
> >>>> + if (!rcu_lockdep_current_cpu_online())
> >>>> + return 0;
> >>>> + if (debug_locks)
> >>>> + lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
> >>>> + return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
> >>>> +}
> >>>> +EXPORT_SYMBOL(rcu_read_lock_sched_held);
> >>>> +#else
> >>>> +/* !CONFIG_PREEMPT_COUNT - the function is inlined to always return 1 */
> >>>
> >>> Nuke the #else. It's not needed and this is a common enough practice to
> >>> have the static inline foo() { } when disabled that we do not need to
> >>> comment about it here.
> >>
> >> Sending patch v2 in a few minutes.
> >>
> >>>> +#endif
> >>>
> >>> Hmm, you added two #ifdef, and one #endif. How does this even compile??
> >>
> >> Er... it... doesn't.
> >>
> >> There was "#if defined CONFIG_DEBUG_LOCK_ALLOC && defined CONFIG_PREEMPT_COUNT"
> >> but then I split it into two #ifs to have a nice explanatory empty
> >> #else clause. And I did not test it after that edit
> >> ("what could possibly go wrong?"). Sorry.
> >>
> >> Patch v2 I'm sending _is_ tested.
> >
> > Boot/run as well as build?
>
> I did not even dare to try booting 128 megabyte allyesconfig monstrosity,
> before you asked.
>
> It took me 4 hours of rebuilds to figure out that CONFIG_CMDLINE_BOOL needs to be
> disabled for my qemu boot to succeed :) :)

Just make sure you retain the .config file to allow you to more easily
test future changes! ;-)

> So, yes, patched allyesconfig kernel seems to boot... it takes 250 seconds in qemu.

Steve, are you OK with Denys's most recent patch?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/