Re: [PATCH] RCU: don't turn off lockdep when it finds suspicious rcu_dereference_check() usage

From: Daniel J Blueman
Date: Fri Jun 04 2010 - 05:00:11 EST


On Fri, Jun 4, 2010 at 5:10 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jun 04, 2010 at 10:44:48AM +0800, Li Zefan wrote:
>> > Seems worth reviewing the other uses of task_group():
>> >
>> > 1.  set_task_rq() -- only a runqueue and a sched_rt_entity leave
>> >     the RCU read-side critical section.  Runqueues do persist.
>> >     I don't claim to understand the sched_rt_entity life cycle.
>> >
>> > 2.  __sched_setscheduler() -- not clear to me that this one is
>> >     protected to begin with.  If it is somehow correctly protected,
>> >     it discards the RCU-protected pointer immediately, so is OK
>> >     otherwise.
>> >
>> > 3.  cpu_cgroup_destroy() -- ditto.
>> >
>> > 4.  cpu_shares_read_u64() -- ditto.
>> >
>> > 5.  print_task() -- protected by rcu_read_lock() and discards the
>> >     RCU-protected pointer immediately, so this one is OK.
>> >
>> > Any task_group() experts able to weigh in on #2, #3, and #4?
>> >
>>
>> #3 and #4 are safe, because it's not calling task_group(), but
>> cgroup_tg():
>>
>>       struct task_group *tg = cgroup_tg(cgrp);
>>
>> As long as it's safe to access cgrp, it's safe to access tg.
>
> Good point, thank you!
>
> Any takers on #2?

Indeed, __sched_setscheduler() is not protected. How does this look?

Since the struct task_group returned by task_group() is obtained without
holding the RCU read lock, the group may already have been freed by the
time the pointer is dereferenced shortly afterwards.

Signed-off-by: Daniel J Blueman <daniel.blueman@xxxxxxxxx>

diff --git a/kernel/sched.c b/kernel/sched.c
index d484081..b086a36 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4483,9 +4483,13 @@ recheck:
 	 * Do not allow realtime tasks into groups that have no runtime
 	 * assigned.
 	 */
+	rcu_read_lock();
 	if (rt_bandwidth_enabled() && rt_policy(policy) &&
-			task_group(p)->rt_bandwidth.rt_runtime == 0)
+			task_group(p)->rt_bandwidth.rt_runtime == 0) {
+		rcu_read_unlock();
 		return -EPERM;
+	}
+	rcu_read_unlock();
 #endif
 
 	retval = security_task_setscheduler(p, policy, param);

>
>                                                        Thanx, Paul
>
>> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>> >
>> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> > index 50ec9ea..224ef98 100644
>> > --- a/kernel/sched_fair.c
>> > +++ b/kernel/sched_fair.c
>> > @@ -1251,7 +1251,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>> >     }
>> >
>> >     tg = task_group(p);
>> > -   rcu_read_unlock();
>> >     weight = p->se.load.weight;
>> >
>> >     imbalance = 100 + (sd->imbalance_pct - 100) / 2;
>> > @@ -1268,6 +1267,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>> >     balanced = !this_load ||
>> >             100*(this_load + effective_load(tg, this_cpu, weight, weight)) <=
>> >             imbalance*(load + effective_load(tg, prev_cpu, 0, weight));
>> > +   rcu_read_unlock();
>> >
>>
>> This is fine.
>>
>> Another way is :
>>
>> rcu_read_lock();
>> tg = task_group(p);
>> css_get(&tg->css);
>> rcu_read_unlock();
>>
>> /* do something */
>> ...
>>
>> css_put(&tg->css);
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/