Re: [PATCH] cpusets+hotplug+preepmt broken

From: Nathan Lynch
Date: Thu May 12 2005 - 19:36:54 EST


Paul Jackson wrote:
>
> I share you preference for not nesting these semaphores.
>
> The other choice I am aware of would be for the hotplug code to be less
> cpuset-friendly. In the move_task_off_dead_cpu() code, at the point it
> says "No more Mr. Nice Guy", instead of looking for the nearest
> enclosing cpuset that has something online, which is what the
> cpuset_cpus_allowed() does, instead we could just take any damn cpu that
> was online.
>
> Something along the lines of the following fix:
>
> --- pj/kernel.old/sched.c 2005-05-11 13:00:17.000000000 -0700
> +++ pj/kernel.new/sched.c 2005-05-11 13:02:24.000000000 -0700
> @@ -4229,7 +4229,7 @@ static void move_task_off_dead_cpu(int d
>
> /* No more Mr. Nice Guy. */
> if (dest_cpu == NR_CPUS) {
> - tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
> + tsk->cpus_allowed = cpu_online_map;
> dest_cpu = any_online_cpu(tsk->cpus_allowed);

Well, CPU_MASK_ALL rather than cpu_online_map, I would think. That is
what the behavior was before the cpuset merge, anyway. It might be
the best short term solution, more below...

> So what we'd really like to do would be to first fallback to all the
> cpus allowed in the specified tasks cpuset (no walking the cpuset
> hierarchy), and see if any of those cpus are still online to receive
> this orphan task. Unless someone has botched the system configuration,
> and taken offline all the cpus in a cpuset, this should yield up a cpu
> that is still both allowed and online. If that fails, then to heck with
> honoring cpuset placement - just take the first online cpu we can find.
>
> This is doable without holding cpuset_sem. We can look at a current
> tasks cpuset without cpuset_sem, just with the task lock.

Yes, but your patch doesn't lock the task itself (unless I'm
misreading patches again). However, have a look at the comments above
task_lock in sched.h:

/*
* Protects ->fs, ->files, ->mm, ->ptrace, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
* neither inside nor outside.
*/
static inline void task_lock(struct task_struct *p)
{
spin_lock(&p->alloc_lock);
}


Unfortunately, move_task_off_dead_cpu is called from
migrate_live_tasks while the latter has a write_lock_irq on
tasklist_lock. So we can't use task_lock in this context, assuming
the comments are valid. Right?


Nathan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/