Re: kernel BUG at kernel/exit.c:792!

From: Ingo Molnar
Date: Wed Dec 03 2003 - 15:28:44 EST



On Wed, 3 Dec 2003, Manfred Spraul wrote:

> It's wrong, because next_thread() relies on
>
> task->pids[PIDTYPE_TGID].pid_chain.next
>
> That pointer is not valid after detach_pid(task, PIDTYPE_TGID), and
> that's called within __unhash_process. Thus next_thread() fails if it's
> called on a dead task. Srivatsa's second patch is the right change: If
> pid_alive() is wrong, then break from the loop without calling
> next_thread().

yes. And for thread groups this can only happen for the thread group
leader if all 'child' threads have exited. So it can never happen that we
somehow get to a 'middle' thread, walk the chain and get to a task that is
invalid. The only possibility is that the starting point is stale itself -
and the pid_alive() test checks that.

the thread group leader being 'zombie' does not mean it's unhashed. Thread
group leaders are never detached threads, they have a parent that waits
for them. So these zombies just hang around until the last thread goes
away, and then the leader is released, unhashed from the PID space (and
thus next_thread() stops being valid) and the parent is notified.

Ingo

--- linux/fs/proc/base.c.orig
+++ linux/fs/proc/base.c
@@ -1666,7 +1666,12 @@ static int get_tid_list(int index, unsig

index -= 2;
read_lock(&tasklist_lock);
- do {
+ /*
+ * The starting point task (leader_task) might be an already
+ * unlinked task, which cannot be used to access the task-list
+ * via next_thread().
+ */
+ if (pid_alive(task)) do {
int tid = task->pid;
if (!pid_alive(task))
continue;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/