Re: [PATCH]: Re: kernel 2.6.9-rc1-mm4 oops

From: Kirill Korotaev
Date: Mon Sep 13 2004 - 09:38:05 EST


Ingo Molnar wrote:
* Kirill Korotaev <dev@xxxxx> wrote:
the BUG() is useful for all the code that uses next_thread() - you can
only do a safe next_thread() iteration if you've locked ->sighand.

1. I don't see spin_lock() on p->sighand->siglock in do_task_stat() before calling next_thread(). And the check inside next_thread() permits only one of the locks to be taken:

if (!spin_is_locked(&p->sighand->siglock) &&
!rwlock_is_locked(&tasklist_lock))

which is probably wrong, since tasklist_lock is always required!

It's not 'wrong' in terms of correctness it's simply too restrictive for
no reason. I agree that we should check for the tasklist lock only.
that is what I wanted to say :)
I removed check for siglock being locked and changed check for sighand != NULL to pid.nr check as we discussed below.

2. I think the idea of checking sighand is quite obscure. Probably it
would be better to call pid_alive() for check at such places in proc,
isn't it?
yeah, it's just as good of a check.
So I replaced the check in your patch with pid_alive() one, ok?

But I would propose to reorganize these checks in next_thread() to
something like this:

if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
BUG();

the last check ensures that we are still hashed and this check is more straithforward for understanding, agree?
yep - please send a new patch to Andrew.
here it is, please review it as well.

There are 2 patches here:

diff-next_thread (for both linus and 2.6.9-rc1-mm4 trees)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This patch changes obscure BUG() checks in next_thread() with pid checks meaning exactly the same (It checks for task being hashed).

Signed-Off-By: Kirill Korotaev <dev@xxxxx>

diff-task_stat (for 2.6.9-rc1-mm4 tree)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This patch fixes BUG() happening in do_task_stat()->next_thread(), since tsk->sighand can be NULL there. It adds check for pid_alive() in do_task_stat() to prevent thread loop for already unhashed task.

Signed-Off-By: Kirill Korotaev <dev@xxxxx>

Kirill --- ./include/linux/sched.h.nt 2004-09-13 18:00:12.000000000 +0400
+++ ./include/linux/sched.h 2004-09-13 18:06:03.680828072 +0400
@@ -699,6 +699,12 @@ extern struct task_struct *find_task_by_
extern void set_special_pids(pid_t session, pid_t pgrp);
extern void __set_special_pids(pid_t session, pid_t pgrp);

+/* checks whether task is still hashed and can be accessed safely */
+static inline int pid_alive(struct task_struct *p)
+{
+ return p->pids[PIDTYPE_PID].nr != 0;
+}
+
/* per-UID process charging. */
extern struct user_struct * alloc_uid(uid_t);
static inline struct user_struct *get_uid(struct user_struct *u)
--- ./fs/proc/array.c.nt 2004-09-13 18:00:09.178720584 +0400
+++ ./fs/proc/array.c 2004-09-13 18:00:51.861231856 +0400
@@ -356,7 +356,7 @@ static int do_task_stat(struct task_stru
stime = task->signal->stime;
}
}
- if (whole) {
+ if (whole && pid_alive(task)) {
t = task;
do {
min_flt += t->min_flt;
--- ./fs/proc/base.c.nt 2004-09-13 18:00:09.181720128 +0400
+++ ./fs/proc/base.c 2004-09-13 18:00:51.862231704 +0400
@@ -793,11 +793,6 @@ static struct inode_operations proc_pid_
.follow_link = proc_pid_follow_link
};

-static inline int pid_alive(struct task_struct *p)
-{
- return p->pids[PIDTYPE_PID].nr != 0;
-}
-
#define NUMBUF 10

static int proc_readfd(struct file * filp, void * dirent, filldir_t filldir)
--- ./kernel/exit.c.nt 2004-09-13 18:00:12.727181136 +0400
+++ ./kernel/exit.c 2004-09-13 18:00:51.864231400 +0400
@@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
task_t fastcall *next_thread(const task_t *p)
{
#ifdef CONFIG_SMP
- if (!p->sighand)
- BUG();
- if (!spin_is_locked(&p->sighand->siglock) &&
- !rwlock_is_locked(&tasklist_lock))
+ if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
BUG();
#endif
return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);