Oops on 2.4.x invalid procfs i_ino value

From: Brent Casavant
Date: Fri Dec 17 2004 - 17:57:18 EST


I've run into a number of crashes while closing procfs stat files
when a system is under load. I think I've found the problem, but
am a little unsure how to proceed. This all happens to be on a
2.4.21 based kernel, but by my brief code inspection I think the
problem still exists on more recent 2.4.x kernels.

In procfs the fake_ino() macro is used to construct the inode number
for each entry.

#define fake_ino(pid,ino) (((pid)<<16)|(ino))

In particular this is used in proc_pid_make_inode:

inode->i_ino = fake_ino(task->pid, ino);

Note that a pid may be more than 16 bits in width (e.g. in IA64), and
we're trying to stuff it into the upper 16 bits of the inode number.
This isn't usually a problem, except when the lower 16 bits of the
inode happen to be 0 (i.e. pids that are a multiple of 65536).

Why does zero matter? Glad you asked.

In proc_delete_inode there is a check to see if the inode is is
a "proper" (whatever that means) procfs inode. The whole function
is:

static void proc_delete_inode(struct inode *inode)
{
struct proc_dir_entry *de = inode->u.generic_ip;

inode->i_state = I_CLEAR;

if (PROC_INODE_PROPER(inode)) {
proc_pid_delete_inode(inode);
return;
}
if (de) {
if (de->owner)
__MOD_DEC_USE_COUNT(de->owner);
de_put(de);
}
}

PROC_INODE_PROPER() is:

#define PROC_INODE_PROPER(inode) ((inode)->i_ino & ~0xffff)

In other words, it checks whether the top 16 bits of the inode number
(equivalent to the bottom 16 bits of the pid) are non-zero.

Thus closing a proc entry for any task with a pid that is a multiple of
65536 will fail this check, skip proc_pid_delete_inode, and call
__MOD_DEC_USE_COUNT, more than likely causing a panic on an invalid
memory access, and minimally corrupting something in memory otherwise.

I don't have a solution coded up (mostly because I'm a bit bleary
eyed after looking at crash dumps all day) -- but are there any
thoughts on how to go about addressing this one? An obvious workaround
is setting kernel.pid_max to 65535, but that's only a workaround, not
a solution.

On a related note, if it matters, on about half the crash dumps I've
looked at, I see a pid of 0 has been assigned to a user process,
tripping this same problem. I suspect there's another bug somewhere
that's allowing a pid of 0 to be chosen in the first place -- but I
don't totally discount that this problem may lay in SGI's patches to
this particular kernel -- I'll need to take a more thorough look.

Thanks,
Brent

--
Brent Casavant If you had nothing to fear,
bcasavan@xxxxxxx how then could you be brave?
Silicon Graphics, Inc. -- Queen Dama, Source Wars
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/