Re: Bisected regression in Linux 3.7.0-rc1, hang on login caused byreplace_fd()

From: Al Viro
Date: Mon Oct 15 2012 - 17:51:20 EST


On Mon, Oct 15, 2012 at 05:14:37PM -0400, Pavel Roskin wrote:
> Hello!
>
> I tried the current mainline Linux on Fedora 16 x64_64 and found that I
> cannot login in gdm. I'm using gdm with LXDE. After a few minutes,
> kernel messages appear indicating a hung process gnome-settings-daemon.

What is it hanging on?

> I'll try to figure out why replace_fd() is not good in
> umh_pipe_setup(), but I'd like to report the problem as soon as
> possible, as it's very visible to the users and I don't want others to
> spend hours bisecting it.
>
> --
> Regards,
> Pavel Roskin

[snip]

> [ 241.483499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 241.484888] gnome-settings- D 0000000000000000 6136 3155 1 0x00000000
> [ 241.486265] ffff8801298afcc8 0000000000000046 ffff88012993b420 0000000000000246
> [ 241.487652] ffff8801298affd8 ffff8801298affd8 ffff8801298affd8 ffff8801298affd8
> [ 241.489035] ffff88012993a2c0 ffff88012993b420 ffff880128cb0b80 ffff88012993b420
> [ 241.490421] Call Trace:
> [ 241.491765] [<ffffffff815d8064>] schedule+0x24/0x70
> [ 241.493116] [<ffffffff81043945>] exit_mm+0x85/0x120
> [ 241.494467] [<ffffffff8104628f>] do_exit+0x13f/0x580
> [ 241.495784] [<ffffffff81046957>] do_group_exit+0x47/0xb0
> [ 241.497090] [<ffffffff810597ef>] get_signal_to_deliver+0x21f/0x580
> [ 241.498384] [<ffffffff81002014>] do_signal+0x24/0x120
> [ 241.499673] [<ffffffff815da28c>] ? sysret_signal+0x5/0x47
> [ 241.500964] [<ffffffff81002195>] do_notify_resume+0x65/0xa0
> [ 241.502239] [<ffffffff8129342e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 241.503525] [<ffffffff815da51a>] int_signal+0x12/0x17
> [ 241.504813] no locks held by gnome-settings-/3155.
> [ 241.506095] INFO: task gnome-settings-:3156 blocked for more than 120 seconds.
> [ 241.507405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 241.508721] gnome-settings- D 00000000ffffffff 5104 3156 1 0x00000000
> [ 241.510056] ffff88011c80bcc8 0000000000000046 ffff88011c80bc78 ffffffff8109ad9d
> [ 241.511414] ffff88011c80bfd8 ffff88011c80bfd8 ffff88011c80bfd8 ffff88011c80bfd8
> [ 241.512780] ffff8801279fd6e0 ffff88012993a2c0 ffff880129809d28 ffff88012993a2c0
> [ 241.514134] Call Trace:
> [ 241.515480] [<ffffffff8109ad9d>] ? trace_hardirqs_on_caller+0x7d/0x120
> [ 241.516857] [<ffffffff815d8064>] schedule+0x24/0x70
> [ 241.518219] [<ffffffff81043945>] exit_mm+0x85/0x120
> [ 241.519592] [<ffffffff8104628f>] do_exit+0x13f/0x580
> [ 241.520952] [<ffffffff81046957>] do_group_exit+0x47/0xb0
> [ 241.522315] [<ffffffff810597ef>] get_signal_to_deliver+0x21f/0x580
> [ 241.523687] [<ffffffff81002014>] do_signal+0x24/0x120
> [ 241.525066] [<ffffffff8109b8a4>] ? __lock_is_held+0x54/0x70
> [ 241.526436] [<ffffffff815da28c>] ? sysret_signal+0x5/0x47
> [ 241.527841] [<ffffffff81002195>] do_notify_resume+0x65/0xa0
> [ 241.529237] [<ffffffff8129342e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 241.530625] [<ffffffff815da51a>] int_signal+0x12/0x17
> [ 241.532009] no locks held by gnome-settings-/3156.
> [ 241.533386] INFO: task gnome-settings-:3157 blocked for more than 120 seconds.
> [ 241.534778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 241.536196] gnome-settings- D 00000000ffffffff 5432 3157 1 0x00000000
> [ 241.537632] ffff8801279efcc8 0000000000000046 ffff88012993c580 0000000000000246
> [ 241.539069] ffff8801279effd8 ffff8801279effd8 ffff8801279effd8 ffff8801279effd8
> [ 241.540515] ffff88012661e840 ffff88012993c580 ffff880128cb0b80 ffff88012993c580
> [ 241.541961] Call Trace:
> [ 241.543377] [<ffffffff815d8064>] schedule+0x24/0x70
> [ 241.544802] [<ffffffff81043945>] exit_mm+0x85/0x120
> [ 241.546226] [<ffffffff8104628f>] do_exit+0x13f/0x580
> [ 241.547642] [<ffffffff81046957>] do_group_exit+0x47/0xb0
> [ 241.549053] [<ffffffff810597ef>] get_signal_to_deliver+0x21f/0x580
> [ 241.550468] [<ffffffff815d940b>] ? _raw_spin_unlock_irq+0x2b/0x50
> [ 241.551878] [<ffffffff81002014>] do_signal+0x24/0x120
> [ 241.553297] [<ffffffff81122e77>] ? might_fault+0x57/0xb0
> [ 241.554726] [<ffffffff815da28c>] ? sysret_signal+0x5/0x47
> [ 241.556148] [<ffffffff81002195>] do_notify_resume+0x65/0xa0
> [ 241.557588] [<ffffffff8129342e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 241.557590] [<ffffffff815da51a>] int_signal+0x12/0x17
> [ 241.557595] no locks held by gnome-settings-/3157.

> @@ -443,14 +453,23 @@ static void wait_for_dump_helpers(struct file *file)
> static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> {
> struct file *files[2];
> + struct fdtable *fdt;
> struct coredump_params *cp = (struct coredump_params *)info->data;
> + struct files_struct *cf = current->files;
> int err = create_pipe_files(files, 0);
> if (err)
> return err;
>
> cp->file = files[1];
>
> - replace_fd(0, files[0], 0);
> + sys_close(0);
> + fd_install(0, files[0]);
> + spin_lock(&cf->file_lock);
> + fdt = files_fdtable(cf);
> + __set_open_fd(0, fdt);
> + __clear_close_on_exec(0, fdt);
> + spin_unlock(&cf->file_lock);

Very interesting. Let's see what could be going on there. First of
all, create_pipe_files() has return 0. There's only one exit where
it can do so and it dereferences both files[0] and files[1] on the
way there. Since that hadn't oopsed, they are both non-NULL.

IOW, replace_fd(0, files[0], 0) is equivalent to
{
int err;
struct file *f = files[0];
struct files_struct *files = current->files;
if (0 >= rlimit(RLIMIT_NOFILE))
return -EMFILE;

spin_lock(&files->file_lock);
err = expand_files(files, 0);
if (unlikely(err < 0))
goto out_unlock;
return do_dup2(files, f, 0, 0);

out_unlock:
spin_unlock(&files->file_lock);
return err;
}

Aha. We have the first thing to test - is RLIMIT_NOFILE set to 0 for that
process, by any chance? We definitedly need to check replace_fd() return
value, BTW. OK, suppose it wasn't 0. So we'd done expand_files(files, 0).
expand_files() begins with
/* Do we need to expand? */
if (nr < fdt->max_fds)
return 0;
and ->max_fds is never going to be less than NR_OPEN_DEFAULT, which is
definitely positive. So expand_files(files, 0) is going to return 0
without doing anything. And we get to do_dup2():
fdt = files_fdtable(files);
tofree = fdt->fd[0];
if (!tofree && fd_is_open(0, fdt))
goto Ebusy;
that really shouldn't be possible to hit in this case - it can happen
only for shared descriptor table. Another thread would need to have
already reserved fd 0, but hadn't got around to installing a reference
to struct file into it. But we are called from the freshly created
kernel thread, about to exec some userland binary. Those are spawned
without CLONE_FILES in flags, so the table can't be shared. It's fresh
from dup_fd() and this bit
/*
* The fd may be claimed in the fd bitmap but not yet
* instantiated in the files array if a sibling thread
* is partway through open(). So make sure that this
* fd is available to the new process.
*/
__clear_open_fd(open_files - i, new_fdt);
in there would have prevented anything of that kind.

Now we have tofree equal to whatever we used to have in ->fd[0] (realistically
that's going to be NULL). Next we have
get_file(file);
rcu_assign_pointer(fdt->fd[0], file);
__set_open_fd(0, fdt);
if (0 & O_CLOEXEC)
__set_close_on_exec(fd, fdt);
else
__clear_close_on_exec(fd, fdt);
spin_unlock(&files->file_lock);

if (tofree)
filp_close(tofree, files);

Now, note that sys_close(0) would have done something very similar -
file = fdt->fd[0];
if (!file)
goto out_unlock;
rcu_assign_pointer(fdt->fd[0], NULL);
__clear_close_on_exec(0, fdt);
__put_unused_fd(files, 0);
spin_unlock(&files->file_lock);
filp_close(file, files);
'file' here is what we had in 'tofree'. We'd ripped it out of descriptor
table and done filp_close() on it. When we follow with
fd_install(0, files[0]);
...
__set_open_fd(0, fdt);
__clear_close_on_exec(0, fdt);
we have get_file() done on files[0], files[0] put into ->fd[0] (fd_install()),
result of __put_unused(files, 0); undone (by __set_open_fd()) and we come
to exactly the same situation as with replace_fd().

All right, so the next step in debugging is to print the damn
return value of replace_fd(). That should narrow the things down.
Another interesting question, of course, is what makes that gnome turd
coredump on each boot?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/