Re: 2.6.29-rc1 does not boot and fails to resume

From: Rafael J. Wysocki
Date: Mon Jan 12 2009 - 12:22:43 EST


On Monday 12 January 2009, Ingo Molnar wrote:
>
> * Maciej Rutecki <maciej.rutecki@xxxxxxxxx> wrote:
>
> > 2009/1/11 Dieter Ries <clip2@xxxxxx>:
> > > Hi,
> > >
> > > Ingo Molnar schrieb:
> > >>>> * Dieter Ries <clip2@xxxxxx> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I just pulled 2.6.29-rc1, ran oldconfig with defaults and built it.
> > >>>>> When I try to boot it, that kind of works until init should start. Then
> > >>>>> nothing happens. I tried with init=/bin/bash, which sometimes works, and
> > >>>>> sometimes gets me a bash without the prompt flashing.
> > >>>>>
> >
> > Simmilar issue, described:
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.1/01701.html
> >
> > [...]
> > ####################################################################
> > > 7503bfbae89eba07b46441a5d1594647f6b8ab7d is first bad commit
> > > commit 7503bfbae89eba07b46441a5d1594647f6b8ab7d
> > > Author: Mike Travis <travis@xxxxxxx>
> > > Date: Sun Jan 4 05:18:09 2009 -0800
> > >
> > > cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> > >
> > > Impact: use new cpumask API to reduce stack usage
> > >
> > > Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr()
> > > with a work_on_cpu function for drv_read() and drv_write().
> > >
> > > Basically converts do_drv_{read,write} into "work_on_cpu" functions that
> > > are now called by drv_read and drv_write.
> > >
> > > Signed-off-by: Mike Travis <travis@xxxxxxx>
> > > Acked-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> >
> > Revert this commit also solves problem on my laptoop.
>
> yes - the revert patch can be found below.

Just for the record, apart from the boot problems the commit the patch below
reverts is also causing a suspend-resume regression for people.

Thanks,
Rafael


> ------------------->
> From e0b7a3bea054249b27ca3c843bf6eefcb509d1c2 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@xxxxxxx>
> Date: Mon, 12 Jan 2009 10:49:53 +0100
> Subject: [PATCH] Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write"
>
> This reverts commit 7503bfbae89eba07b46441a5d1594647f6b8ab7d.
>
> Dieter Ries reported bootup soft-hangs and bisected it back to
> this commit, and reverting this commit gave him a working system.
>
> The commit introduces work_on_cpu() use into the cpufreq code,
> but that is subtly problematic from a lock hierarchy POV: the
> hotplug-cpu lock is an highlevel lock that is taken before
> lowlevel locks, and in this codepath we are called with the
> policy lock taken.
>
> Dieter did not have lockdep enabled so we dont have a nice stack
> trace proof for this, but using work_on_cpu() in such a lowlevel
> place certainly looks wrong, so we revert the patch.
>
> work_on_cpu() needs to be reworked to be more generally usable.
>
> Reported-by: Dieter Ries <clip2@xxxxxx>
> Tested-by: Dieter Ries <clip2@xxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 25 ++++++++++++-------------
> 1 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 06fcd8f..6f11e02 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -150,9 +150,8 @@ struct drv_cmd {
> u32 val;
> };
>
> -static long do_drv_read(void *_cmd)
> +static void do_drv_read(struct drv_cmd *cmd)
> {
> - struct drv_cmd *cmd = _cmd;
> u32 h;
>
> switch (cmd->type) {
> @@ -167,12 +166,10 @@ static long do_drv_read(void *_cmd)
> default:
> break;
> }
> - return 0;
> }
>
> -static long do_drv_write(void *_cmd)
> +static void do_drv_write(struct drv_cmd *cmd)
> {
> - struct drv_cmd *cmd = _cmd;
> u32 lo, hi;
>
> switch (cmd->type) {
> @@ -189,23 +186,30 @@ static long do_drv_write(void *_cmd)
> default:
> break;
> }
> - return 0;
> }
>
> static void drv_read(struct drv_cmd *cmd)
> {
> + cpumask_t saved_mask = current->cpus_allowed;
> cmd->val = 0;
>
> - work_on_cpu(cpumask_any(cmd->mask), do_drv_read, cmd);
> + set_cpus_allowed_ptr(current, cmd->mask);
> + do_drv_read(cmd);
> + set_cpus_allowed_ptr(current, &saved_mask);
> }
>
> static void drv_write(struct drv_cmd *cmd)
> {
> + cpumask_t saved_mask = current->cpus_allowed;
> unsigned int i;
>
> for_each_cpu(i, cmd->mask) {
> - work_on_cpu(i, do_drv_write, cmd);
> + set_cpus_allowed_ptr(current, cpumask_of(i));
> + do_drv_write(cmd);
> }
> +
> + set_cpus_allowed_ptr(current, &saved_mask);
> + return;
> }
>
> static u32 get_cur_val(const struct cpumask *mask)
> @@ -231,15 +235,10 @@ static u32 get_cur_val(const struct cpumask *mask)
> return 0;
> }
>
> - if (unlikely(!alloc_cpumask_var(&cmd.mask, GFP_KERNEL)))
> - return 0;
> -
> cpumask_copy(cmd.mask, mask);
>
> drv_read(&cmd);
>
> - free_cpumask_var(cmd.mask);
> -
> dprintk("get_cur_val = %u\n", cmd.val);
>
> return cmd.val;
> --
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/