Re: 2.6.29-rc1 does not boot

From: Ingo Molnar
Date: Mon Jan 12 2009 - 05:01:23 EST



* Mike Travis <travis@xxxxxxx> wrote:

> Dieter Ries wrote:
> > Hi,
> >
> > Ingo Molnar wrote:
> >>>> * Dieter Ries <clip2@xxxxxx> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I just pulled 2.6.29-rc1, ran oldconfig with defaults and built it.
> >>>>> When I try to boot it, that kind of works until init should start. Then
> >>>>> nothing happens. I tried with init=/bin/bash, which sometimes works, and
> >>>>> sometimes gets me a bash without the prompt flashing.
> >>>>>
> >>>>> I captured the output with netconsole, but I cannot see a problem there.
> >>>>> It is attached.
> >>>>>
> >>>>> My config is also attached.
> >>>>>
> >>>>> The machine:
> >>>>>
> >>>>> Lenovo Thinkpad T60
> >>>>> Core2Duo 2GHz
> >>>>>
> >>>>> Gentoo 64bit
> >>>>>
> >>>>>
> >>>>> What else should I provide for debugging that?
> >> Unless you can see some particular badness in the kernel messages
> >> (something that changed since the last working version) that narrows it down
> >> to some subsystem, I suspect this would have to be bisected ...
> >
> > Bisected it:
> >
> > ####################################################################
> > 7503bfbae89eba07b46441a5d1594647f6b8ab7d is first bad commit
> > commit 7503bfbae89eba07b46441a5d1594647f6b8ab7d
> > Author: Mike Travis <travis@xxxxxxx>
> > Date: Sun Jan 4 05:18:09 2009 -0800
> >
> > cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> >
> > Impact: use new cpumask API to reduce stack usage
> >
> > Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr()
> > with a work_on_cpu function for drv_read() and drv_write().
> >
> > Basically converts do_drv_{read,write} into "work_on_cpu" functions that
> > are now called by drv_read and drv_write.
> >
> > Signed-off-by: Mike Travis <travis@xxxxxxx>
> > Acked-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
> > Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> > ####################################################################
> >
> > I reverted that patch, which makes my machine boot again. So I guess
> > there's something wrong here. Please tell me which information you need
> > to fix the problem; I will help as much as I can.
> >
> >
> >> Ingo
> >>
> >
> > cu
> > Dieter
> >
>
> Thanks for catching this.
>
> The work_on_cpu approach seems to create more problems than it solves.
> And testing it has proven difficult without the right combination of
> hardware.
>
> Could you send me the console log and config file?
>
> Rusty - any ideas on how to avoid these clashes with the
> get_online_cpus() call in work_on_cpu()? Or something else to indicate
> to lockdep that the circular lock dependency is ok (as you mentioned
> before)?

I've queued up the revert below, please check whether you agree with the
analysis in the commit message.

Mike, could you also check any other patches where you add work_on_cpu()
usage to make sure we don't have similar mishaps? work_on_cpu() seems
completely unsuited for any sort of set_cpus_allowed() replacement ...

Ingo

---------------->