Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM

From: Michal Simek
Date: Tue May 31 2011 - 07:08:45 EST


Peter Zijlstra wrote:
On Fri, 2011-05-27 at 21:52 +0100, Russell King - ARM Linux wrote:
On Fri, May 27, 2011 at 02:06:29PM +0200, Ingo Molnar wrote:
The expectations are to have irqs off (we are holding the runqueue lock if !__ARCH_WANT_INTERRUPTS_ON_CTXSW), so that's not workable, I suspect.
Just a thought, but we _might_ be able to avoid a lot of this hassle if
we had a new arch hook in finish_task_switch(), after finish_lock_switch()
returns but before the old MM is dropped.

I'd be more than willing to provide this.

For the new ASID-based switch_mm(), we currently do this:

1. check ASID validity
2. flush branch predictor
3. set reserved ASID value
4. set new page tables
5. set new ASID value
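To make the current ordering concrete, here is a minimal single-CPU C model of the five steps above. It assumes (the thread does not say so explicitly) that the reserved ASID exists to cover the window between the separate page-table and ASID register writes. `struct cpu_state`, `pair_is_safe()`, `switch_mm_current()` and `RESERVED_ASID` are hypothetical names, not ARM kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative single-CPU model; all names are hypothetical. */
#define RESERVED_ASID 0u   /* assumed: never allocated to any mm */

struct mm { unsigned asid; int pgd_id; };   /* pgd_id stands for a page table */

struct cpu_state {
	unsigned asid;     /* models the hardware ASID (context ID) register */
	int pgd_id;        /* models the page table base register */
	int bp_flushes;    /* branch-predictor flushes performed */
};

/* A live (ASID, tables) pair is safe if it cannot tag one mm's translations
 * with another mm's ASID: either it is a matched pair, or it uses the
 * reserved ASID, which no task ever runs with. */
static bool pair_is_safe(const struct cpu_state *c,
			 const struct mm *prev, const struct mm *next)
{
	if (c->asid == RESERVED_ASID)
		return true;
	if (c->asid == prev->asid && c->pgd_id == prev->pgd_id)
		return true;
	return c->asid == next->asid && c->pgd_id == next->pgd_id;
}

/* Current ordering, checking the invariant after every register write. */
static void switch_mm_current(struct cpu_state *c,
			      const struct mm *prev, const struct mm *next)
{
	assert(next->asid != RESERVED_ASID);  /* 1. ASID validity (simplified) */
	c->bp_flushes++;                      /* 2. flush branch predictor */
	c->asid = RESERVED_ASID;              /* 3. set reserved ASID value */
	assert(pair_is_safe(c, prev, next));
	c->pgd_id = next->pgd_id;             /* 4. set new page tables */
	assert(pair_is_safe(c, prev, next));
	c->asid = next->asid;                 /* 5. set new ASID value */
	assert(pair_is_safe(c, prev, next));
}
```

In this model, skipping step 3 would leave the CPU briefly running with prev's ASID on next's tables (asid 1, pgd 200 for the mms used in a quick test), a state that `pair_is_safe()` rejects.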

This will shortly be changed to:

1. check ASID validity
2. flush branch predictor
3. set swapper_pg_dir tables
4. set new ASID value
5. set new page tables
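The revised ordering can be sketched with the same kind of hypothetical model, this time assuming swapper_pg_dir holds only global kernel mappings, so that no ASID-tagged user entries can be created while it is live. `SWAPPER_PGD` and `switch_mm_swapper()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model; names are hypothetical. SWAPPER_PGD stands for
 * swapper_pg_dir, assumed to hold only global kernel mappings. */
#define SWAPPER_PGD 0

struct mm { unsigned asid; int pgd_id; };
struct cpu_state { unsigned asid; int pgd_id; int bp_flushes; };

/* With kernel-only tables live, no ASID-tagged user entries can be created,
 * so any ASID value is harmless; otherwise the pair must match one mm. */
static bool pair_is_safe(const struct cpu_state *c,
			 const struct mm *prev, const struct mm *next)
{
	if (c->pgd_id == SWAPPER_PGD)
		return true;
	if (c->asid == prev->asid && c->pgd_id == prev->pgd_id)
		return true;
	return c->asid == next->asid && c->pgd_id == next->pgd_id;
}

static void switch_mm_swapper(struct cpu_state *c,
			      const struct mm *prev, const struct mm *next)
{
	/* 1. ASID validity check omitted in this sketch */
	c->bp_flushes++;                      /* 2. flush branch predictor */
	c->pgd_id = SWAPPER_PGD;              /* 3. set swapper_pg_dir tables */
	assert(pair_is_safe(c, prev, next));
	c->asid = next->asid;                 /* 4. set new ASID value */
	assert(pair_is_safe(c, prev, next));
	c->pgd_id = next->pgd_id;             /* 5. set new page tables */
	assert(pair_is_safe(c, prev, next));
}
```

Here the kernel-only tables, rather than a reserved ASID value, cover the gap between the two register writes.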

We could change switch_mm() to only do:

1. flush branch predictor
2. set swapper_pg_dir tables
3. check ASID validity
4. set new ASID value

At this point, we have no user mappings, and so nothing will be using the
ASID at this point. Then in a new post-finish_lock_switch() arch hook:

5. check whether we need to do flushing as a result of ASID change
6. set new page tables

I think this may simplify the ASID code. It needs prototyping out,
reviewing and testing, but I think it may work.

And I think it may also be workable with the CPUs which need to flush
the caches on context switches - we can postpone their page table
switch to this new arch hook too, which will mean we wouldn't require
__ARCH_WANT_INTERRUPTS_ON_CTXSW on ARM at all.

Any thoughts (if you've followed what I'm going on about)?

Yeah, definitely worth a try. You mentioned on IRC the problem of
detecting, in the new arch hook, whether switch_mm() happened. Since
switch_mm() gets a @next pointer, we can set a TIF flag there and have
the new arch hook test for that and conditionally perform the required
work.
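The flag-based detection can be illustrated with a small model. This is a sketch under stated assumptions: `TIF_SWITCH_MM_PENDING` is a hypothetical flag name, `set_ti_flag()` and `test_and_clear_ti_flag()` are stand-ins for the kernel's thread-flag helpers, and kernel threads and lazy TLB handling are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; in the kernel this would be a new TIF_* bit whose
 * name the thread does not give. */
#define TIF_SWITCH_MM_PENDING (1UL << 0)
#define SWAPPER_PGD 0          /* kernel-only tables */

struct mm { int pgd_id; };
struct task { unsigned long flags; struct mm *mm; };
struct cpu_state { int pgd_id; int hook_work; };

static void set_ti_flag(struct task *t, unsigned long f) { t->flags |= f; }

static bool test_and_clear_ti_flag(struct task *t, unsigned long f)
{
	bool was_set = (t->flags & f) != 0;
	t->flags &= ~f;
	return was_set;
}

/* switch_mm() runs only when the mm changes; it defers the page table
 * write and marks the incoming task so the later hook knows there is work. */
static void model_switch_mm(struct cpu_state *c, struct mm *prev,
			    struct mm *next, struct task *tsk)
{
	(void)prev;
	(void)next;
	c->pgd_id = SWAPPER_PGD;
	set_ti_flag(tsk, TIF_SWITCH_MM_PENDING);
}

/* Stand-in for the new arch hook after finish_lock_switch(). */
static void model_finish_arch_switch(struct cpu_state *c, struct task *next)
{
	if (!test_and_clear_ti_flag(next, TIF_SWITCH_MM_PENDING))
		return;                /* same mm: nothing was deferred */
	c->pgd_id = next->mm->pgd_id;
	c->hook_work++;
}

static void model_context_switch(struct cpu_state *c,
				 struct task *prev, struct task *next)
{
	if (prev->mm != next->mm)
		model_switch_mm(c, prev->mm, next->mm, next);
	/* switch_to() and finish_lock_switch() would run here */
	model_finish_arch_switch(c, next);
}
```

A switch between two tasks sharing an mm never calls switch_mm(), so the hook finds the flag clear and does nothing; a switch to a task with a different mm sets the flag, and the hook installs the deferred page tables.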

Now, supposing we can get ARM to not rely on
__ARCH_WANT_INTERRUPTS_ON_CTXSW anymore, there's only microblaze left.
Michal, would a similar scheme work for you? If so, we can fully
deprecate and remove this exception from the scheduler (yay!).

Hi,

Please correct me if I am wrong, but this is a workaround just for ARM.
I am not aware that we need to do anything with caches. I enabled that option
after our discussion (http://lkml.org/lkml/2009/12/3/204) because of problems with lockdep. I will look into whether I can remove that option, but it will require some changes in the code. switch_to should be called with IRQs off, right?

Michal

--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663