Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

From: Andy Lutomirski
Date: Fri Oct 19 2018 - 10:29:51 EST

> On Oct 19, 2018, at 1:33 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>> On Fri, Oct 19, 2018 at 01:08:23AM +0000, Nadav Amit wrote:
>> Consider for example do_int3(), and see my inlined comments:
>>
>> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
>> {
>> ...
>> ist_enter(regs); // => preempt_disable()
>> cond_local_irq_enable(regs); // => assume it enables IRQs
>>
>> ...
>> // resched irq can be delivered here. It will not cause rescheduling
>> // since preemption is disabled
>>
>> cond_local_irq_disable(regs); // => assume it disables IRQs
>> ist_exit(regs); // => preempt_enable_no_resched()
>> }
>>
>> At this point resched will not happen for an unbounded length of time
>> (unless there is another point when exiting the trap handler that checks
>> if preemption should take place).
>>
>> Another example is __BPF_PROG_RUN_ARRAY(), which also uses
>> preempt_enable_no_resched().
>>
>> Am I missing something?
>
> Would not the interrupt return then check for TIF_NEED_RESCHED and call
> schedule() ?

The paranoid exit path doesn't check TIF_NEED_RESCHED because it's fundamentally atomic -- it's running on a percpu stack and it can't schedule. In theory we could do some evil stack switching, but we don't.
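
To make the concern concrete, here is a rough user-space toy model, not kernel code. Every toy_* name is hypothetical, and the model assumes a single need_resched flag and a bare preemption counter, which is far simpler than the real entry code. It only sketches the sequence Nadav describes: a resched request arrives while preemption is disabled, preempt_enable_no_resched() drops the count without looking at the flag, and whether the request is ever acted on depends on whether the exit path checks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these types and functions do not exist in the kernel. */
struct toy_cpu {
	int  preempt_count;   /* > 0 means preemption is disabled */
	bool need_resched;    /* stands in for TIF_NEED_RESCHED */
	int  schedule_calls;  /* how often the toy scheduler ran */
};

static void toy_schedule(struct toy_cpu *cpu)
{
	cpu->need_resched = false;
	cpu->schedule_calls++;
}

static void toy_preempt_disable(struct toy_cpu *cpu)
{
	cpu->preempt_count++;
}

/* Models preempt_enable_no_resched(): drop the count, never check the flag. */
static void toy_preempt_enable_no_resched(struct toy_cpu *cpu)
{
	assert(cpu->preempt_count > 0);
	cpu->preempt_count--;
}

/* Models a resched IRQ: set the flag, preempt only if allowed right now. */
static void toy_resched_irq(struct toy_cpu *cpu)
{
	cpu->need_resched = true;
	if (cpu->preempt_count == 0)
		toy_schedule(cpu);
}

/*
 * Models leaving the handler. check_flag == true is an exit path that
 * looks at the flag; check_flag == false is one that does not (the
 * assumption here is that this is what a paranoid-style exit amounts to).
 */
static void toy_exit_path(struct toy_cpu *cpu, bool check_flag)
{
	if (check_flag && cpu->need_resched && cpu->preempt_count == 0)
		toy_schedule(cpu);
}

/* Runs the do_int3()-like sequence; returns true if a resched is still pending. */
static bool toy_trap_leaves_resched_pending(bool exit_checks_flag)
{
	struct toy_cpu cpu = { 0 };

	toy_preempt_disable(&cpu);            /* like ist_enter() */
	toy_resched_irq(&cpu);                /* IRQ arrives, cannot preempt */
	toy_preempt_enable_no_resched(&cpu);  /* like ist_exit() */
	toy_exit_path(&cpu, exit_checks_flag);

	return cpu.need_resched;
}
```

In this model, toy_trap_leaves_resched_pending(false) leaves the flag set after the handler returns, while toy_trap_leaves_resched_pending(true) clears it by calling the toy scheduler on the way out.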

How does NMI handle this? If an NMI that hit interruptible kernel code overflows a perf counter, how does the wake up work?

(do_int3() is special because it's not actually IST. But it can hit in odd places due to kprobes, and I'm nervous about recursing incorrectly into RCU and context tracking code if we were to use exception_enter().)

>
> I think (and this certainly wants a comment) that the ist_exit()
> thing relies heavily on the interrupt-return path doing the reschedule.