Re: [PATCH] arm64/trap: fix broken ct->nmi_nesting when die() is called in a kthread

From: Yeoreum Yun
Date: Tue Jun 03 2025 - 11:24:13 EST


Hi Mark,

> On Tue, Jun 03, 2025 at 12:14:18PM +0100, Yeoreum Yun wrote:
> > > On Mon, Jun 02, 2025 at 06:50:53PM +0100, Yeoreum Yun wrote:
> > > > So, what I think:
> > > > 1. arm64_enter_el1_dbg() should call ct_nmi_enter() as it does now.
> > > > 2. In bug_handler(), while handling BUG_TYPE, add the conditional
> > > > ct_nmi_exit() call above.
> > > > 3. DAIF.D and DAIF.A handling.
> > >
> > > No, that is not safe. In step 2, calling ct_nmi_exit() would undo *all*
> > > of the ct_nmi_enter() logic, and may stop RCU from watching if the
> > > exception was entered from some intermediate/inconsistent state.
> >
> > Yes, if ct_nmi_enter() is called unconditionally.
> > But I meant with the condition check I posted:
> > in the CT_NESTING_IRQ_NONIDLE case, the call isn't needed
> > and that CPU can still be watched by RCU.
>
> I am not keen on conditionally calling ct_nmi_exit(), and would strongly
> prefer to avoid that, regardless of where that lives in the flow.
>
> I suspect that it would be better to triage the interrupted context
> earlier, and rethink the way entry/exit works, but that's a much larger
> bit of work and will take more thinking.

Thanks for sharing your thoughts.
I'll think it over and raise it again after Ada's patchset is
merged.


--
Sincerely,
Yeoreum Yun