Re: next-20250605: Test regression: qemu-x86_64-compat mode ltp tracing Oops int3 kernel panic
From: Google
Date: Tue Jun 17 2025 - 06:42:10 EST
On Mon, 16 Jun 2025 16:36:59 +0900
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
> > So the fundamental issue is that smp_text_poke_batch missed
> > handling INT3.
> >
> > I guess some text_poke user do not get text_mutex?
>
> Hmm, I've checked the smp_text_poke_* users, but it seems no problem.
> Basically, those smp_text_poke*() users lock text_mutex, and the other
> suspicious variable, ftrace_start_up, is also set under ftrace_lock.
> ftrace_arch_code_modify_post_process() is also paired with
> ftrace_arch_code_modify_prepare() and under ftrace_lock.
Finally, I found a bug in text_poke, and jump_label
(tracepoint) hits that bug.
jump_label uses two different APIs (single and batch),
each of which takes the text_mutex lock independently:
smp_text_poke_single()
  __jump_label_transform()
    jump_label_transform() -> lock text_mutex

smp_text_poke_batch_add()
  arch_jump_label_transform_queue() -> lock text_mutex

smp_text_poke_batch_finish()
  arch_jump_label_transform_apply() -> lock text_mutex
This mixing is allowed by commit 8a6a1b4e0ef1 ("x86/alternatives:
Remove the mixed-patching restriction on smp_text_poke_single()"),
but smp_text_poke_single() still expects the batched APIs
to be called within the same text_mutex lock region.
Thus, if a user calls those APIs in the following order:

  arch_jump_label_transform_queue(addr1)
  jump_label_transform(addr2)
  arch_jump_label_transform_apply()

and addr1 > addr2, the array is no longer sorted, so the
bsearch() on it does not work and the int3 handler fails
to handle the int3!
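To see why the order matters, here is a small userspace model of the
lookup the int3 handler performs. struct poke_entry and find_entry()
are hypothetical names, not the kernel's; the loop follows the shape of
the kernel's lib/bsearch.c, and like any binary search it assumes the
array is sorted by address. With addr1 = 0x2000 queued first and
addr2 = 0x1000 stored after it, a lookup of addr1 misses in this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only the patched address of each queued entry. */
struct poke_entry {
	unsigned long addr;
};

/*
 * Binary search modeled on the lib/bsearch.c loop. It is only correct
 * when vec[] is sorted by addr in ascending order.
 */
static const struct poke_entry *find_entry(const struct poke_entry *vec,
					   size_t num, unsigned long key)
{
	const struct poke_entry *base = vec;

	while (num > 0) {
		const struct poke_entry *pivot = base + (num >> 1);

		if (key == pivot->addr)
			return pivot;
		if (key > pivot->addr) {
			/* Discard the pivot and everything below it. */
			base = pivot + 1;
			num--;
		}
		num >>= 1;
	}
	return NULL;	/* the handler would not recognize this int3 */
}
```

On the unsorted array the first pivot is addr2 (0x1000); since 0x2000 is
larger, the search discards the lower half, which is exactly where
addr1 sits.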
This can explain the case where the int3 disappeared: if this
happens right before the int3 is overwritten, the int3 has
already been overwritten by the time the int3 handler dumps
the code, but text_poke_array_refs is still 1.
It seems that commit c8976ade0c1b ("x86/alternatives:
Simplify smp_text_poke_single() by using tp_vec and existing APIs")
introduced this problem, because it makes the text_poke batch APIs
and smp_text_poke_single() share the global array. Before that
commit, smp_text_poke_single() (then text_poke_bp()) used its own
local variable.
To fix this issue, use smp_text_poke_batch_add() in
smp_text_poke_single(), since it checks that the array stays
sorted and that the array index does not overflow.
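The effect of the fix can be sketched with a hypothetical userspace
model. struct poke_array, raw_add(), batch_add(), batch_finish(),
poke_single_old() and poke_single() are illustrative stand-ins, not the
kernel code. The model assumes that the pre-fix single path appended
without any check, and that the checked add applies the pending batch
first whenever the new address is lower than the last queued one or
the array is full.

```c
#include <assert.h>
#include <stddef.h>

#define POKE_ARRAY_MAX 4	/* hypothetical; the real limit differs */

struct poke_array {
	unsigned long addr[POKE_ARRAY_MAX];
	size_t nr;
	unsigned int flushes;		/* batches applied */
	unsigned int bad_batches;	/* batches applied while unsorted */
};

static int is_sorted(const struct poke_array *a)
{
	for (size_t i = 1; i < a->nr; i++)
		if (a->addr[i] < a->addr[i - 1])
			return 0;
	return 1;
}

/* Stand-in for the batch "finish": apply and empty the queue. */
static void batch_finish(struct poke_array *a)
{
	if (a->nr == 0)
		return;
	if (!is_sorted(a))
		a->bad_batches++;	/* the int3 lookup could miss entries */
	a->nr = 0;
	a->flushes++;
}

/* Unchecked append (assumed pre-fix behavior of the single path). */
static void raw_add(struct poke_array *a, unsigned long addr)
{
	assert(a->nr < POKE_ARRAY_MAX);
	a->addr[a->nr++] = addr;
}

/* Checked append: flush first if addr breaks ordering or would overflow. */
static void batch_add(struct poke_array *a, unsigned long addr)
{
	if (a->nr == POKE_ARRAY_MAX ||
	    (a->nr > 0 && addr < a->addr[a->nr - 1]))
		batch_finish(a);
	raw_add(a, addr);
}

/* Single poke before the fix: unchecked append, then apply. */
static void poke_single_old(struct poke_array *a, unsigned long addr)
{
	raw_add(a, addr);
	batch_finish(a);
}

/* Single poke after the fix: checked append, then apply. */
static void poke_single(struct poke_array *a, unsigned long addr)
{
	batch_add(a, addr);
	batch_finish(a);
}
```

Replaying queue(addr1), transform(addr2), apply() with addr1 = 0x2000
and addr2 = 0x1000, the old path applies one unsorted batch, while the
fixed path applies addr1 on its own before queuing addr2, so every
applied batch stays sorted and the final apply finds nothing left.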
Please test the patch below: