Re: [PATCH v3 0/2] arm64: Fix pending single-step debugging issues

From: Sumit Garg
Date: Thu Aug 04 2022 - 05:18:59 EST


On Mon, 11 Jul 2022 at 19:21, Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
>
> On Mon, 11 Jul 2022 at 19:17, Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Mon, Jul 11, 2022 at 5:44 AM Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
> > >
> > > > I'll also note that I _think_ I remember that with Wei's series that
> > > > the gdb function "call" started working. I tried that here and it
> > > > didn't seem so happy. To keep things simple, I created a dummy
> > > > function in my kernel that looked like:
> > > >
> > > > void doug_test(void)
> > > > {
> > > > > 	pr_info("testing, 1 2 3\n");
> > > > }
> > > >
> > > > I broke into the debugger by echoing "g" to /proc/sysrq-trigger and
> > > > then tried "call doug_test()". I guess my printout actually printed
> > > > but it wasn't so happy after that. Seems like it somehow ended up
> > > > returning to a bogus address after the call which then caused a crash.
> > > >
> > >
> > > I am able to reproduce this issue on my setup as well. But it doesn't
> > > seem to be a regression caused by this patch-set over Wei's series. As
> > > I could reproduce this issue with v1 [1] patch-set as well which was
> > > just a forward port of pending patches from Wei's series to the latest
> > > upstream.
> > >
> > > Maybe it's a different regression caused by other changes? BTW, do you
> > > remember the kernel version you tested with Wei's series applied?
> >
> > Sorry, I don't remember! :( I can't even be 100% sure that I'm
> > remembering correctly that I tested it back in the day, so it's
> > possible that it simply never worked...
>
> Okay, no worries. Let me see if I can come up with a separate fix for this.
>

After digging into how GDB implements function calls on aarch64, it
is clear that GDB function calls simply never worked here, for the
reasons below:

1. On aarch64, the GDB "call" command uses the kernel entry point
(ffffffc008000000 in your dump) as the return address of the called
function and inserts a breakpoint there. That address is the "_text"
symbol, which is marked non-executable because the actual text
section only starts at the "_stext" symbol. The following hack makes
that region executable:

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 626ec32873c6..e39ad1a5f5d6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -700,7 +700,7 @@ static bool arm64_early_this_cpu_has_bti(void)
 static void __init map_kernel(pgd_t *pgdp)
 {
 	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
-				vmlinux_initdata, vmlinux_data;
+				vmlinux_initdata, vmlinux_data, vmlinux_htext;

 	/*
 	 * External debuggers may need to write directly to the text
@@ -721,6 +721,8 @@ static void __init map_kernel(pgd_t *pgdp)
 	 * Only rodata will be remapped with different permissions later on,
 	 * all other segments are allowed to use contiguous mappings.
 	 */
+	map_kernel_segment(pgdp, _text, _stext, text_prot, &vmlinux_htext, 0,
+			   VM_NO_GUARD);
 	map_kernel_segment(pgdp, _stext, _etext, text_prot, &vmlinux_text, 0,
 			   VM_NO_GUARD);
 	map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL,

2. For the GDB "call" command to work, GDB has to insert a dummy stack
frame. But in the aarch64 kernel, the exception handler and the
kernel threads share the same stack. So inserting a dummy stack frame
is not trivial: it requires reworking the exception entry code, since
that code pushes pt_regs onto the stack.
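
To make reason 1 concrete, here is a toy user-space model of the
address check; it is only a sketch and not kernel code. The struct,
the helper name and the example layout are hypothetical. Only the
_text value comes from the dump quoted above; the 64K gap between
_text and _stext and the size of the text section are assumptions
made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the kernel image symbols discussed above. */
struct kernel_layout {
	uint64_t text;	/* _text: start of the image, GDB's return address */
	uint64_t stext;	/* _stext: start of the actual text section */
	uint64_t etext;	/* _etext: end of the text section */
};

/* _text is taken from the dump; the two offsets are assumed values. */
static const struct kernel_layout example_layout = {
	.text  = 0xffffffc008000000ULL,
	.stext = 0xffffffc008000000ULL + 0x10000ULL,
	.etext = 0xffffffc008000000ULL + 0x10000ULL + 0x1000000ULL,
};

/*
 * Return true if addr lies in a range the model treats as executable.
 * head_mapped models whether [_text, _stext) is mapped executable as
 * well, which is what the hack in reason 1 does.
 */
static bool addr_is_executable(const struct kernel_layout *l,
			       uint64_t addr, bool head_mapped)
{
	if (addr >= l->stext && addr < l->etext)
		return true;
	if (head_mapped && addr >= l->text && addr < l->stext)
		return true;
	return false;
}
```

In this model, addr_is_executable(&example_layout, example_layout.text,
false) is false, so a breakpoint placed at _text can never be hit when
the called function returns there. With head_mapped set to true the
same address becomes executable.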

-Sumit

>
> >
> > -Doug