Re: [PATCH 0/4] arm64: kgdb/kdb: Fix single-step debugging issues

From: Doug Anderson
Date: Mon Jul 06 2020 - 17:37:24 EST


Hi,

On Tue, Jun 30, 2020 at 12:22 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Mon, Jun 29, 2020 at 02:20:11PM -0700, Doug Anderson wrote:
> > On Sat, May 16, 2020 at 1:20 AM liwei (GF) <liwei391@xxxxxxxxxx> wrote:
> > > On 2020/5/14 8:34, Doug Anderson wrote:
> > > > On Sat, May 9, 2020 at 6:49 AM Wei Li <liwei391@xxxxxxxxxx> wrote:
> > > >>
> > > >> This patch set is to fix several issues of single-step debugging
> > > >> in kgdb/kdb on arm64.
> > > >>
> > > >> It seems that these issues have been shelved a very long time,
> > > >> but i still hope to solve them, as the single-step debugging
> > > >> is an useful feature.
> > > >>
> > > >> Note:
> > > >> Based on patch "arm64: cacheflush: Fix KGDB trap detection",
> > > >> https://www.spinics.net/lists/arm-kernel/msg803741.html
> > > >>
> > > >> Wei Li (4):
> > > >> arm64: kgdb: Fix single-step exception handling oops
> > > >> arm64: Extract kprobes_save_local_irqflag() and
> > > >> kprobes_restore_local_irqflag()
> > > >> arm64: kgdb: Fix single-stepping into the irq handler wrongly
> > > >> arm64: kgdb: Set PSTATE.SS to 1 to reenable single-step
> > > >>
> > > >> arch/arm64/include/asm/debug-monitors.h | 6 ++++++
> > > >> arch/arm64/kernel/debug-monitors.c | 28 ++++++++++++++++++++++++-
> > > >> arch/arm64/kernel/kgdb.c | 16 +++++++++++---
> > > >> arch/arm64/kernel/probes/kprobes.c | 28 ++-----------------------
> > > >> 4 files changed, 48 insertions(+), 30 deletions(-)
> > > >
> > > > Just an overall note that I'm very happy that you posted this patch
> > > > series! It's always been a thorn in my side that stepping and
> > > > breakpoints were so broken on arm64 and I'm really excited that you're
> > > > fixing them. Now I'll have to get in the habit of using kgdb for more
> > > > than just debugging crashes and maybe I can use it more for exploring
> > > > how functions work more. :-)
> > > >
> > > > I'll also note that with your patch series I'm even seeing the "call"
> > > > feature of gdb working now. That has always been terribly broken for
> > > > me.
> > > >
> > > Thanks for reviewing and testing this series.
> > > Enjoy exploring the kernel with kgdb/kdb! :-)
> >
> > I'm curious to know if you plan another spin. The feedback you
> > received was fairly minor so it hopefully shouldn't be too hard. I'd
> > very much like to get your patches landed and I'd be happy to try to
> > address the feedback and spin them myself if you're no longer
> > available for it.
>
> I'd welcome some feedback on the proposal I sent out at the end of last
> week:
>
> https://lore.kernel.org/r/20200626095551.GA9312@willie-the-truck
>
> which looks to solve some (all?) of these issues

Actually, I'm pretty sure that patch #1 of Wei Li's patch series would
still be needed. Would you object to landing it now just to get one
patch outta the way?

https://lkml.kernel.org/r/20200509214159.19680-2-liwei391@xxxxxxxxxx


> slightly differently,
> because it turns out we need to perform some low-level surgery for
> preempt-rt *anyway*...
>
> I'm particularly keen to hear any thoughts concerning the reschedule on
> return to EL1 after handling an interrupt that hit during a step.

Thanks for the pointer! Unfortunately this is yet another area where
I'm keenly interested in seeing things work but pretty much clueless
about how this part of the system works. :-|

My first question, though, is why would we want interrupts enabled
while we're single stepping? If you're trying to get the processor to
just step one instruction forward you don't really want a bunch of
IRQs going off in the middle of it, do you? While in kgdb and
debugging the kernel I would assume that a single step would run the
very least amount of code that it could (other than debugger code).
In general every time I exit kgdb (and re-start running the kernel
normally) I sorta assume that there's a chance that something in the
system will hit a timeout (maybe it's talking to external hardware
that will give up or something). If I'm stepping through code I just
want that little bit of code I'm running to move forward and the rest
of the kernel to stand still. If I want the rest of the kernel to
proceed I guess I'd set a breakpoint and then say "continue".

It seemed like with Wei Li's patch that we were closing holes and
being more consistent about keeping interrupts disabled when stepping
and I liked that. Maybe if we made it so that taking interrupts
didn't break everything then it would be _OK_ to let them fire, but if
I had a choice I'd rather they didn't.
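
To make "keeping interrupts disabled when stepping" concrete, here is a
rough userspace sketch of the save/mask/restore idea. It is only an
illustration and not the code from either series: struct fake_regs,
struct step_state and both function names are made up for this example,
and the only real detail assumed is that the IRQ mask (the I bit) sits at
bit 7 of the saved PSTATE on arm64.

```c
#include <stdint.h>

/* IRQ mask bit (PSTATE.I) as it appears in the saved program status. */
#define PSR_I_BIT 0x00000080u

/* Hypothetical stand-in for the saved register frame of the stepped context. */
struct fake_regs {
	uint64_t pstate;
};

/* Hypothetical per-step bookkeeping: the I bit as it was before stepping. */
struct step_state {
	uint64_t saved_irqflag;
};

/* Before resuming for one step: remember the old I bit, then mask IRQs. */
static void step_save_local_irqflag(struct step_state *st,
				    struct fake_regs *regs)
{
	st->saved_irqflag = regs->pstate & PSR_I_BIT;
	regs->pstate |= PSR_I_BIT;
}

/* After the step exception: put the I bit back exactly as it was. */
static void step_restore_local_irqflag(const struct step_state *st,
				       struct fake_regs *regs)
{
	regs->pstate &= ~(uint64_t)PSR_I_BIT;
	regs->pstate |= st->saved_irqflag;
}
```

The debugger would call the first helper right before returning to the
stepped instruction and the second one from the step handler, so the
stepped code sees IRQs masked for exactly one instruction and the
original mask state comes back afterwards.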

...of course, I'm looking at all this from the point of view of kgdb.
I don't know nearly enough about how other parts of the kernel use
single step to comment much on what would be best for them.


Did what I said make sense at all, or was it gibberish? ...or not
gibberish but not what you were looking for?


-Doug