Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC

From: Mark Rutland
Date: Tue Feb 02 2016 - 07:24:00 EST


On Mon, Feb 01, 2016 at 01:24:25PM -0800, Laura Abbott wrote:
> On 02/01/2016 04:29 AM, Mark Rutland wrote:
> >Hi,
> >
> >On Fri, Jan 29, 2016 at 03:46:57PM -0800, Laura Abbott wrote:
> >>
> >>ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
> >>pages for debugging purposes. This requires memory be mapped
> >>with PAGE_SIZE mappings since breaking down larger mappings
> >>at runtime will lead to TLB conflicts. Check if debug_pagealloc
> >>is enabled at runtime and if so, map everything with PAGE_SIZE
> >>pages. Implement the functions to actually map/unmap the
> >>pages at runtime.
> >>
> >>
> >>Signed-off-by: Laura Abbott <labbott@xxxxxxxxxxxxxxxxx>
> >
> >I tried to apply this atop the arm64 for-next/pgtable branch, but git
> >wasn't very happy about that -- which branch/patches is this based on?
> >
> >I'm not sure if I'm missing something, have something I shouldn't, or if
> >my MTA is corrupting patches again...
> >
>
> Hmmm, I based it off of your arm64-pagetable-rework-20160125 tag and
> Ard's patch for vmalloc and set_memory_* . The patches seem to apply
> on the for-next/pgtable branch as well so I'm guessing you are missing
> Ard's patch.

Yup, that was it. I evidently was paying far too little attention as I'd
also missed the mm/ patch for the !CONFIG_DEBUG_PAGEALLOC case.

Is there anything else in mm/ that I've potentially missed? I'm seeing a
hang on Juno just after reaching userspace (splat below) with
debug_pagealloc=on.

It looks like something's gone wrong around find_vmap_area -- at least
one CPU is forever awaiting vmap_area_lock, and presumably some other
CPU took it and went into the weeds while holding it, leading to the
RCU stalls and soft lockup warnings.

[ 31.037054] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 31.042684] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 31.050795] (detected by 1, t=5255 jiffies, g=340, c=339, q=50)
[ 31.056935] rcu_preempt kthread starved for 4838 jiffies! g340 c339 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 36.509055] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/2:2H:995]
[ 36.521059] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [systemd-udevd:1048]
[ 36.533056] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [systemd-udevd:1037]
[ 36.545055] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [systemd-udevd:1036]
[ 56.497055] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [upstart-file-br:1012]
[ 94.057052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 94.062671] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 94.070780] (detected by 1, t=21010 jiffies, g=340, c=339, q=50)
[ 94.076981] rcu_preempt kthread starved for 20593 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 157.077052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 157.082673] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 157.090782] (detected by 2, t=36765 jiffies, g=340, c=339, q=50)
[ 157.096986] rcu_preempt kthread starved for 36348 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 220.097052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 220.102670] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 220.110779] (detected by 2, t=52520 jiffies, g=340, c=339, q=50)
[ 220.116971] rcu_preempt kthread starved for 52103 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[ 283.117052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 283.122670] 0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418
[ 283.130779] (detected by 1, t=68275 jiffies, g=340, c=339, q=50)
[ 283.136973] rcu_preempt kthread starved for 67858 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0

Typically sysrq show-backtrace-all-active-cpus(l) gives me something like:

[ 183.282835] CPU: 0 PID: 998 Comm: systemd-udevd Tainted: G L 4.5.0-rc1+ #7
[ 183.290783] Hardware name: ARM Juno development board (r0) (DT)
[ 183.296659] task: ffffffc97437a400 ti: ffffffc973ec8000 task.ti: ffffffc973ec8000
[ 183.304095] PC is at _raw_spin_lock+0x34/0x48
[ 183.308421] LR is at find_vmap_area+0x24/0xa0
[ 183.312746] pc : [<ffffffc00065faf4>] lr : [<ffffffc000185bc4>] pstate: 60000145
[ 183.320092] sp : ffffffc973ecb6c0
[ 183.323382] x29: ffffffc973ecb6c0 x28: ffffffbde7d50300
[ 183.328662] x27: ffffffffffffffff x26: ffffffbde7d50300
[ 183.333941] x25: 000000097e513000 x24: 0000000000000001
[ 183.339219] x23: 0000000000000000 x22: 0000000000000001
[ 183.344498] x21: ffffffc000a6dd90 x20: ffffffc000a6d000
[ 183.349778] x19: ffffffc97540c000 x18: 0000007fc4e8b960
[ 183.355057] x17: 0000007fac3088d4 x16: ffffffc0001be448
[ 183.360336] x15: 003b9aca00000000 x14: 0032aa26d4000000
[ 183.365614] x13: ffffffffa94f64df x12: 0000000000000018
[ 183.370894] x11: ffffffc97eecd730 x10: 0000000000000030
[ 183.376173] x9 : ffffffbde7d50340 x8 : ffffffc0008556a0
[ 183.381451] x7 : ffffffc0008556b8 x6 : ffffffc0008556d0
[ 183.386729] x5 : ffffffc0009d2000 x4 : 0000000000000001
[ 183.392008] x3 : 000000000000d033 x2 : 000000000000000b
[ 183.397286] x1 : 00000000d038d033 x0 : ffffffc000a6dd90
[ 183.402563]
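
As a side note on reading the dump above: assuming vmap_area_lock is the
arm64 ticket spinlock of this era (a 32-bit word with "owner" in bits
[15:0] and "next" in bits [31:16] on little-endian), and assuming x1 holds
this CPU's snapshot of that word, the value can be decoded as in the rough
sketch below. The helper names are made up for illustration and none of
this is kernel code. Which register holds the snapshot is an inference
from the values (x3 matches the owner half, x2 matches ticket XOR owner),
not something the log states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical view of a ticket lock word.
 * Assumed layout: owner in bits [15:0], next in bits [31:16].
 */
struct ticket_view {
	uint16_t owner;	/* ticket currently being served */
	uint16_t next;	/* upper half; in a waiter's snapshot, its own ticket */
};

/* Split a 32-bit lock word into its two 16-bit halves. */
static struct ticket_view decode_ticket_lock(uint32_t lockval)
{
	struct ticket_view v = {
		.owner = (uint16_t)(lockval & 0xffff),
		.next  = (uint16_t)(lockval >> 16),
	};
	return v;
}

/* Tickets that must be served before ours; wraps modulo 2^16. */
static uint16_t tickets_ahead(struct ticket_view v)
{
	return (uint16_t)(v.next - v.owner);
}
```

Feeding in x1 = 0xd038d033 gives owner 0xd033 and ticket 0xd038, so five
tickets would have to be served before this CPU gets the lock. If that
reading is right, the owner half is not advancing, which would fit a
holder that never unlocks.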

I'll have a go with lock debugging enabled. In the meantime, do you have
any ideas?

Thanks,
Mark.