Re: [PATCH 2/2] thp: support split page table lock

From: Naoya Horiguchi
Date: Fri Sep 06 2013 - 14:01:40 EST


Hi Alex,

On Fri, Sep 06, 2013 at 11:04:23AM -0500, Alex Thorlton wrote:
> On Thu, Sep 05, 2013 at 05:27:46PM -0400, Naoya Horiguchi wrote:
> > THP-related code also uses the per-process mm->page_table_lock now.
> > So making it fine-grained can provide better performance.
> >
> > This patch makes thp support split page table lock by using page->ptl
> > of the pages storing "pmd_trans_huge" pmds.
> >
> > Some functions like pmd_trans_huge_lock() and page_check_address_pmd()
> > are expected by their caller to pass back the pointer of ptl, so this
> > patch adds to those functions new arguments for that. Other than that,
> > this patch is only a straightforward replacement.
> >
> > ChangeLog v3:
> > - fixed argument of huge_pmd_lockptr() in copy_huge_pmd()
> > - added missing declaration of ptl in do_huge_pmd_anonymous_page()
>
> I've applied these and tested them using the same test program that I
> used when I was working on the same issue, and I'm running into some
> bugs. Here's a stack trace:

Thank you for helping with testing. This bug is new to me.

> general protection fault: 0000 [#1] SMP
> Modules linked in:
> CPU: 268 PID: 32381 Comm: memscale Not tainted
> 3.11.0-medusa-03121-g757f8ca #184
> Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS
> 01/15/2013
> task: ffff880fbdd82180 ti: ffff880fc0c5a000 task.ti: ffff880fc0c5a000
> RIP: 0010:[<ffffffff810e3eef>] [<ffffffff810e3eef>]
> pgtable_trans_huge_withdraw+0x38/0x60
> RSP: 0018:ffff880fc0c5bc88 EFLAGS: 00010297
> RAX: ffffea17cebe8838 RBX: 00000015309bd000 RCX: ffffea01f623b028
> RDX: dead000000100100 RSI: ffff8dcf77d84c30 RDI: ffff880fbda67580
> RBP: ffff880fc0c5bc88 R08: 0000000000000013 R09: 0000000000014da0
> R10: ffff880fc0c5bc88 R11: ffff888f7efda000 R12: ffff8dcf77d84c30
> R13: ffff880fc0c5bdf8 R14: 800005cf401ff067 R15: ffff8b4de5fabff8
> FS: 0000000000000000(0000) GS:ffff880fffd80000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffff768b0b8 CR3: 0000000001a0b000 CR4: 00000000000407e0
> Stack:
> ffff880fc0c5bcc8 ffffffff810f7643 ffff880fc0c5bcc8 ffffffff810d8297
> ffffea1456237510 00007fc7b0e00000 0000000000000000 00007fc7b0c00000
> ffff880fc0c5bda8 ffffffff810d85ba ffff880fc0c5bd48 ffff880fc0c5bd68
> Call Trace:
> [<ffffffff810f7643>] zap_huge_pmd+0x4c/0x101
> [<ffffffff810d8297>] ? tlb_flush_mmu+0x58/0x75
> [<ffffffff810d85ba>] unmap_single_vma+0x306/0x7d6
> [<ffffffff810d8ad9>] unmap_vmas+0x4f/0x82
> [<ffffffff810dab5e>] exit_mmap+0x8b/0x113
> [<ffffffff810a9743>] ? __delayacct_add_tsk+0x170/0x182
> [<ffffffff8103c609>] mmput+0x3e/0xc4
> [<ffffffff8104088c>] do_exit+0x380/0x907
> [<ffffffff810fb89c>] ? vfs_write+0x149/0x1a3
> [<ffffffff81040e85>] do_group_exit+0x72/0x9b
> [<ffffffff81040ec0>] SyS_exit_group+0x12/0x16
> [<ffffffff814f52d2>] system_call_fastpath+0x16/0x1b
> Code: 51 20 48 8d 41 20 48 39 c2 75 0d 48 c7 87 28 03 00 00 00 00 00 00
> eb 36 48 8d 42 e0 48 89 87 28 03 00 00 48 8b 51 20 48 8b 41 28 <48> 89
> 42 08 48 89 10 48 ba 00 01 10 00 00 00 ad de 48 b8 00 02
> RIP [<ffffffff810e3eef>] pgtable_trans_huge_withdraw+0x38/0x60
> RSP <ffff880fc0c5bc88>
> ---[ end trace e5413b388b6ea448 ]---
> Fixing recursive fault but reboot is needed!
> general protection fault: 0000 [#2] SMP
> Modules linked in:
> CPU: 268 PID: 1722 Comm: kworker/268:1 Tainted: G D
> 3.11.0-medusa-03121-g757f8ca #184
> Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS
> 01/15/2013
> Workqueue: events vmstat_update
> task: ffff880fc1a74280 ti: ffff880fc1a76000 task.ti: ffff880fc1a76000
> RIP: 0010:[<ffffffff810bcdcb>] [<ffffffff810bcdcb>]
> free_pcppages_bulk+0x97/0x329
> RSP: 0018:ffff880fc1a77c98 EFLAGS: 00010082
> RAX: ffff880fffd94d68 RBX: dead0000002001e0 RCX: ffff880fffd94d50
> RDX: ffff880fffd94d68 RSI: 000000000000001f RDI: ffff888f7efdac68
> RBP: ffff880fc1a77cf8 R08: 0000000000000400 R09: ffffffff81a8bf00
> R10: ffff884f7efdac00 R11: ffffffff81009bae R12: dead000000200200
> R13: ffff888f7efdac00 R14: 000000000000001f R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff880fffd80000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffff768b0b8 CR3: 0000000001a0b000 CR4: 00000000000407e0
> Stack:
> ffff880fc1a77ce8 ffff880fffd94d68 0000000000000010 ffff880fffd94d50
> 0000001ff9276a68 ffff880fffd94d60 0000000000000000 000000000000001f
> ffff880fffd94d50 0000000000000292 ffff880fc1a77d38 ffff880fffd95d05
> Call Trace:
> [<ffffffff810bd149>] drain_zone_pages+0x33/0x42
> [<ffffffff810cd5a6>] refresh_cpu_vm_stats+0xcc/0x11e
> [<ffffffff810cd609>] vmstat_update+0x11/0x43
> [<ffffffff8105350f>] process_one_work+0x260/0x389
> [<ffffffff8105381a>] worker_thread+0x1e2/0x332
> [<ffffffff81053638>] ? process_one_work+0x389/0x389
> [<ffffffff810579df>] kthread+0xb3/0xbd
> [<ffffffff81053638>] ? process_one_work+0x389/0x389
> [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
> [<ffffffff814f522c>] ret_from_fork+0x7c/0xb0
> [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
> Code: 48 89 55 c8 48 39 14 08 74 ce 41 83 fe 03 44 0f 44 75 c4 48 83 c2
> 08 48 89 45 b0 48 89 55 a8 48 8b 45 a8 4c 8b 20 49 8d 5c 24 e0 <48> 8b
> 53 20 48 8b 43 28 48 89 42 08 48 89 10 48 ba 00 01 10 00
> RIP [<ffffffff810bcdcb>] free_pcppages_bulk+0x97/0x329
> RSP <ffff880fc1a77c98>
> ---[ end trace e5413b388b6ea449 ]---
> BUG: unable to handle kernel paging request at ffffffffffffffd8
> IP: [<ffffffff8105742c>] kthread_data+0xb/0x11
> PGD 1a0c067 PUD 1a0e067 PMD 0
> Oops: 0000 [#3] SMP
> Modules linked in:
> CPU: 268 PID: 1722 Comm: kworker/268:1 Tainted: G D
> 3.11.0-medusa-03121-g757f8ca #184
> Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS
> 01/15/2013
> task: ffff880fc1a74280 ti: ffff880fc1a76000 task.ti: ffff880fc1a76000
> RIP: 0010:[<ffffffff8105742c>] [<ffffffff8105742c>]
> kthread_data+0xb/0x11
> RSP: 0018:ffff880fc1a77948 EFLAGS: 00010092
> RAX: 0000000000000000 RBX: 000000000000010c RCX: 0000000000000000
> RDX: 000000000000000f RSI: 000000000000010c RDI: ffff880fc1a74280
> RBP: ffff880fc1a77948 R08: 00000000000442c8 R09: 0000000000000000
> R10: dead000000200200 R11: ffff880fc1a742e8 R12: ffff880fc1a74868
> R13: ffff880fffd91cc0 R14: ffff880ff9b7a040 R15: 000000000000010c
> FS: 0000000000000000(0000) GS:ffff880fffd80000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000028 CR3: 0000000001a0b000 CR4: 00000000000407e0
> Stack:
> ffff880fc1a77968 ffffffff8105151f ffff880fc1a77968 ffff880fc1a74280
> ffff880fc1a77ab8 ffffffff814f2e98 ffff880fc1a76010 0000000000004000
> ffff880fc1a74280 0000000000011cc0 ffff880fc1a77fd8 ffff880fc1a77fd8
> Call Trace:
> [<ffffffff8105151f>] wq_worker_sleeping+0x10/0x82
> [<ffffffff814f2e98>] __schedule+0x1b7/0x8f7
> [<ffffffff8135d4bd>] ? mix_pool_bytes+0x4a/0x56
> [<ffffffff810a5d05>] ? call_rcu_sched+0x16/0x18
> [<ffffffff8103f708>] ? release_task+0x3a7/0x3bf
> [<ffffffff814f36b5>] schedule+0x61/0x63
> [<ffffffff81040e0f>] do_exit+0x903/0x907
> [<ffffffff8100529a>] oops_end+0xb9/0xc1
> [<ffffffff81005393>] die+0x55/0x5e
> [<ffffffff8100341a>] do_general_protection+0x93/0x139
> [<ffffffff814f4d82>] general_protection+0x22/0x30
> [<ffffffff81009bae>] ? default_idle+0x6/0x8
> [<ffffffff810bcdcb>] ? free_pcppages_bulk+0x97/0x329
> [<ffffffff810bcd5d>] ? free_pcppages_bulk+0x29/0x329
> [<ffffffff810bd149>] drain_zone_pages+0x33/0x42
> [<ffffffff810cd5a6>] refresh_cpu_vm_stats+0xcc/0x11e
> [<ffffffff810cd609>] vmstat_update+0x11/0x43
> [<ffffffff8105350f>] process_one_work+0x260/0x389
> [<ffffffff8105381a>] worker_thread+0x1e2/0x332
> [<ffffffff81053638>] ? process_one_work+0x389/0x389
> [<ffffffff810579df>] kthread+0xb3/0xbd
> [<ffffffff81053638>] ? process_one_work+0x389/0x389
> [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
> [<ffffffff814f522c>] ret_from_fork+0x7c/0xb0
> [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
> Code: 65 48 8b 04 25 40 b7 00 00 48 8b 80 90 05 00 00 48 89 e5 48 8b 40
> c8 c9 48 c1 e8 02 83 e0 01 c3 48 8b 87 90 05 00 00 55 48 89 e5 <48> 8b
> 40 d8 c9 c3 48 3b 3d 67 ca c2 00 55 48 89 e5 75 09 0f bf
> RIP [<ffffffff8105742c>] kthread_data+0xb/0x11
> RSP <ffff880fc1a77948>
> CR2: ffffffffffffffd8
> ---[ end trace e5413b388b6ea44a ]---
> Fixing recursive fault but reboot is needed!
>
> I'm testing on a 528 core machine, with ~2TB of memory, THP on. The
> test case works like this:
>
> - Spawn 512 threads using pthread_create, pin each thread to a separate
> cpu
> - Each thread allocates 512MB, local to its cpu
> - Threads are sent a "go" signal, all threads begin touching the first
> byte of each 4KB chunk of their 512MB simultaneously
>
> I'm working on debugging the issue now, but I thought I'd get this out
> to everyone in case they might have some input. I'll try and get my
> test program cleaned up and posted somewhere today so that others can
> try it out as well.

Thanks. Please let me know when it's available. I'll look at it.
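
Until the real program is posted, the test case quoted above might be
sketched roughly as follows. This is not Alex's memscale program; the
names run_touch_test(), worker_main(), struct gate and struct worker are
hypothetical. The sketch assumes Linux with glibc, uses an anonymous
mmap() and relies on the default first-touch placement to keep pages
local to the pinned CPU, and wraps thread indices around the online CPU
count, so threads share CPUs if there are more threads than CPUs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK 4096UL /* touch granularity from the description */

/* Shared "go" gate: workers block until the main thread opens it. */
struct gate {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int open;
};

struct worker {
	pthread_t tid;
	int cpu;        /* CPU this thread asks to be pinned to */
	size_t bytes;   /* size of this thread's private region */
	size_t touched; /* number of chunks actually written */
	struct gate *gate;
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	cpu_set_t set;
	char *buf;
	size_t off;

	/* Best-effort pinning; a failure (e.g. restricted cpuset) is ignored. */
	CPU_ZERO(&set);
	CPU_SET(w->cpu, &set);
	(void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	/* Reserve the region; pages are only faulted in by the writes below. */
	buf = mmap(NULL, w->bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Wait for the "go" signal. */
	pthread_mutex_lock(&w->gate->lock);
	while (!w->gate->open)
		pthread_cond_wait(&w->gate->cond, &w->gate->lock);
	pthread_mutex_unlock(&w->gate->lock);

	if (buf == MAP_FAILED)
		return NULL;

	/* Touch the first byte of every chunk. */
	for (off = 0; off < w->bytes; off += CHUNK) {
		buf[off] = 1;
		w->touched++;
	}
	munmap(buf, w->bytes);
	return NULL;
}

/* Returns the total number of chunks touched across all threads. */
size_t run_touch_test(int nthreads, size_t bytes_per_thread)
{
	struct gate gate;
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	struct worker *w;
	size_t total = 0;
	int i, started = 0;

	assert(nthreads > 0 && bytes_per_thread > 0);
	if (ncpus < 1)
		ncpus = 1;
	w = calloc((size_t)nthreads, sizeof(*w));
	if (!w)
		return 0;
	pthread_mutex_init(&gate.lock, NULL);
	pthread_cond_init(&gate.cond, NULL);
	gate.open = 0;

	for (i = 0; i < nthreads; i++) {
		w[i].cpu = (int)(i % ncpus);
		w[i].bytes = bytes_per_thread;
		w[i].gate = &gate;
		if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]))
			break;
		started++;
	}

	/* Open the gate so all started threads begin touching memory. */
	pthread_mutex_lock(&gate.lock);
	gate.open = 1;
	pthread_cond_broadcast(&gate.cond);
	pthread_mutex_unlock(&gate.lock);

	for (i = 0; i < started; i++) {
		pthread_join(w[i].tid, NULL);
		total += w[i].touched;
	}
	pthread_cond_destroy(&gate.cond);
	pthread_mutex_destroy(&gate.lock);
	free(w);
	return total;
}
```

Calling run_touch_test(512, 512UL << 20) from a main() would approximate
the configuration described above; smaller arguments are enough to check
that the sketch itself runs. A condition variable is used for the "go"
signal instead of a barrier so that a failed pthread_create() cannot
leave the remaining threads waiting forever.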

Naoya