Re: bio linked list corruption.

From: Dave Jones
Date: Wed Nov 23 2016 - 14:34:36 EST


On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> >On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> >> page:ffffea0013838e40 count:0 mapcount:0 mapping:ffff8804a20310e0 index:0x100c
> >> flags: 0x400000000000000c(referenced|uptodate)
> >> page dumped because: non-NULL mapping
> >
> >Hmm. So this seems to be btrfs-specific, right?
> >
> >I searched for all your "non-NULL mapping" cases, and they all seem to
> >have basically the same call trace, with some work thread doing
> >writeback and going through btrfs_writepages().
> >
> >Sounds like it's a race with either fallocate hole-punching or
> >truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> >clearly ran other filesystems too but I am not seeing this backtrace
> >for anything else.
>
> Agreed, I think this is a separate bug, almost certainly btrfs specific.
> I'll work with Dave on a better reproducer.

Ok, let's try this..
Here's the usual dump, from the current tree:


[ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
[ 317.702235] page:ffffea001363f500 count:0 mapcount:0 mapping:ffff8804eef8e0e0 index:0x29
[ 317.718068] flags: 0x400000000000000c(referenced|uptodate)
[ 317.731086] page dumped because: non-NULL mapping
[ 317.744118] CPU: 3 PID: 1558 Comm: kworker/u8:8 Not tainted 4.9.0-rc6-think+ #1
[ 317.770404] Workqueue: writeback wb_workfn
[ 317.783490] (flush-btrfs-3)
[ 317.783581] ffffc90000ff3798
[ 317.796651] ffffffff813b69bc
[ 317.796742] ffffea001363f500
[ 317.796798] ffffffff81c474ee
[ 317.796854] ffffc90000ff37c0
[ 317.809910] ffffffff811b30c4
[ 317.810001] 0000000000000000
[ 317.810055] ffffea001363f500
[ 317.810112] 0000000000000003
[ 317.823230] ffffc90000ff37d0
[ 317.823320] ffffffff811b318f
[ 317.823376] ffffc90000ff3818
[ 317.823431] Call Trace:
[ 317.836645] [<ffffffff813b69bc>] dump_stack+0x4f/0x73
[ 317.850063] [<ffffffff811b30c4>] bad_page+0xc4/0x130
[ 317.863426] [<ffffffff811b318f>] free_pages_check_bad+0x5f/0x70
[ 317.876805] [<ffffffff811b61f5>] free_hot_cold_page+0x305/0x3b0
[ 317.890132] [<ffffffff811b660e>] free_hot_cold_page_list+0x7e/0x170
[ 317.903410] [<ffffffff811c08b7>] release_pages+0x2e7/0x3a0
[ 317.916600] [<ffffffff811c2067>] __pagevec_release+0x27/0x40
[ 317.929817] [<ffffffffa00b0a55>] extent_write_cache_pages.isra.44.constprop.59+0x355/0x430 [btrfs]
[ 317.943073] [<ffffffffa00b0fcd>] extent_writepages+0x5d/0x90 [btrfs]
[ 317.956033] [<ffffffffa008ea60>] ? btrfs_releasepage+0x40/0x40 [btrfs]
[ 317.968902] [<ffffffffa008b6a8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 317.981744] [<ffffffff811be6d1>] do_writepages+0x21/0x30
[ 317.994524] [<ffffffff812727cf>] __writeback_single_inode+0x7f/0x6e0
[ 318.007333] [<ffffffff81894cc1>] ? _raw_spin_unlock+0x31/0x50
[ 318.020090] [<ffffffff8127356d>] writeback_sb_inodes+0x31d/0x6c0
[ 318.032844] [<ffffffff812739a2>] __writeback_inodes_wb+0x92/0xc0
[ 318.045623] [<ffffffff81273e97>] wb_writeback+0x3f7/0x530
[ 318.058304] [<ffffffff81274a38>] wb_workfn+0x148/0x620
[ 318.070949] [<ffffffff810a4b54>] ? process_one_work+0x194/0x690
[ 318.083549] [<ffffffff812748f5>] ? wb_workfn+0x5/0x620
[ 318.096160] [<ffffffff810a4bf2>] process_one_work+0x232/0x690
[ 318.108722] [<ffffffff810a4b54>] ? process_one_work+0x194/0x690
[ 318.121225] [<ffffffff810a509e>] worker_thread+0x4e/0x490
[ 318.133680] [<ffffffff810a5050>] ? process_one_work+0x690/0x690
[ 318.146126] [<ffffffff810a5050>] ? process_one_work+0x690/0x690
[ 318.158507] [<ffffffff810aa7e2>] kthread+0x102/0x120
[ 318.170880] [<ffffffff810aa6e0>] ? kthread_park+0x60/0x60
[ 318.183254] [<ffffffff81895822>] ret_from_fork+0x22/0x30


This time, I managed to get my ftrace magic working, so we have this
trace from just before this happened. Does this shed any light ?

https://codemonkey.org.uk/junk/trace.txt

(note that some of the spinlock/rcu calls will be missing here. I chose
to add them to ftrace's exclude list because they dominate the trace
buffer so much, especially in debug builds)

Dave