Re: bio linked list corruption.

From: Vegard Nossum
Date: Mon Dec 05 2016 - 13:26:16 EST


On 5 December 2016 at 19:11, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
>> On 23 November 2016 at 20:58, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
>>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>>>
>>> > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
>>> > trace from just before this happened. Does this shed any light ?
>>> >
>>> > https://codemonkey.org.uk/junk/trace.txt
>>>
>>> crap, I just noticed the timestamps in the trace come from quite a bit
>>> later. I'll tweak the code to do the taint checking/ftrace stop after
>>> every syscall, that should narrow the window some more.
>>
>> FWIW I hit this as well:
>>
>> BUG: unable to handle kernel paging request at ffffffff81ff08b7
>
> We really ought to improve this message. If nothing else, it should
> say whether it was a read, a write, or an instruction fetch.
>
>> IP: [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30
>> PGD 461e067 PUD 461f063
>> PMD 1e001e1
>
> Too lazy to manually decode this right now, but I don't think it matters.
>
>> Oops: 0003 [#1] PREEMPT SMP KASAN
>
> Note this is SMP, but that just means CONFIG_SMP=y. Vegard, how many
> CPUs does your kernel think you have?

My first crash was running on a 1-CPU guest (not intentionally, but
because of a badly configured qemu). I'm honestly surprised it
triggered at all with 1 CPU, but I guess it shows that it's not a true
concurrency issue at least!

>
>> RIP: 0010:[<ffffffff8135f2ea>] [<ffffffff8135f2ea>]
>> __lock_acquire.isra.32+0xda/0x1a30
>> RSP: 0018:ffff8801bab8f730 EFLAGS: 00010082
>> RAX: ffffffff81ff071f RBX: 0000000000000000 RCX: 0000000000000000
>
> RAX points to kernel text.

Yes, it's somewhere in the middle of iov_iter_init() -- other crashes
also had put_prev_entity(), a null pointer, and some garbage values I
couldn't identify.

>
>> Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
>> c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <3e>
>> ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00
>
> 2b: 3e ff 80 98 01 00 00 incl %ds:*0x198(%rax) <--
> trapping instruction
>
> That's very strange. I think this is:
>
> atomic_inc((atomic_t *)&class->ops);
>
> but my kernel contains:
>
> 3cb4: f0 ff 80 98 01 00 00 lock incl 0x198(%rax)
>
> So your kernel has been smp-alternatived. That 3e comes from
> alternatives_smp_unlock. If you're running on SMP with UP
> alternatives, things will break.

Yes, indeed. It was running on 1 CPU by mistake and still triggered the bug.

The crashes started really pouring in once I got my qemu fixed. Just
to reassure you, here's another crash which shows it's using the
correct instruction on an actual multicore guest:

BUG: unable to handle kernel paging request at 00000003030001de
IP: [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30
PGD 183fd2067 PUD 0

Oops: 0002 [#1] PREEMPT SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
CPU: 23 PID: 9584 Comm: trinity-c104 Not tainted 4.9.0-rc7+ #219
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff880189f68000 task.stack: ffff88017fe50000
RIP: 0010:[<ffffffff8135f2ea>] [<ffffffff8135f2ea>]
__lock_acquire.isra.32+0xda/0x1a30
RSP: 0018:ffff88017fe575e0 EFLAGS: 00010002
RAX: 0000000303000046 RBX: 0000000000000000 RCX: 0000000000000000
[...]
Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <f0>
ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00
RIP [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30
RSP <ffff88017fe575e0>
CR2: 00000003030001de
---[ end trace 2846425104eb6141 ]---
Kernel panic - not syncing: Fatal exception
------------[ cut here ]------------

2b:* f0 ff 80 98 01 00 00 lock incl 0x198(%rax) <--
trapping instruction

> What's your kernel command line? Can we have your entire kernel log from boot?

Just in case you still want this, I've attached the boot log for the
"true SMP" guest above.


Vegard

Attachment: 5.txt.gz
Description: GNU Zip compressed data