Re: bio linked list corruption.

From: Andy Lutomirski
Date: Mon Dec 05 2016 - 13:11:55 EST


On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> On 23 November 2016 at 20:58, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote:
>>
>> > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4
>> > trace from just before this happened. Does this shed any light ?
>> >
>> > https://codemonkey.org.uk/junk/trace.txt
>>
>> crap, I just noticed the timestamps in the trace come from quite a bit
>> later. I'll tweak the code to do the taint checking/ftrace stop after
>> every syscall, that should narrow the window some more.
>
> FWIW I hit this as well:
>
> BUG: unable to handle kernel paging request at ffffffff81ff08b7

We really ought to improve this message. If nothing else, it should
say whether it was a read, a write, or an instruction fetch.

> IP: [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30
> PGD 461e067 PUD 461f063
> PMD 1e001e1

Too lazy to manually decode this right now, but I don't think it matters.

> Oops: 0003 [#1] PREEMPT SMP KASAN

Note this is SMP, but that just means CONFIG_SMP=y. Vegard, how many
CPUs does your kernel think you have?

> RIP: 0010:[<ffffffff8135f2ea>] [<ffffffff8135f2ea>]
> __lock_acquire.isra.32+0xda/0x1a30
> RSP: 0018:ffff8801bab8f730 EFLAGS: 00010082
> RAX: ffffffff81ff071f RBX: 0000000000000000 RCX: 0000000000000000

RAX points to kernel text.

> Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85
> c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <3e>
> ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00

2b: 3e ff 80 98 01 00 00 incl %ds:*0x198(%rax) <--
trapping instruction

That's very strange. I think this is:

atomic_inc((atomic_t *)&class->ops);

but my kernel contains:

3cb4: f0 ff 80 98 01 00 00 lock incl 0x198(%rax)

So your kernel has been smp-alternatived. That 3e comes from
alternatives_smp_unlock. If you're running on SMP with UP
alternatives, things will break.

What's your kernel command line? Can we have your entire kernel log from boot?

Adding Borislav, since he's the guru for this code.