Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Peter Hurley
Date: Fri Feb 26 2016 - 13:18:29 EST


On 02/26/2016 10:05 AM, Linus Torvalds wrote:
> On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> So more analysis would seem to confirm that RSP has been bumped +8
>> while in ttwu_stat() so when the epilog executed, register restore
>> was off by 1 qword. However, there's nothing in ttwu_stat() that
>> results in stack pointer offset by +1 qword from prolog.
>
> I agree.
>
> That's why I'm actually starting to suspect that it's an AMD microcode
> bug that we know very little about. There's apparently register
> corruption (the guess being from NMI handling, but virtualization was
> also involved) under some circumstances.

Yep, that could explain it.

> Of course, if Jiri isn't actually running this on an AMD CPU, that
> theory flies right out the window.

I'll wait for Jiri to confirm before sinking more time here.


> But we do have a reported oops on
> the security list that looks totally different in the big picture, but
> shares the exact same "corrupted stack pointer register state
> resulting in crazy instruction pointer, resulting in NX fault"
> behavior in the end.
>
> In the other case, microcode patchlevel 0x0600081c was fine, and
> 0x06000832 is the one exhibiting the corruption problem.
>
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in in this thread. He was talking to some AMD
> people, but I don't know exactly who.

Ok, thanks for the info.