Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

From: Andy Lutomirski
Date: Mon Nov 17 2014 - 17:27:21 EST


On Mon, Nov 17, 2014 at 1:55 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>>> However, I'd like to be very sure this thing doesn't introduce any
>>> regressions to the MCA code. So even if Tony's testing passes, I'd like
>>> to be very conservative here and stress it more than usual. Because once
>>> this thing hits upstream and stuff starts breaking, it'll be a serious
>>> PITA reverting it.
>
> The test I left running on Friday was just running the stack-switch asm
> patch, without any mce.c changes. It died at 16000 iterations with the
> mce synchronization issue.

I still wonder whether the timeout code is the real culprit. My patch
will slow down entry into do_machine_check by tens of cycles, several
cachelines, and possibly a couple of TLB misses. Given that the
timing seemed marginal to me, it's possible (albeit not that likely)
that it pushed the time needed for synchronization into the range of
unreliability.

Any chance you can retry it at some point with that USEC_PER_SEC thing
changed to NSEC_PER_SEC and SPINUNIT set to something closer to 10
than 100?
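
To illustrate why the two knobs matter, here is a minimal userspace
sketch of a budgeted spin-wait. It is not the mce.c code: spin_wait(),
struct spin_result and arrival_ns are hypothetical names made up for
this example, and the real rendezvous polls shared state instead of
comparing against a known arrival time. The only values taken from the
kernel are the constants USEC_PER_SEC (1000000) and NSEC_PER_SEC
(1000000000), redefined locally here.

```c
#include <stdint.h>

/* Same values as the kernel constants, redefined for this sketch. */
#define USEC_PER_SEC 1000000ULL
#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical result type for this illustration only. */
struct spin_result {
	int timed_out;       /* 1 if the budget ran out first */
	uint64_t polls;      /* number of spin iterations performed */
	uint64_t waited_ns;  /* simulated time spent spinning */
};

/*
 * Simulate waiting for an event that happens arrival_ns after we start
 * spinning. Each iteration charges spinunit_ns against budget_ns, the
 * way a loop built around ndelay(SPINUNIT) would. Assumes spinunit_ns
 * is nonzero.
 */
static struct spin_result spin_wait(uint64_t budget_ns,
				    uint64_t spinunit_ns,
				    uint64_t arrival_ns)
{
	struct spin_result r = { 0, 0, 0 };

	while (r.waited_ns < arrival_ns) {
		if (budget_ns < spinunit_ns) {
			/* Not enough budget left for another spin. */
			r.timed_out = 1;
			break;
		}
		budget_ns -= spinunit_ns;
		r.waited_ns += spinunit_ns; /* stands in for the delay */
		r.polls++;
	}
	return r;
}
```

In this model, spin_wait(USEC_PER_SEC, 100, 2 * USEC_PER_SEC) times out,
while the same arrival time with a budget of NSEC_PER_SEC does not. A
smaller spin unit only changes how finely the wait is sampled: an event
at 250 ns is noticed after 300 ns with a unit of 100 and after 250 ns
with a unit of 10.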

--Andy

>
> This morning I started a new test with all the mce changes (no TIF_MCE_NOTIFY,
> just process the recovery in the tail of do_machine_check()).
>
> It just passed the 18000 point, and it's still going. In addition I've been throwing
> the odd "make -j144" kernel build at the machine so we check out the non-idle
> paths too.
>
> -Tony



--
Andy Lutomirski
AMA Capital Management, LLC