Re: [RFC PATCH 0/3] Machine check recovery when kernel accesses poison

From: Borislav Petkov
Date: Wed Nov 11 2015 - 15:42:09 EST


On Tue, Nov 10, 2015 at 01:55:46PM -0800, Luck, Tony wrote:
> I need to add more to the motivation part of this. The people who want
> this are playing with NVDIMMs as storage. So think of many GBytes of
> non-volatile memory on the source end of the memcpy(). People are used
> to disk errors just giving them a -EIO error. They'll be unhappy if an
> NVDIMM error crashes the machine.

Ah.

Btw, is there, by chance, a flag somewhere in the MCA registers at
error time which says that the error originated from an NVDIMM? Because
if there were, this patchset would be moot. :)

> It will be up to the caller to figure out what action to take. In
> the NVDIMM filesystem scenario outlined above the result may be -EIO
> for a data block ... something more drastic if we were reading metadata.
>
> When I get around to writing mcsafe_copy_from_user() the code might
> end up like:
>
> some_syscall_e_g_write(void __user *buf, size_t cnt)
> {
> 	u64 ret;
>
> 	ret = mcsafe_copy_from_user(kbuf, buf, cnt);
>
> 	if (ret & BIT(63)) {
> 		/*
> 		 * do some machine check thing ... e.g.
> 		 * send a SIGBUS to this process and return -EINTR
> 		 * This is where we use the address (after converting
> 		 * back to a user virtual address).
> 		 */
> 	} else if (ret) {
> 		/* user gave us a bad buffer: return -EFAULT */
> 	} else {
> 		/* success!!! */
> 	}
> }

Oh ok, so bit 63 doesn't leave the kernel. Then it's all fine,
never mind.
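
To spell out the convention for anyone following along, here is a rough
user-space sketch of how a caller might decode such a return value and
fold it into a plain errno before anything is handed back to userspace.
It is not code from the patchset: MCSAFE_FAULT_BIT, struct mcsafe_result,
mcsafe_decode() and mcsafe_to_errno() are hypothetical names, and it
assumes that bit 63 flags a machine check with the remaining bits
carrying the faulting address, while any other nonzero value means a bad
user buffer.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical name: bit 63 set means a machine check hit during the copy. */
#define MCSAFE_FAULT_BIT (UINT64_C(1) << 63)

enum mcsafe_status {
	MCSAFE_OK,
	MCSAFE_BAD_BUFFER,
	MCSAFE_MACHINE_CHECK,
};

struct mcsafe_result {
	enum mcsafe_status status;
	uint64_t addr;	/* only meaningful for MCSAFE_MACHINE_CHECK */
};

/* Split the raw 64-bit return value into a status and an address. */
static struct mcsafe_result mcsafe_decode(uint64_t ret)
{
	struct mcsafe_result r = { MCSAFE_OK, 0 };

	if (ret & MCSAFE_FAULT_BIT) {
		r.status = MCSAFE_MACHINE_CHECK;
		/* Assumption: the low 63 bits hold the faulting address. */
		r.addr = ret & ~MCSAFE_FAULT_BIT;
	} else if (ret) {
		r.status = MCSAFE_BAD_BUFFER;
	}
	return r;
}

/*
 * Map the decoded result to what the syscall would return. Bit 63 is
 * consumed here, so userspace only ever sees an ordinary errno.
 */
static int mcsafe_to_errno(struct mcsafe_result r)
{
	switch (r.status) {
	case MCSAFE_MACHINE_CHECK:
		return -EINTR;	/* after sending SIGBUS, per the quoted sketch */
	case MCSAFE_BAD_BUFFER:
		return -EFAULT;
	default:
		return 0;
	}
}
```

With this layout, mcsafe_to_errno(mcsafe_decode(ret)) is the only value
that crosses the syscall boundary; the flag bit and the address stay
internal to the caller.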

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.