Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

From: Borislav Petkov
Date: Wed Apr 14 2021 - 09:05:43 EST


On Tue, Apr 13, 2021 at 04:13:03PM +0000, Luck, Tony wrote:
> Even if no applications ever do anything with it, it is still useful to avoid
> crashing the whole system and just terminate one application/guest.

True.

> There's one more item on my long term TODO list. Add fixups so that
> copy_to_user() from poison in the page cache doesn't crash, but just
> checks to see if the page was clean .. .in which case re-read from the
> filesystem into a different physical page and retire the old page ... the
> read can now succeed. If the page is dirty, then fail the read (and retire
> the page ... need to make sure filesystem knows the data for the page
> was lost so subsequent reads return -EIO or something).

Makes sense.

> Page cache occupies enough memory that it is a big enough
> source of system crashes that could be avoided. I'm not sure
> if there are any other obvious cases after this ... it all gets into
> diminishing returns ... not really worth it to handle a case that
> only occupies 0.00002% of memory.

Ack.

> See above. With core counts continuing to increase, the cloud service
> providers really want to see fewer events that crash the whole physical
> machine (taking down dozens, or hundreds, of guest VMs).

Yap.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette