Re: [RFC Patch 0/2] mm: Add parameters to make kernel behavior at memory error on dirty cache selectable

From: Mitsuhiro Tanino
Date: Fri Apr 12 2013 - 09:44:01 EST


(2013/04/11 22:00), Ric Mason wrote:
> Hi Mitsuhiro,
> On 04/11/2013 08:51 PM, Mitsuhiro Tanino wrote:
>> (2013/04/11 12:53), Simon Jeons wrote:
>>> One question against mce instead of the patchset. ;-)
>>>
>>> When is memory checked for errors? Before it is accessed? Is there a process that scans it periodically?
>> Hi Simon-san,
>>
>> Yes, there is a process to scan memory periodically.
>>
>> Intel Nehalem-EX and later CPU generations support MCA recovery.
>> MCA recovery provides error detection and isolation features that
>> work together with the OS.
>> One of the MCA Recovery features is Memory Scrubbing. It periodically
>> checks memory in the background, independently of the OS.
>
> Is Memory Scrubbing a kernel thread? Where is the code for memory scrubbing?

Hi Ric,

No. Memory Scrubbing is one of the MCA Recovery features, and it is
a hardware feature of Intel CPUs, not a kernel thread.

The OS has a hwpoison feature, which is implemented in
mm/memory-failure.c. Its main function is memory_failure().

If Memory Scrubbing finds a memory error, MCA recovery reports an SRAO
(Software Recoverable Action Optional) error to the OS, and the OS
handles the SRAO error using the hwpoison function.
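
To illustrate the flow above, here is a minimal userspace sketch. It is
a toy model, not kernel code: toy_mem, toy_memory_failure() and
toy_scrub_pass() are hypothetical names made up for this example, and
the real memory_failure() does much more than set a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

/* OS-side view of a page in this toy model (hypothetical). */
enum page_state { PAGE_OK = 0, PAGE_POISONED };

struct toy_mem {
    bool has_uc_error[NR_PAGES];      /* hardware side: uncorrectable error present */
    enum page_state state[NR_PAGES];  /* OS side: has the page been isolated? */
};

/*
 * Stand-in for the role memory_failure() plays: isolate the page so
 * that it is not used again. Returns 1 if the page was newly
 * poisoned, 0 if it had already been handled.
 */
int toy_memory_failure(struct toy_mem *m, size_t pfn)
{
    assert(pfn < NR_PAGES);
    if (m->state[pfn] == PAGE_POISONED)
        return 0;
    m->state[pfn] = PAGE_POISONED;
    return 1;
}

/*
 * One background scrub pass. For every page with an uncorrectable
 * error, the "hardware" reports an SRAO-style event and the "OS"
 * handles it. Returns the number of pages newly poisoned.
 */
int toy_scrub_pass(struct toy_mem *m)
{
    int newly_poisoned = 0;

    for (size_t pfn = 0; pfn < NR_PAGES; pfn++) {
        if (m->has_uc_error[pfn])
            newly_poisoned += toy_memory_failure(m, pfn);
    }
    return newly_poisoned;
}
```

In this model the error is found by the scrub pass, not by an access
from the OS, which is the situation an SRAO error describes.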


>> If Memory Scrubbing finds an uncorrectable error in memory before
>> the OS accesses that memory, MCA recovery reports an SRAO error to the OS
>
> It may not find a memory error in time, since it may be sleeping when the error occurs. Can this case happen?

Memory Scrubbing seems to run periodically, but I don't have
information about how often it is executed.

Regards,
Mitsuhiro Tanino
