Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)

From: Stefan Assmann
Date: Wed Jun 22 2011 - 16:25:26 EST

On 22.06.2011 20:00, Andrew Morton wrote:
> On Wed, 22 Jun 2011 13:18:51 +0200 Stefan Assmann <sassmann@xxxxxxxxx> wrote:


>> The idea is to allow the user to specify RAM addresses that shouldn't be
>> touched by the OS, because they are broken in some way. Not all machines have
>> hardware support for hwpoison, ECC RAM, etc., so here's a solution that uses
>> bitmasks to mask address patterns via the new "badram" kernel command line
>> parameter.
>> Memtest86 has an option to generate these patterns since v2.3 so the only thing
>> for the user to do should be:
>> - run Memtest86
>> - note down the pattern
>> - add badram=<pattern> to the kernel command line
>> The affected pages are then marked with the hwpoison flag and thus won't be
>> used by the memory management system.
> The google kernel has a similar capability. I asked Nancy to comment
> on these patches and she said:

This is the first I've heard of this feature from Google. Had I known
about it, I certainly would have talked to the person responsible.

> : One, the bad addresses are passed via the kernel command line, which
> : has a limited length. It's okay if the addresses can be fit into a
> : pattern, but that's not necessarily the case in the google kernel. And
> : even with patterns, the limit on the command line length limits the
> : number of patterns that user can specify. Instead we use lilo to pass
> : a file containing the bad pages in e820 format to the kernel.

I see no reason why there couldn't be multiple ways of specifying bad

> :
> : Second, the BadRAM patch expands the address patterns from the command
> : line into individual entries in the kernel's e820 table. The e820
> : table is a fixed buffer that supports a very small, hard coded number
> : of entries (128). We require a much larger number of entries (on
> : the order of a few thousand), so much of the google kernel patch deals
> : with expanding the e820 table. Also, with the BadRAM patch, entries
> : that don't fit in the table are silently dropped and this isn't
> : appropriate for us.

So far the use case I had in mind wasn't "thousands of entries". However,
expanding the e820 table is probably an issue that could be dealt with
separately?
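
For illustration only, here is a minimal userspace sketch (not code from the
patchset) of expanding one address/mask pattern into page entries while
reporting, rather than silently dropping, whatever does not fit into a
fixed-size table. It assumes the common BadRAM convention that an address is
bad when it agrees with the pattern address on every bit set in the mask, and
it assumes 4 KiB pages. The names matches() and expand() are hypothetical; the
limit of 128 is the e820 figure quoted above.

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
E820_MAX = 128                  # fixed table size mentioned in the quoted text


def matches(addr, pattern_addr, mask):
    """True if addr agrees with pattern_addr on every bit set in mask."""
    return (addr & mask) == (pattern_addr & mask)


def expand(pattern_addr, mask, ram_start, ram_end, limit=E820_MAX):
    """Expand one pattern into bad page addresses within [ram_start, ram_end).

    Returns (pages, dropped). At most `limit` pages are kept; the number of
    matching pages that did not fit is returned so the caller can warn.
    """
    # Only the page-number bits matter: a page is bad if any address inside
    # it matches, and the in-page offset bits can always be chosen to match.
    page_mask = mask & ~(PAGE_SIZE - 1)
    first = (ram_start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

    pages, dropped = [], 0
    for page in range(first, ram_end, PAGE_SIZE):
        if not matches(page, pattern_addr, page_mask):
            continue
        if len(pages) < limit:
            pages.append(page)
        else:
            dropped += 1
    return pages, dropped
```

A caller would check the second return value and print a warning (or refuse to
continue) when it is non-zero, which addresses the "silently dropped" concern
independently of how large the table eventually becomes.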

> :
> : Another caveat concerns mapping out too much bad memory in general. If too
> : much memory is removed from low memory, a system may not boot. We
> : solve this by generating good maps. Our userspace tools do not map out
> : memory below a certain limit, and they verify against a system's iomap
> : that only addresses from memory are mapped out.

Well, if too much low memory is bad, you're screwed anyway, aren't you? :)
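
That said, the kind of sanity check described in the quoted text is easy to
picture. A minimal sketch, assuming the tool already has a list of bad page
addresses and a list of ranges known to be RAM (for example taken from the
system's iomap); the function name vet() and the 16 MiB limit are hypothetical
and not taken from Google's tools or from this patchset:

```python
LOW_LIMIT = 16 << 20   # hypothetical: never map out memory below 16 MiB


def vet(bad_pages, ram_ranges, low_limit=LOW_LIMIT):
    """Split bad_pages into (accepted, rejected).

    ram_ranges is a list of (start, end) half-open address ranges that are
    known to be RAM. A page is accepted only if it lies inside one of them
    and is not below low_limit.
    """
    accepted, rejected = [], []
    for page in bad_pages:
        in_ram = any(start <= page < end for start, end in ram_ranges)
        if in_ram and page >= low_limit:
            accepted.append(page)
        else:
            rejected.append(page)
    return accepted, rejected
```

The rejected list is returned rather than discarded so that the tool can tell
the user which addresses it refused to map out.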

> I have a couple of thoughts here:
> - If this patchset is merged and a major user such as google is
> unable to use it and has to continue to carry a separate patch then
> that's a regrettable situation for the upstream kernel.

I'm all ears for making things work out for potential users, I just
didn't know.

> - Google's is, afaik, the largest use case we know of: zillions of
> machines for a number of years. And this real-world experience tells
> us that the badram patchset has shortcomings. Shortcomings which we
> can expect other users to experience.
> So. What are your thoughts on these issues?

I'm aware that the implementation I posted does not cover *everything*.
It's a start, and I tried to keep it simple and make use of already
existing infrastructure.
At the moment I don't see any reason why this patchset couldn't play
along nicely with, or be enhanced to support, what Google needs, but I
don't know Google's patches yet.

