Re: [PATCH 5/9] HWPoison: add memory_failure_queue()

From: huang ying
Date: Sun May 22 2011 - 04:14:38 EST


On Fri, May 20, 2011 at 7:56 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Huang Ying <ying.huang@xxxxxxxxx> wrote:
>
>> > So why are we not working towards integrating this into our event
>> > reporting/handling framework, as i suggested it from day one on when you
>> > started posting these patches?
>>
>> The memory_failure_queue() introduced in this patch is general, that is, it
>> can be used not only by ACPI/APEI, but also any other hardware error
>> handlers, including your event reporting/handling framework.
>
> Well, the bit you are steadfastly ignoring is what i have made clear well
> before you started adding these facilities: THEY ALREADY EXIST to a large
> degree :-)
>
> So you were and are duplicating code instead of using and extending existing
> event processing facilities. It does not matter one little bit that the code
> you added is partly 'generic', it's still overlapping and duplicated.

How would hardware error recovery be done in your perf framework? IMHO,
it could be something like the following:

- The NMI handler runs for the hardware error, collects the hardware
error information and puts it into a ring buffer, then triggers an
irq_work for further processing.
- In the irq_work handler, memory_failure_queue() is called to do the
actual recovery work for each recoverable memory error in the ring
buffer.
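
To make the two steps above concrete, here is a rough user-space sketch
in C of the flow I have in mind. It is only an illustration, not kernel
code: every name in it (struct hw_err_record, RING_SIZE,
sim_nmi_handler(), sim_irq_work_handler(), stub_memory_failure_queue())
is hypothetical. The stub stands in for memory_failure_queue() and takes
only a pfn for simplicity. The work_pending flag stands in for queueing
an irq_work (irq_work_queue() in the kernel), and a real implementation
would need an NMI-safe lockless buffer rather than this plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define RING_SIZE 8	/* hypothetical capacity; a power of two */

/* Hypothetical record collected in the (simulated) NMI handler. */
struct hw_err_record {
	unsigned long pfn;	/* page frame hit by the error */
	bool recoverable;	/* true if recovery should be attempted */
};

static struct hw_err_record ring[RING_SIZE];
static unsigned int ring_head;	/* advanced by the producer (NMI side) */
static unsigned int ring_tail;	/* advanced by the consumer (irq_work side) */
static bool work_pending;	/* stands in for a queued irq_work */
static unsigned int queued_count;

/* Stand-in for memory_failure_queue(); it only counts and logs. */
static void stub_memory_failure_queue(unsigned long pfn)
{
	queued_count++;
	printf("queue recovery for pfn %#lx\n", pfn);
}

/* Step 1: collect the error into the ring buffer and request more work. */
static bool sim_nmi_handler(unsigned long pfn, bool recoverable)
{
	if (ring_head - ring_tail == RING_SIZE)
		return false;	/* buffer full, record is dropped */

	ring[ring_head % RING_SIZE] = (struct hw_err_record){
		.pfn = pfn,
		.recoverable = recoverable,
	};
	ring_head++;
	work_pending = true;
	return true;
}

/* Step 2: drain the ring buffer and queue recovery where it applies. */
static void sim_irq_work_handler(void)
{
	assert(ring_head - ring_tail <= RING_SIZE);
	work_pending = false;

	while (ring_tail != ring_head) {
		struct hw_err_record *rec = &ring[ring_tail % RING_SIZE];

		if (rec->recoverable)
			stub_memory_failure_queue(rec->pfn);
		ring_tail++;
	}
}
```

A caller would invoke sim_nmi_handler() once per error and then run
sim_irq_work_handler() whenever work_pending is set. Records that are
not marked recoverable are consumed from the buffer but never passed to
the stub.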

What is your idea for hardware error recovery in perf?

Best Regards,
Huang Ying