Re: [PATCH 5/9] HWPoison: add memory_failure_queue()

From: Ingo Molnar
Date: Mon May 23 2011 - 22:52:33 EST



* Huang Ying <ying.huang@xxxxxxxxx> wrote:

> >> - How to deal with ring-buffer overflow? For example, the ring-buffer is
> >> full of corrected memory errors, and now a recoverable memory error occurs
> >> but cannot be put into the perf ring buffer because of the overflow. How
> >> should the recoverable memory error be dealt with?
> >
> > The solution is to make it large enough. With *every* queueing solution there
> > will be some sort of queue size limit.
>
> Another solution could be:
>
> Create two ring-buffers. One is for logging and will be read by the RAS
> daemon; the other is for recovery, and the event record will be removed
> from that ring-buffer after all 'active filters' have been run on it.
> Even if the RAS daemon is restarted or hangs, recoverable errors can be
> taken care of.

Well, filters will always be executed since they execute when the event is
inserted - not when it's extracted.

So if you worry about losing *filter* executions (and dependent policy action)
- there should be no loss there, ever.
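
To illustrate the insert-time point, here is a rough userspace C model. It is
not the perf or ftrace API: every name in it (event_ring, ring_insert(),
filter_recoverable(), policy_recover(), RING_SLOTS) is hypothetical, and it
assumes a simple "drop new records when full" overflow policy. It only shows
that the filter and its policy action run before buffering is attempted, so a
full buffer loses the log record but not the action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 4	/* deliberately tiny so overflow is easy to trigger */

enum mem_err_severity { MEM_ERR_CORRECTED, MEM_ERR_RECOVERABLE };

struct mem_err_event {
	unsigned long pfn;
	enum mem_err_severity severity;
};

/* Hypothetical stand-in for an event ring buffer plus some bookkeeping. */
struct event_ring {
	struct mem_err_event slots[RING_SLOTS];
	size_t count;			/* records currently buffered */
	size_t dropped;			/* records lost to overflow */
	size_t actions_run;		/* policy actions triggered by the filter */
	unsigned long last_recovered_pfn;
};

/* Filter: match only recoverable errors. */
static bool filter_recoverable(const struct mem_err_event *ev)
{
	return ev->severity == MEM_ERR_RECOVERABLE;
}

/* Policy action: placeholder for whatever recovery would be started. */
static void policy_recover(struct event_ring *ring,
			   const struct mem_err_event *ev)
{
	ring->actions_run++;
	ring->last_recovered_pfn = ev->pfn;
}

/*
 * Insert an event. The filter (and its action) runs first; buffering is
 * attempted afterwards. Returns true if the record was buffered.
 */
static bool ring_insert(struct event_ring *ring,
			const struct mem_err_event *ev)
{
	assert(ring != NULL && ev != NULL);

	if (filter_recoverable(ev))
		policy_recover(ring, ev);

	if (ring->count == RING_SLOTS) {
		ring->dropped++;	/* log record lost, action already ran */
		return false;
	}
	ring->slots[ring->count++] = *ev;
	return true;
}

/* Fill the ring with corrected errors, then insert one recoverable error. */
struct event_ring run_overflow_demo(void)
{
	struct event_ring ring = {0};
	unsigned long pfn;

	for (pfn = 0; pfn < RING_SLOTS; pfn++) {
		struct mem_err_event ce = { pfn, MEM_ERR_CORRECTED };
		ring_insert(&ring, &ce);
	}

	struct mem_err_event ue = { 0x1234, MEM_ERR_RECOVERABLE };
	ring_insert(&ring, &ue);

	return ring;
}
```

In this model run_overflow_demo() ends with one dropped record and one policy
action run: the recoverable error never reaches the buffer, yet the action
attached to the filter still executes.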

But yes, the scheme you outline would work as well: a counting-only event with
a filter specified - this will do no buffering at all.

So ... to get the ball rolling in this area one of you guys active in RAS
should really try a first approximation for the active filter approach: add a
test-TRACE_EVENT() for the errors you are interested in and define a convenient
way to register policy action with post-filter events. This should work even
without having the 'active' portion defined at the ABI and filter-string level.
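
As a sketch of what "register policy action with post-filter events" could
look like, here is a small userspace C model of a counting-only event. It is
an assumption-laden toy, not kernel code and not the TRACE_EVENT() machinery:
counting_event, counting_event_register(), counting_event_fire() and the two
callbacks are all hypothetical names. The event keeps only counters (no
buffer at all) and calls the registered action for each event that passes the
filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_err_event {
	unsigned long pfn;
	bool recoverable;
};

typedef bool (*event_filter_fn)(const struct mem_err_event *ev);
typedef void (*policy_action_fn)(const struct mem_err_event *ev, void *ctx);

/* Counting-only event: counters and callbacks, no record storage. */
struct counting_event {
	event_filter_fn filter;		/* NULL means "match everything" */
	policy_action_fn action;	/* run on every post-filter event */
	void *action_ctx;
	unsigned long seen;		/* all events fired */
	unsigned long hits;		/* events that passed the filter */
};

/* Attach a filter and a policy action, and reset the counters. */
static void counting_event_register(struct counting_event *ce,
				    event_filter_fn filter,
				    policy_action_fn action, void *ctx)
{
	assert(ce != NULL);
	ce->filter = filter;
	ce->action = action;
	ce->action_ctx = ctx;
	ce->seen = 0;
	ce->hits = 0;
}

/* Fire the event: count it, filter it, then run the policy action. */
static void counting_event_fire(struct counting_event *ce,
				const struct mem_err_event *ev)
{
	ce->seen++;
	if (ce->filter && !ce->filter(ev))
		return;
	ce->hits++;
	if (ce->action)
		ce->action(ev, ce->action_ctx);
}

/* Example filter: only recoverable errors are of interest. */
static bool only_recoverable(const struct mem_err_event *ev)
{
	return ev->recoverable;
}

/* Example action: remember the affected pfn in caller-provided storage. */
static void record_pfn(const struct mem_err_event *ev, void *ctx)
{
	*(unsigned long *)ctx = ev->pfn;
}
```

A caller would register only_recoverable() and record_pfn() once and then
call counting_event_fire() at each error site. Since nothing is buffered,
there is no queue that could overflow in this model.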

Thanks,

Ingo