[RFC PATCHSET 0/12] RAS daemon v4

From: Borislav Petkov
Date: Fri Jan 21 2011 - 10:10:38 EST


From: Borislav Petkov <borislav.petkov@xxxxxxx>

Hi,

here's another round of the RAS daemon patchset. This time I'd like to
get some ACKs/NACKs on the perf bits and on whether this approach is
agreeable. Some notes on individual patches:

* 0001-perf-Start-the-massive-restructuring.patch:

This renames perf_event.c to events/core.c, as talked about earlier.
This is only a first step, though; the rest should come from the perf
people, I guess...

* 0002-perf-Add-persistent-event-facilities.patch

... and this one puts the persistent bits in persistent.c

* 0004-perf-Add-Makefile.lib.patch
* 0005-perf-Export-trace-event-utils.patch

I'm adding a toplevel tools/Makefile here which we could use for the
other tools in there since we keep growing even more tools with each
kernel release.

* 0007-perf-Export-debugfs-utilities.patch

This one is needed only temporarily, as we're moving the perf events to
sysfs. After that work is done, the persistent fd will be read from
there.

For more details, check the individual patches.

Btw, the patches are on top of tip/master from ~two weeks ago, i.e.:
cf1f6cd677a9ce8c80e5de61724a25074ad9a8cf.

In order to run this patchset, you need only this hunk:

---
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index c018109..7bffbc6 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1,5 +1,6 @@
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <trace/events/mce.h>
 
 #include "mce_amd.h"
 
@@ -598,6 +599,8 @@ int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 	amd_decode_err_code(m->status & 0xffff);
 
+	trace_mce_record(m);
+
 	return NOTIFY_STOP;
 }
 EXPORT_SYMBOL_GPL(amd_decode_mce);
---

so that you can inject some MCEs like so:

$ cd tools/
$ make -j ras
$ ./ras/rasd
$ modprobe mce_amd_inj    # built by CONFIG_EDAC_MCE_INJ
$ echo 0x9c00410000010016 > /sys/devices/system/edac/mce/status
$ echo 0 > /sys/devices/system/edac/mce/bank

After 30 seconds at the latest, /var/log/ras.log will contain:

Got MCE, cpu: 0, status: 0x9c00410000010016, addr: 0x0000000000000000

The status is not decoded yet, but I'm working on it.
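
Until the decoding is in place, the raw status can be picked apart by
hand. Here is a minimal userspace sketch of that, not part of the
patchset. It assumes the log line format shown above and the
architectural MCi_STATUS flag layout (same bit positions as the kernel's
MCI_STATUS_* macros). The names struct mce_info, parse_ras_line(),
mca_error_code() and describe_status() are made up for illustration and
are not rasd interfaces:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural MCi_STATUS flag bits (positions match MCI_STATUS_* in the kernel). */
#define STATUS_VAL	(1ULL << 63)	/* register holds a valid error */
#define STATUS_OVER	(1ULL << 62)	/* a previous error was overwritten */
#define STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define STATUS_EN	(1ULL << 60)	/* error reporting was enabled */
#define STATUS_MISCV	(1ULL << 59)	/* MISC register is valid */
#define STATUS_ADDRV	(1ULL << 58)	/* ADDR register is valid */
#define STATUS_PCC	(1ULL << 57)	/* processor context corrupt */

/* Hypothetical container for the fields of one log line; not a rasd type. */
struct mce_info {
	unsigned int cpu;
	uint64_t status;
	uint64_t addr;
};

/* Parse one "Got MCE, ..." line; returns 0 on success, -1 if it does not match. */
static int parse_ras_line(const char *line, struct mce_info *m)
{
	int n = sscanf(line,
		       "Got MCE, cpu: %u, status: %" SCNx64 ", addr: %" SCNx64,
		       &m->cpu, &m->status, &m->addr);

	return n == 3 ? 0 : -1;
}

/* Low 16 bits: the part the hunk above hands to amd_decode_err_code(). */
static unsigned int mca_error_code(uint64_t status)
{
	return (unsigned int)(status & 0xffff);
}

/* Write the names of all set flag bits into buf, separated by spaces. */
static void describe_status(uint64_t status, char *buf, size_t len)
{
	static const struct {
		uint64_t bit;
		const char *name;
	} flags[] = {
		{ STATUS_VAL, "VAL" },     { STATUS_OVER, "OVER" },
		{ STATUS_UC, "UC" },       { STATUS_EN, "EN" },
		{ STATUS_MISCV, "MISCV" }, { STATUS_ADDRV, "ADDRV" },
		{ STATUS_PCC, "PCC" },
	};
	size_t i, used = 0;

	if (len == 0)
		return;
	buf[0] = '\0';

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(status & flags[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer full: stop here */
		used += (size_t)n;
	}
}
```

For the line above, parse_ras_line() yields cpu 0 and the injected
status, mca_error_code() gives 0x0016, and describe_status() produces
"VAL EN MISCV ADDRV", i.e. UC and PCC are clear.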

Anyway, please take a look and send me any comments you have.

Thanks.