pstore dump inside an nmi handler

From: Don Zickus
Date: Fri Jul 08 2011 - 16:17:40 EST


Hi Tony,

I was playing with the APEI EINJ module, injecting errors trying to
capture a GHES record, then panic into a kdump kernel and reboot.

Matthew brought to my attention that pstore should capture an error record
on the panic path using kmsg_dump(). After injecting an error with EINJ,
I went to check to see if there was a pstore entry. There wasn't.

Playing on another box, I noticed the machine double faulted and didn't
even make it into a kdump kernel.

Upon investigation, I noticed that when a fatal error occurs on the
platform, it will generate an NMI that will be handled by the
ghes_nmi_handler. This handler calls panic() which calls kmsg_dump()
which calls pstore_dump().

Inside pstore_dump(), the first thing it does is call mutex_lock()
(inside an NMI handler). This seems to be the root cause of my problems.

I am not familiar enough with pstore to just modify its locking, so I
wanted to ask you.

My first thought was to wrap the mutex_lock with an 'if !in_nmi()', but that
seemed kinda hacky. Then I was wondering if there was a way to do this
locklessly or atomically because you are only dealing with whole blocks I
think. I don't know.
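
To make the two ideas a bit more concrete, here is a rough userspace
sketch, not kernel code and not a proposed patch. It swaps the mutex for
a try-lock built on C11 atomics: in "NMI" context it tries once and gives
up rather than waiting, otherwise it waits as before. Every name in it
(fake_in_nmi, buf_trylock, buf_unlock, dump_record, record) is made up
for illustration; fake_in_nmi only stands in for the kernel's in_nmi(),
and none of this reflects how pstore is actually structured.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from pstore. */
static atomic_flag buf_lock = ATOMIC_FLAG_INIT;
static bool fake_in_nmi;        /* stand-in for the kernel's in_nmi() */
static char record[64];         /* stand-in for the pstore buffer */

/* Try to take the lock exactly once; returns true if we got it. */
static bool buf_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&buf_lock,
                                              memory_order_acquire);
}

static void buf_unlock(void)
{
    atomic_flag_clear_explicit(&buf_lock, memory_order_release);
}

/* Returns 0 if the record was written, -1 if the dump was skipped. */
static int dump_record(const char *msg)
{
    if (fake_in_nmi) {
        /*
         * The code we interrupted may be the lock holder, so waiting
         * here could never finish. Try once and skip the dump if busy.
         */
        if (!buf_trylock())
            return -1;
    } else {
        /* Normal context may wait (the kernel would sleep on a mutex). */
        while (!buf_trylock())
            ;
    }

    strncpy(record, msg, sizeof(record) - 1);
    record[sizeof(record) - 1] = '\0';
    buf_unlock();
    return 0;
}
```

The obvious downside is that the record is lost if the lock happens to
be held when the NMI arrives, which is part of why it feels hacky.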

Wanted to give you a heads up and seek your thoughts. I am willing to
hack up some code and test. :-)

Cheers,
Don
