Re: [PATCH V7 05/10] acpi: apei: handle SEA notification type for ARMv8

From: Baicar, Tyler
Date: Fri Jan 20 2017 - 15:59:34 EST


On 1/19/2017 10:57 AM, James Morse wrote:
Hi Tyler,

On 18/01/17 23:51, Baicar, Tyler wrote:
On 1/18/2017 7:50 AM, James Morse wrote:
On 12/01/17 18:15, Tyler Baicar wrote:
ARM APEI extension proposal added SEA (Synchrounous External
Abort) notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2acbc60..87efe26 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -767,6 +772,62 @@ static struct notifier_block ghes_notifier_sci = {
.notifier_call = ghes_notify_sci,
};
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+static int ghes_notify_sea(struct notifier_block *this,
+ unsigned long event, void *data)
+{
+ struct ghes *ghes;
+ int ret = NOTIFY_DONE;
+
+ nmi_enter();
Can we move this into the arch code? Its because we got here from a
synchronous-exception that makes this nmi-like, I think it only makes sense for
it be called from under /arch/.
So move the nmi_enter/exit calls into do_sea of the previous patch? I can do
that in the next patchset.
Where did the rcu_read_lock() go? I can see its missing from ghes_notify_nmi()
too, but I don't know enough about RCU to know if that's safe!

The second paragraph in the comment above rcu_read_lock() describes it as
preventing call_rcu() during a read-side critical section that was running
concurrently. Doesn't this mean we can race with ghes_sea_remove() on another
CPU because we wait for the wrong grace period?

The same comment talks about how these read-side critical sections can nest, so
I think its quite safe to make these 'lock' calls here.
Sorry, I thought we wanted nmi_enter/exit instead of the rcu_read_lock/unlock. I
guess the rcu locks
will not cause the deadlock scenario you described in the previous patchset if
we have the
nmi_enter/exit wrapped around the rcu critical section.
Ah, not instead of, (well, not initially!).
The nmi_enter()/nmi_exit() thing was to fix the APEI interrupting APEI problem.
This is only a problem for notification types which can interrupt
interrupts-masked code, of which SEA is one. (and x86's NMI is the other).

I think I've found the answer to why the rcu_read_lock() isn't needed.
synchronize_sched() has:
* This means that all preempt_disable code sequences, including NMI and
* non-threaded hardware-interrupt handlers, in progress on entry will
* have completed before this primitive returns.
synchronize_rcu() has the same innards, so I'm convinced this its safe not to
have those calls in here. Could we have a comment along the lines of:
synchronize_rcu() will wait for nmi_exit(), so no need to rcu_read_lock().
Okay, I'll add the comment in the next patchset.
(The more I learn about RCU the scarier it becomes!)


There are two other things that need changing to make the in_nmi() code path
work on arm64.
Always reserve the virtual-address-space forcing GHES_IOREMAP_PAGES to be 2
regardless of CONFIG_HAVE_ACPI_APEI_NMI. This is almost revert of
594c7255dce7a13cac50cf2470cc56e2c3b0494e (but that did a few other things too).
Looks simple enough, should I force it to 2 in all cases, or add a check for CONFIG_HAVE_ACPI_APEI_SEA
similar to the check for CONFIG_HAVE_ACPI_APEI_NMI?
We also need to fix ghes_ioremap_pfn_nmi() to use arch_apei_get_mem_attribute()
and not assume PAGE_KERNEL.
So just change the call to ioremap_page_range to:

ioremap_page_range(vaddr, vaddr + PAGE_SIZE, pfn << PAGE_SHIFT, arch_apei_get_mem_attribute());

Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.