Re: [PATCH 2/2] powerpc/powernv: Wire up OPAL address lookups

From: Nicholas Piggin
Date: Wed Mar 04 2020 - 22:43:38 EST


Michael Ellerman wrote on March 3, 2020, 9:43 pm:
> Nicholas Piggin <npiggin@xxxxxxxxx> writes:
>> Use ARCH_HAS_ADDRESS_LOOKUP to look up the opal symbol table. This
>> allows crashes and xmon debugging to print firmware symbols.
>>
>> Oops: System Reset, sig: 6 [#1]
>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc2-dirty #903
>> NIP: 0000000030020434 LR: 000000003000378c CTR: 0000000030020414
>> REGS: c0000000fffc3d70 TRAP: 0100 Not tainted (5.6.0-rc2-dirty)
>> MSR: 9000000002101002 <SF,HV,VEC,ME,RI> CR: 28022284 XER: 20040000
>> CFAR: 0000000030003788 IRQMASK: 3
>> GPR00: 000000003000378c 0000000031c13c90 0000000030136200 c0000000012cfa10
>> GPR04: c0000000012cfa10 0000000000000010 0000000000000000 0000000031c10060
>> GPR08: c0000000012cfaaf 0000000030003640 0000000000000000 0000000000000001
>> GPR12: 00000000300e0000 c000000001490000 0000000000000000 c00000000139c588
>> GPR16: 0000000031c10000 c00000000125a900 0000000000000000 c0000000012076a8
>> GPR20: c0000000012a3950 0000000000000001 0000000031c10060 c0000000012cfaaf
>> GPR24: 0000000000000019 0000000030003640 0000000000000000 0000000000000000
>> GPR28: 0000000000000010 c0000000012cfa10 0000000000000000 0000000000000000
>> NIP [0000000030020434] .dummy_console_write_buffer_space+0x20/0x64 [OPAL]
>> LR [000000003000378c] opal_entry+0x14c/0x17c [OPAL]
>>
>> This won't unwind the firmware stack (or its Linux caller) properly if
>> firmware and kernel endians don't match, but that problem could be solved
>> in powerpc's unwinder.
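For illustration, an endian-aware back-chain walk could look roughly like the following. This is a minimal sketch, not powerpc's actual unwinder. It assumes a ppc64 frame header with the back chain at byte offset 0 and the saved LR at byte offset 16. Both ELFv1 and ELFv2 use that layout. The helpers load_frame_word() and walk_stack() are hypothetical. Memory is simulated as an array indexed by doubleword. A single foreign_endian flag stands in for what would really be a per-frame decision (firmware frames vs kernel frames). The code relies on the GCC/Clang builtin __builtin_bswap64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Doubleword indices into a ppc64 frame header: back chain at byte
 * offset 0 (doubleword 0), LR save slot at byte offset 16 (doubleword 2). */
#define FRAME_BACK_CHAIN 0
#define FRAME_LR_SAVE    2

/* Hypothetical helper: load one frame doubleword, swapping bytes if the
 * frame was written by code of the opposite endianness (e.g. BE firmware
 * called from an LE kernel). */
static uint64_t load_frame_word(const uint64_t *frame, size_t idx,
				bool foreign_endian)
{
	uint64_t v = frame[idx];

	return foreign_endian ? __builtin_bswap64(v) : v;
}

/* Hypothetical walker over simulated memory: "addresses" are doubleword
 * indices into mem[], and a back chain of 0 ends the walk. Records each
 * frame's LR save slot into lrs[] and returns how many were recorded. */
static size_t walk_stack(const uint64_t *mem, size_t mem_len, size_t sp,
			 bool foreign_endian, uint64_t *lrs, size_t max)
{
	size_t n = 0;

	while (sp != 0 && n < max) {
		assert(sp + FRAME_LR_SAVE < mem_len);	/* stay inside mem[] */
		lrs[n++] = load_frame_word(&mem[sp], FRAME_LR_SAVE,
					   foreign_endian);
		sp = (size_t)load_frame_word(&mem[sp], FRAME_BACK_CHAIN,
					     foreign_endian);
	}
	return n;
}
```

If the same byte-swapped frames were read with foreign_endian false, the back chain would decode as a wild pointer. That is roughly why the unwind goes wrong when the endians differ.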
>
> How well does this work if we're tracing opal calls at the time we oops :)
>
> Though it looks like that's already fishy because we don't do anything
> to disable tracing of opal_console_write().

Yeah, we still don't handle this case perfectly. OPAL itself takes
locks in its console paths and has some issues with stack reentrancy.
We should do a better job of cutting more junk, including tracing,
out of crash paths. Either way, this patch doesn't fundamentally make
things harder.

> I guess I'm a bit wary of adding numerous further opal calls in the oops
> path, I'm sure the opal symbol lookup code is bug free, but still.

There are already a few OPAL calls in that path (console write, event
poll, reboot, and NMI IPI, AFAIK), so we have to make the OPAL call
path itself robust anyway (it's getting there).

> Could we instead suck in the opal symbols early on, and search them in
> Linux? I suspect you've thought of that and rejected it, but it would be
> good to document why.

We could. I was thinking we might want OPAL to do something special
with them, like adding module annotations ([OPAL] vs [HOMER] or
whatever), or relocating itself after boot if we randomize where it's
loaded, etc. But perhaps none of those things really prevents the
symbols from being discovered at boot time. I don't know, it was
easier? :)
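The alternative Michael suggests would copy the OPAL symbol table into Linux at boot and search it there. A minimal sketch of the Linux-side search might look like this. It is hypothetical: struct fw_sym, fw_sym_lookup(), fw_sym_format() and fw_sym_selftest() are invented names, not existing kernel or skiboot APIs. It assumes the table has already been fetched and sorted by start address. The sample entries are derived from the NIP and LR lines in the oops quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-kernel copy of one firmware symbol; table sorted by addr. */
struct fw_sym {
	uint64_t addr;
	uint64_t size;
	const char *name;
};

/* Return the symbol containing addr, or NULL. Binary search for the last
 * entry whose start address is <= addr, then check that it covers addr. */
static const struct fw_sym *fw_sym_lookup(const struct fw_sym *tab, size_t n,
					  uint64_t addr)
{
	size_t lo = 0, hi = n;	/* search window [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo is now the number of entries with start address <= addr */
	if (lo == 0)
		return NULL;
	if (addr - tab[lo - 1].addr >= tab[lo - 1].size)
		return NULL;	/* addr falls in a gap between symbols */
	return &tab[lo - 1];
}

/* Format like the oops: "name+0xoff/0xsize [OPAL]", or a raw address. */
static int fw_sym_format(char *buf, size_t len, const struct fw_sym *tab,
			 size_t n, uint64_t addr)
{
	const struct fw_sym *s = fw_sym_lookup(tab, n, addr);

	if (!s)
		return snprintf(buf, len, "0x%llx", (unsigned long long)addr);
	return snprintf(buf, len, "%s+0x%llx/0x%llx [OPAL]", s->name,
			(unsigned long long)(addr - s->addr),
			(unsigned long long)s->size);
}

/* Check against the NIP and LR lines of the oops. */
static void fw_sym_selftest(void)
{
	static const struct fw_sym tab[] = {	/* sorted by addr */
		{ 0x30003640, 0x17c, "opal_entry" },
		{ 0x30020414, 0x64, ".dummy_console_write_buffer_space" },
	};
	size_t n = sizeof(tab) / sizeof(tab[0]);
	char buf[128];

	fw_sym_format(buf, sizeof(buf), tab, n, 0x30020434);
	assert(strcmp(buf, ".dummy_console_write_buffer_space+0x20/0x64 [OPAL]") == 0);
	fw_sym_format(buf, sizeof(buf), tab, n, 0x3000378c);
	assert(strcmp(buf, "opal_entry+0x14c/0x17c [OPAL]") == 0);
	fw_sym_format(buf, sizeof(buf), tab, n, 0x30010000);	/* gap */
	assert(strcmp(buf, "0x30010000") == 0);
}
```

A linear scan would also work for a table this size. Binary search just keeps the lookup cheap if the whole firmware symbol table were copied in.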

>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index c1f25a760eb1..c3a2a797177a 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -214,7 +214,11 @@
>> #define OPAL_SECVAR_GET 176
>> #define OPAL_SECVAR_GET_NEXT 177
>> #define OPAL_SECVAR_ENQUEUE_UPDATE 178
>> -#define OPAL_LAST 178
>> +#define OPAL_PHB_SET_OPTION 179
>> +#define OPAL_PHB_GET_OPTION 180
>
> Only pull in the calls you need for this patch.

Ah, okay, I didn't realise that was the policy. Makes sense.

Thanks,
Nick