Re: [PATCH] arm64: head: Fix cache inconsistency of the identity-mapped region

From: Shanker R Donthineni
Date: Thu May 05 2022 - 07:42:45 EST


Hi Will,

On 4/21/22 6:47 AM, Will Deacon wrote:
> External email: Use caution opening links or attachments
>
>
> Hi Shanker,
>
> First of all, thanks for continuing to share so many useful details here.
> I'm pretty confident we'll eventually get to the bottom of this.
>
> On Wed, Apr 20, 2022 at 10:02:13AM -0500, Shanker R Donthineni wrote:
>> On 4/20/22 5:08 AM, Will Deacon wrote:
>>> Non-coherent? You mean non-cacheable, right? At this stage, we only
>>>>> have a single CPU, so I'm not sure coherency is the problem here. When
>>>>> you say 'drop', is that an eviction? Replaced by what? By the previous
>>>>> version of the cache line, containing the stale value?
>>>> Yes,non-cacheable. The cache line corresponding to function enter_vhe() was
>>>> marked with shareable & WB-cache as a result of write to RELA, the same cache
>>>> line is being fetched with non-shareable & non-cacheable. The eviction is
>>>> not pushed to PoC and got dropped because of cache-line attributes mismatch.
>>> I'm really struggling to understand this. Why is the instruction fetch
>>> non-shareable? I'm trying to align your observations with the rules about
>>> mismatched aliases in the architecture and I'm yet to satisfy myself that
>>> the CPU is allowed to drop a dirty line on the floor in response to an
>>> unexpected hit.
>>>
>>> My mental model (which seems to align with Marc) is something like:
>>>
>>>
>>> 1. The boot CPU fetches the line via a cacheable mapping and dirties it
>>> in its L1 as part of applying the relocations.
>> ARM-CPU core is sending ReadNotSharedDirty CHI command to LLC (last-level-cache).
>> This cache-line is marked as checked-out in LLC, would be used to keep track
>> of coherency.
> Oh man, that CHI stuff always sends me into a spin so I'm probably not the
> right person to debug it, but let's see...
>
>>> 2. The boot CPU then enters EL2 with the MMU off and fetches the same
>>> line on the I-side. AFAICT, the architecture says this is done with
>>> outer-shareable, non-cacheable attributes.
>> ARM-CPU core is sending ReadNoSnoop CHI command when MMU disabled. The
>> marked cache-line from the step 1) become invalid in LLC.
> When you say "invalid" here, do you mean in the same sense as "cache
> invalidation"? Why is that the right thing to do in response to a _read_?
>
> Would the same invalidation happen if the ReadNoSnoop originated from a
> different CPU?
>
There is no issue if other cores are issuing ReadNoSnoop commands.

>> As per the ARM-ARM specification, CMO is recommended whenever memory
>> attributes are changed for a given memory region.
> The mismatched attribute section (B2.8) is quite a lot more nuanced than
> that, and in particular I don't see it allowing for writes to be lost
> in the presence of a mismatched read. Then again, the whole section is
> hard to follow and doesn't really call out instruction fetch other than
> in a note at the end.
>
>> With my understating the CPU core must generate coherent accesses for
>> Shared+Cacheable memory but not clear for OSH+non-cacheable regions
>> in the spec.
> Ok, but we don't need the reloc to be visible to the MMU-off fetch in
> the case you're describing so coherency between the two aliases is not
> required. The problem is that the cacheable data is being lost forever,
> so even cacheable accesses can't retrieve it.
>
>> Are you expecting "OSH+non-cacheable" must generate coherent accesses?
>>
>>> 3. !!! Somehow the instruction fetch results in the _dirty_ line from (1)
>>> being discarded !!!
>> The ARM-CPU is sending WritebackFull CHI command to LLC for the cache line
>> which was marked as invalid from step 2). The write CMD is ignored/dropped.
> Is this because the LLC is inclusive and doesn't know what to do with the
> writeback after invalidating the line earlier on? When else would a
> writeback get silently dropped?
>
> Stepping back a bit, my concern with all of this is that it seems as though
> non-cacheable instruction fetches that alias with dirty lines on the data
> side can lead to data loss. This feels like a significantly broader issue
> than the relocation case you've run into, particularly as instruction fetch
> with the MMU off is still permitted to fetch speculatively from within a
> couple of pages of the PC. Cache maintenance doesn't feel like a viable
> solution in general, as we'd need to ensure that no fetch occurs between
> the write and the CMO, and this could occur on another CPU, off the back
> of an IRQ or even a FIQ (with the fetch being made at a different exception
> level).
>
> Given that you have all the low-level details about the problem, it would
> probably be worth pulling Arm into this so we can figure out what the right
> solution is. I'm happy to be involved with that, if you like.
>
>
Thanks Will for your valuable comments on dropping valid cache lines. I've
forwarded your responses to hardware team, reviewed CHI and ARM-ARM
specification one more time. Our CPU team was agreed to fix in hardware
convert ReadNoSnoop to ReadSnoop command (step-2).

Thanks,
Shanker Donthineni