Re: [PATCH 1/2] mm/debug_vm_pgtable/basic: Add validation for dirtiness after write protect

From: Anshuman Khandual
Date: Mon Nov 30 2020 - 05:59:13 EST



On 11/30/20 3:08 PM, Catalin Marinas wrote:
> On Mon, Nov 30, 2020 at 09:55:00AM +0530, Anshuman Khandual wrote:
>> On 11/27/20 3:14 PM, Catalin Marinas wrote:
>>> On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
>>>> Le 27/11/2020 à 06:06, Anshuman Khandual a écrit :
>>>>> This adds validation tests for dirtiness after write protect conversion for
>>>>> each page table level. This is important for platforms such as arm64 that
>>>>> removes the hardware dirty bit while making it an write protected one. This
>>>>> also fixes pxx_wrprotect() related typos in the documentation file.
>>>>
>>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>>> index c05d9dcf7891..a5be11210597 100644
>>>>> --- a/mm/debug_vm_pgtable.c
>>>>> +++ b/mm/debug_vm_pgtable.c
>>>>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>>>> WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
>>>>> WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
>>>>> WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>>>> + WARN_ON(pte_dirty(pte_wrprotect(pte)));
>>>>
>>>> Wondering what you are testing here exactly.
>>>>
>>>> Do you expect that if PTE has the dirty bit, it gets cleared by
>>>> pte_wrprotect() ?
>>>>
>>>> Powerpc doesn't do that, it only clears the RW bit but the dirty
>>>> bit remains if it is set, until you call pte_mkclean() explicitely.
>>>
>>> Arm64 has an unusual way of setting a hardware dirty "bit", it actually
>>> clears the PTE_RDONLY bit. The pte_wrprotect() sets the PTE_RDONLY bit
>>> back and we can lose the dirty information. Will found this and posted
>>> patches to fix the arm64 pte_wprotect() to set a software PTE_DIRTY if
>>> !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
>>> was that we may inadvertently make a fresh/clean pte dirty with such
>>> change, hence the suggestion for the test.
>>>
>>> That said, I think we also need a test in the other direction,
>>> pte_wrprotect() should preserve any dirty information:
>>>
>>> WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
>>
>> This seems like a generic enough principle which all platforms should
>> adhere to. But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte)))
>> might fail on some platforms if the page table entry came in as a dirty
>> one and pte_wrprotect() is not expected to alter the dirty state.
>
> Ah, so do we have architectures where entries in protection_map[] are
> already dirty? If those are valid, maybe the check should be:

Okay, I did not imply that actually. The current position for these new
tests in respective pxx_basic_tests() functions is right at the end and
hence the pxx might have already gone through some changes from the time
it was originally created with pfn_pxx(). The entry here is not starting
from the beginning. It is not expected as well, per design. So dirty bit
might or might not be there depending on all the previous test sequences
leading upto these new ones.

IIUC, Christophe mentioned the fact that on platforms like powerpc, dirty
bit just remains unchanged during pte_wprotect(). So the current test
WARN_ON(pte_dirty(pte_wrprotect(pte))) will not work on powerpc if the
previous tests leading upto that point has got the dirty bit set. This is
irrespective of how it was created with pfn_pte() from protection_map[]
originally at the beginning.

>
> WARN_ON(!pte_dirty(pte) && pte_dirty(pte_wrprotect(pte)));
>
>> Instead, should we just add the following two tests, which would ensure
>> that pte_wrprotect() never alters the dirty state of a page table entry.
>>
>> WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
>> WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));
>
> These should be added as additional tests. However, my initial thought

Okay, will add them.

> was to check whether pte_wrprotect() on a new pte created from a
> protection_map[] entry directly would inadvertently dirty it. On arm64,
> that means a protection_map[] entry missing PTE_RDONLY. A pte_mkclean()
> would set PTE_RDONLY, so we'd miss such check.
>

To achieve this, we could move the test right at the beginning just after
the pxx gets created from protection_map[], with a comment explaining the
rationale.