Re: [PATCH v3 2/5] mm/mseal: update madvise() logic

From: Lorenzo Stoakes
Date: Fri Jul 25 2025 - 01:50:31 EST


On Thu, Jul 24, 2025 at 02:15:26PM -0700, Kees Cook wrote:
> On Wed, Jul 16, 2025 at 06:38:03PM +0100, Lorenzo Stoakes wrote:
> > We make a change to the logic here to correct a mistake - we must disallow
> > discard of read-only MAP_PRIVATE file-backed mappings, which previously we
> > were not.
> > The justification for this change is to account for the case where:
> >
> > 1. A MAP_PRIVATE R/W file-backed mapping is established.
> > 2. The mapping is written to, which backs it with anonymous memory.
> > 3. The mapping is mprotect()'d read-only.
> > 4. The mapping is mseal()'d.
> >
> > If we were to now allow discard of this data, it would mean mseal() would
> > not prevent the unrecoverable discarding of data and it would thus violate
> > the semantics of sealed VMAs.
>
> I want to make sure I'm understanding this right:
>
> Was the old behavior to allow discard? (If so, that seems like it wasn't
> doing what Linus asked for[1], but it's not clear to me if that was
> the behavior Chrome wanted.) The test doesn't appear to validate which
> contents end up being visible after the discard, only whether or not
> madvise() succeeds.

Yes, the old behaviour allowed discard in this case, because:

	/* check anonymous mapping. */
	if (vma->vm_file || vma->vm_flags & VM_SHARED)
		return false;

This check in is_ro_anon() would return false (we have vma->vm_file), so in
can_modify_vma_madv() we'd skip the early return and fall through to the
default:

	if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
		return false;

	/* Allow by default. */
	return true;

In effect, the fix is to check only vma->vm_flags & VM_SHARED.
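
To make the difference concrete, here is a toy userspace model of the two
checks. It is only a sketch: struct toy_vma, the TOY_VM_* values and the
toy_*() helpers are made up for illustration and are not the kernel's
definitions, and the real is_ro_anon() also takes arch/pkey state into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flag values for illustration only; NOT the kernel's definitions. */
#define TOY_VM_WRITE	0x1UL
#define TOY_VM_SHARED	0x2UL
#define TOY_VM_SEALED	0x4UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct toy_vma {
	const void *vm_file;	/* non-NULL for file-backed mappings */
	unsigned long vm_flags;
};

/* Models can_modify_vma(): a sealed VMA may not be modified. */
bool toy_can_modify_vma(const struct toy_vma *vma)
{
	return !(vma->vm_flags & TOY_VM_SEALED);
}

/* Old check: any file-backed mapping is rejected up front. */
bool toy_is_ro_anon_old(const struct toy_vma *vma)
{
	if (vma->vm_file || (vma->vm_flags & TOY_VM_SHARED))
		return false;
	return !(vma->vm_flags & TOY_VM_WRITE);
}

/* New check, in effect: only shared mappings are rejected up front. */
bool toy_is_ro_anon_new(const struct toy_vma *vma)
{
	if (vma->vm_flags & TOY_VM_SHARED)
		return false;
	return !(vma->vm_flags & TOY_VM_WRITE);
}

/* Models can_modify_vma_madv() for a discarding madvise(). */
bool toy_can_discard(const struct toy_vma *vma, bool use_new)
{
	bool ro;

	assert(vma != NULL);
	ro = use_new ? toy_is_ro_anon_new(vma) : toy_is_ro_anon_old(vma);
	if (!toy_can_modify_vma(vma) && ro)
		return false;

	/* Allow by default. */
	return true;
}
```

For a sealed, read-only MAP_PRIVATE file-backed VMA, the old variant allows
the discard and the new one refuses it. Unsealed VMAs are allowed either way.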

>
> As an aside, why should discard work in this case even without step 4?
> Wouldn't setting "read-only" imply you don't want the memory to change
> out from under you? I guess I'm not clear on the semantics: how do memory
> protection bits map to madvise actions like this?

I mean this is uAPI so it's moot, we can't change this.

I think you're assuming read-only is a stronger guarantee than it actually is
in the general case.
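
As a rough userspace illustration (a sketch, not taken from the series or its
tests), the following walks through steps 1-3 of the scenario without the
mseal() step and then discards. The helper name discard_ro_private() and the
temp file path are made up for the example.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Steps 1-3 of the scenario, then MADV_DONTNEED, with no mseal().
 * Returns the first byte visible after the discard, or -1 on failure.
 */
int discard_ro_private(void)
{
	char path[] = "/tmp/discard-XXXXXX";	/* hypothetical temp file */
	long page = sysconf(_SC_PAGESIZE);
	int ret = -1;
	char *p;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	/* The file's first byte is 'F', the rest of the page is zero. */
	if (write(fd, "F", 1) != 1 || ftruncate(fd, page) != 0)
		goto out;

	/* 1. MAP_PRIVATE R/W file-backed mapping. */
	p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* 2. Write to it, so the page is now backed by anonymous memory. */
	p[0] = 'A';

	/* 3. Make it read-only. */
	if (mprotect(p, page, PROT_READ) != 0)
		goto out_unmap;

	/* Discard: the private copy goes away despite PROT_READ. */
	if (madvise(p, page, MADV_DONTNEED) != 0)
		goto out_unmap;

	/* The next read faults the page back in from the file. */
	ret = (unsigned char)p[0];
out_unmap:
	munmap(p, page);
out:
	close(fd);
	return ret;
}
```

On an unsealed mapping I'd expect this to return 'F' rather than 'A', i.e. the
written data is gone even though the mapping was read-only at the time.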

VM_MAYWRITE is the key thing here.

In do_mmap() in mm/mmap.c:

	if (file) {
		struct inode *inode = file_inode(file);
		unsigned long flags_mask;
		int err;

		...

		switch (flags & MAP_TYPE) {
		case MAP_SHARED:
			...
			fallthrough;
		case MAP_SHARED_VALIDATE:
			...
			if (!(file->f_mode & FMODE_WRITE))
				vm_flags &= ~(VM_MAYWRITE | VM_SHARED);

			...
		}
		...
	}

So VM_MAYWRITE is only actually cleared if the _file_ itself isn't writable
(no FMODE_WRITE).

Otherwise we might at any time mprotect() the mapping to be writable in any
case.
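
This is visible from userspace too. A minimal sketch, assuming a regular file
in /tmp reopened O_RDONLY (the helper name try_make_writable() is made up
here): map it PROT_READ with the given mapping type, then try to mprotect() it
writable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page of an O_RDONLY fd as PROT_READ using map_type
 * (MAP_PRIVATE or MAP_SHARED), then try to make it writable.
 * Returns 0 if mprotect() succeeded, its errno if it failed,
 * or -1 if the setup itself failed.
 */
int try_make_writable(int map_type)
{
	char path[] = "/tmp/maywrite-XXXXXX";	/* hypothetical temp file */
	long page = sysconf(_SC_PAGESIZE);
	int ret = -1;
	void *p;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, page) != 0) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);

	/* Reopen read-only so the file lacks FMODE_WRITE. */
	fd = open(path, O_RDONLY);
	unlink(path);
	if (fd < 0)
		return -1;

	p = mmap(NULL, page, PROT_READ, map_type, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* Only permitted if the VMA still has VM_MAYWRITE. */
	ret = mprotect(p, page, PROT_READ | PROT_WRITE) == 0 ? 0 : errno;
	munmap(p, page);
out:
	close(fd);
	return ret;
}
```

With MAP_PRIVATE the mprotect() should succeed, since VM_MAYWRITE is left
alone. With MAP_SHARED it should fail with EACCES, since VM_MAYWRITE was
cleared at mmap() time.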

mseal() changes things, as it's a stronger requirement. You're explicitly saying
'I don't want this data to be discarded', which is why we should be firmer here.

I disagree this needs to be changed more broadly, but in any case, it'd break
uAPI so it's moot.

And wrt this series, it's further moot.

>
> -Kees
>
> [1] https://lore.kernel.org/lkml/CAHk-=wiVhHmnXviy1xqStLRozC4ziSugTk=1JOc8ORWd2_0h7g@xxxxxxxxxxxxxx/
>
> --
> Kees Cook