Re: [PATCH v4] /dev/mem: Revoke mappings when a driver claims the region

From: Dan Williams
Date: Fri Apr 08 2022 - 02:52:16 EST


On Thu, Apr 7, 2022 at 8:35 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Thu, Apr 07, 2022 at 04:43:10PM -0700, Dan Williams wrote:
> > On Thu, Apr 7, 2022 at 11:47 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 6, 2022 at 12:46 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > > >
> > > > *thread necromancy*
> > >
> > > It's alive!
> > >
> > > >
> > > > Hi Dan,
> > > >
> > > > I'm doing a KSPP bug scrub and am reviewing
> > > > https://github.com/KSPP/linux/issues/74 again.
> > > >
> > > > Do you have a chance to look at this? I'd love a way to make mmap()
> > > > behave the same way as read() for the first meg of /dev/mem.
> > >
> > > You want 0-reads or SIGBUS when attempting to access the first 1MB?
> > >
> > > Because it sounds like what you want is instead of loudly failing with
> > > -EPERM in drivers/char/mem.c::mmap_mem() you want it to silently
> > > succeed but swap in the zero page, right? Otherwise if it's SIGBUS
> > > then IO_STRICT_DEVMEM=y + marking that span as IORESOURCE_BUSY will
> > > "Do the Right Thing (TM).".
> >
> > In other words, if IO_STRICT_DEVMEM is enabled then the enforcement is
> > already there at least for anything marked IORESOURCE_BUSY. So if
> > tools are ok with that protection today, maybe there is no need to do
> > the zero page dance. I.e. legacy tools that read(2) /dev/mem below 1MB
> > get zeroes, and apparently no tools were mmap'ing below 1MB otherwise
> > they would have complained by now? At least Fedora is shipping
> > IO_STRICT_DEVMEM these days:
> >
> > https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-x86_64-fedora.config#_2799
>
> When I try to mmap a RAM area <1MiB, mmap succeeds (range_is_allowed()
> is non-zero), so I don't think IO_STRICT_DEVMEM would trip anything
> using mmap on /dev/mem there.
>
> I am only reading 0s from there, though, but I don't see what's all
> happening. I thought maybe it was just literally unused, but even with
> CONFIG_PAGE_POISONING=y booted with page_poison=1, I still read 0s (not
> 0xaa), but I'd like to understand _why_ (i.e. I can't tell if it is
> accidentally safe, intentionally safe, or my test is bad.)
>
> For example:
>
> # cat /proc/iomem
> 00000000-00000fff : Reserved
> 00001000-0009fbff : System RAM
> 0009fc00-0009ffff : Reserved
> 000a0000-000bffff : PCI Bus 0000:00
> 000c0000-000c99ff : Video ROM
> ...
>
> If I mmap page 0, it's rejected (non-RAM). If I mmap page 1, it works,
> but it's all 0s. (Which is what I'd like, but I don't see where this is
> happening.)

I'm worried it's all zeros by luck and that the logic in
devmem_is_allowed() to return 2 is actually allowing the mmap() case
to successfully bypass STRICT_DEVMEM where read(2) would have had the
buffer cleared by the kernel.

mmap_mem() would need to walk the range and map the zero_page pfn for
all of the intersections with system-ram, but if the mapping is
writable it would need to allocate memory to prevent the zero page
from being written. If you can write to it and still see your data on
the next attempt then STRICT_DEVMEM is being bypassed.
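
A rough userspace sketch of that write-then-reread check, in case it
helps. The helper name probe_writeback() and the 0xdeadbeef pattern are
made up for illustration, not taken from any existing tool. It assumes a
writable MAP_SHARED mapping of the target file at a page-aligned offset.
Writing to System RAM through /dev/mem can corrupt the running system,
so this only makes sense in a throwaway VM.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one page of 'path' at 'offset', write a
 * pattern, drop the mapping, map the same page again and check whether
 * the pattern is still there.
 *
 * Returns 1 if the write persisted, 0 if it did not, -1 if open() or
 * mmap() failed (e.g. the kernel rejected the mapping).
 *
 * WARNING: on /dev/mem this scribbles on physical memory. The original
 * word is restored on a best-effort basis only.
 */
static int probe_writeback(const char *path, off_t offset)
{
	const uint32_t pattern = 0xdeadbeefu;
	long page = sysconf(_SC_PAGESIZE);
	volatile uint32_t *p;
	uint32_t saved;
	int fd, persisted;

	assert(page > 0);

	fd = open(path, O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* First mapping: remember the old word, then write the pattern. */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		 fd, offset);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}
	saved = p[0];
	p[0] = pattern;
	munmap((void *)p, (size_t)page);

	/* Second mapping: the "next attempt". Is the data still there? */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		 fd, offset);
	if (p == MAP_FAILED) {
		perror("mmap (second)");
		close(fd);
		return -1;
	}
	persisted = (p[0] == pattern);
	if (persisted)
		p[0] = saved;	/* put the original contents back */

	munmap((void *)p, (size_t)page);
	close(fd);
	return persisted;
}
```

For the "page 1" case in Kees's example on a 4K-page system this would
be called as root with probe_writeback("/dev/mem", 0x1000). A return of
1 would mean the write landed somewhere and survived a fresh mapping,
i.e. the bypass described above; 0 or -1 would be the outcomes one
would hope for with STRICT_DEVMEM.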